path: root/llvm/lib/Transforms/InstCombine
...
* [InstCombine] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
  Fangrui Song, 2019-02-01 (1 file, -0/+1)

  llvm-svn: 352871
* [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
  Sanjay Patel, 2019-02-01 (1 file, -0/+33)

  If we can reduce the x86-specific intrinsic to the generic op, it allows
  existing simplifications and value tracking folds. AFAICT, this always
  results in identical x86 codegen in the non-reduced case...which should be
  true because we semi-generically (too aggressively IMO) convert to
  llvm.uadd.with.overflow in CGP, so the DAG/isel must already combine/lower
  this intrinsic as expected.

  This isn't quite what was requested in:
  https://bugs.llvm.org/show_bug.cgi?id=40486
  ...but we want to have these kinds of folds early for efficiency and to
  enable greater simplifications.

  For the case in the bug report where we have:
  _addcarry_u64(0, ahi, 0, &ahi)
  ...this gets completely simplified away in IR.

  Differential Revision: https://reviews.llvm.org/D57453

  llvm-svn: 352870
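  As a rough sketch of the reduction described above, a known-zero carry-in
  lets the x86 intrinsic become the generic overflow op (the intrinsic
  signatures here are assumptions, not copied from the patch; the result
  struct types differ, so the real transform must also rebuild the extracted
  fields with a zext/insertvalue):

    ; before: x86-specific addcarry with a zero carry-in
    %res = call { i8, i64 } @llvm.x86.addcarry.64(i8 0, i64 %a, i64 %b)
    ; after: generic op that existing folds and value tracking understand
    %res2 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)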
* [CallSite removal] Remove CallSite uses from InstCombine.
  Craig Topper, 2019-01-31 (4 files, -101/+95)

  Reviewers: chandlerc

  Reviewed By: chandlerc

  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D57494

  llvm-svn: 352771
* [InstCombine] Missed optimization in math expression: simplify calls to exp
  functions
  Dmitry Venikov, 2019-01-31 (1 file, -0/+20)

  Summary: This patch enables folding the following expressions under the
  -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y) and
  exp2(X) * exp2(Y) -> exp2(X + Y).

  Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594

  Reviewers: hfinkel, spatel, efriedma, lebedev.ri

  Reviewed By: spatel, lebedev.ri

  Subscribers: lebedev.ri, llvm-commits

  Differential Revision: https://reviews.llvm.org/D41342

  llvm-svn: 352730
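  In IR terms, the first fold looks roughly like this (a hypothetical
  instance; it assumes fast-math flags are present on all three
  instructions):

    %e1 = call fast double @llvm.exp.f64(double %x)
    %e2 = call fast double @llvm.exp.f64(double %y)
    %m  = fmul fast double %e1, %e2
    ; -->
    %s = fadd fast double %x, %y
    %m = call fast double @llvm.exp.f64(double %s)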
* Add a 'dynamic' parameter to the objectsize intrinsic
  Erik Pilkington, 2019-01-30 (2 files, -5/+4)

  This is meant to be used with clang's __builtin_dynamic_object_size. When
  'true' is passed to this parameter, the intrinsic has the potential to be
  folded into instructions that will be evaluated at run time. When 'false',
  the objectsize intrinsic behaviour is unchanged.

  rdar://32212419

  Differential revision: https://reviews.llvm.org/D56761

  llvm-svn: 352664
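  The resulting intrinsic shape is roughly the following (the parameter names
  are illustrative, added here only for readability):

    declare i64 @llvm.objectsize.i64.p0i8(i8* %ptr, i1 %min,
                                          i1 %nullunknown, i1 %dynamic)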
* SimplifyDemandedVectorElts for all intrinsics
  Philip Reames, 2019-01-30 (1 file, -32/+15)

  The point is that this simplifies integration of new intrinsics into
  SimplifyDemandedVectorElts, and ensures we don't miss any existing ones.

  This is intended to be NFC-ish, but as seen from the diffs, can produce
  slightly different output. This is due to the order of transforms within
  instcombine resulting in two slightly different fixed points. That's
  something we should fix, but isn't a problem with this patch per se.

  Differential Revision: https://reviews.llvm.org/D57398

  llvm-svn: 352653
* [InstCombine] canonicalize cmp/select form of uadd saturate with constant
  Sanjay Patel, 2019-01-29 (1 file, -0/+20)

  I'm circling back around to a loose end from D51929. The backend (either
  CGP or DAG) doesn't recognize this pattern, so we end up with different asm
  for these IR variants. Regardless of any future changes to canonicalize to
  saturation/overflow intrinsics, we want to get raw IR variations into the
  minimal number of raw IR forms. If/when we can canonicalize to intrinsics,
  that will make that step easier.

    Pre: C2 == ~C1
    %a = add i32 %x, C1
    %c = icmp ugt i32 %x, C2
    %r = select i1 %c, i32 -1, i32 %a
    =>
    %a = add i32 %x, C1
    %c2 = icmp ult i32 %x, C2
    %r = select i1 %c2, i32 %a, i32 -1

  https://rise4fun.com/Alive/pkH

  Differential Revision: https://reviews.llvm.org/D57352

  llvm-svn: 352536
* Demanded elements support for vector GEPs
  Philip Reames, 2019-01-28 (1 file, -0/+12)

  GEPs can produce either scalar or vector results. If we're extracting only
  a subset of the vector lanes, simplifying the operands is helpful in
  eliminating redundant computation, and (eventually) allowing further
  optimizations.

  Differential Revision: https://reviews.llvm.org/D57177

  llvm-svn: 352440
* [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
  Simon Pilgrim, 2019-01-20 (1 file, -55/+0)

  This causes a couple of changes in the upgrade tests, as signed/unsigned
  eq/ne are equivalent and we constant fold true/false codes; these changes
  are the same as what we already do for avx512 cmp/ucmp.

  Noticed while cleaning up vector integer comparison costs for PR40376.

  llvm-svn: 351697
* [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
  Nikita Popov, 2019-01-19 (2 files, -11/+70)

  Followup to D55745, this time handling comparisons with ugt and ult
  predicates (which are the canonical forms for non-equality predicates).
  For ctlz we can convert into a simple icmp, for cttz we can convert into a
  mask check.

  Differential Revision: https://reviews.llvm.org/D56355

  llvm-svn: 351645
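  For example, "at least four trailing zeros" can become a mask check (a
  hypothetical instance of the cttz fold, with constants chosen for
  illustration):

    %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
    %r  = icmp ugt i32 %tz, 3
    ; -->
    %m = and i32 %x, 15
    %r = icmp eq i32 %m, 0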
* Update the file headers across all of the LLVM projects in the monorepo to
  reflect the new license
  Chandler Carruth, 2019-01-19 (15 files, -60/+45)

  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed code
  under our old license, and accordingly, we have retained a copy of our old
  license notice in the top-level files in each project and repository.

  llvm-svn: 351636
* [InstCombine] Don't sink dynamic allocas
  Reid Kleckner, 2019-01-17 (1 file, -3/+5)

  Summary: InstCombine's sinking algorithm only thinks about memory. It
  doesn't think about non-memory constraints like stack object lifetime. It
  can sink dynamic allocas across a stacksave call, which may be used with
  stackrestore, which can incorrectly reduce the lifetime of the dynamic
  alloca.

  Fixes PR40365

  Reviewers: hfinkel, efriedma

  Subscribers: hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D56872

  llvm-svn: 351475
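  A sketch of the shape being protected (names and layout are illustrative):
  if the dynamic alloca were sunk below the stacksave, the paired
  stackrestore would deallocate it while it is still live.

    %p  = alloca i8, i32 %n          ; dynamic alloca, live past the restore
    %ss = call i8* @llvm.stacksave()
    ; ...
    call void @llvm.stackrestore(i8* %ss)
    store i8 0, i8* %p               ; use after the restore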
* [InstCombine] Avoid introduction of unaligned mem access
  Serguei Katkov, 2019-01-16 (1 file, -3/+20)

  InstCombine is able to transform a mem transfer intrinsic into a single
  store or a store/load pair. This might result in the generation of an
  unaligned atomic load/store, which the backend will later transform into a
  libcall. That is not an evident gain, so it is better to keep the intrinsic
  as-is and handle it in the backend.

  Reviewers: reames, anna, apilipenko, mkazantsev

  Reviewed By: reames

  Subscribers: t.p.northover, jfb, llvm-commits

  Differential Revision: https://reviews.llvm.org/D56582

  llvm-svn: 351295
* [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.
  Florian Hahn, 2019-01-15 (1 file, -4/+3)

  Otherwise instcombine gets stuck in a cycle. The canonicalization was added
  in D55961.

  This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400

  llvm-svn: 351187
* AMDGPU: Add a fast path for icmp.i1(src, false, NE)
  Marek Olsak, 2019-01-15 (1 file, -0/+5)

  Summary: This allows moving the condition from the intrinsic to the
  standard ICmp opcode, so that LLVM can do simplifications on it. The
  icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can
  also get the mask from and i1, or i1, xor i1.

  Reviewers: arsenm, nhaehnle

  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye,
  llvm-commits

  Differential Revision: https://reviews.llvm.org/D52060

  llvm-svn: 351150
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
  David Stuttard, 2019-01-14 (2 files, -3/+18)

  TFE and LWE support requires extra result registers that are written in the
  event of a failure in order to detect that failure case. The specific
  use-case that initiated these changes is sparse texture support.

  This means that if image intrinsics are used with either option turned on,
  the programmer must ensure that the return type can contain all of the
  expected results. This can result in redundant registers since the vector
  size must be a power-of-2.

  This change takes roughly 6 parts:
  1. Modify the instruction defs in tablegen to add new instruction variants
     that can accommodate the extra return values.
  2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE
     or LWE (where the bulk of the work for these instruction types is now
     done)
  3. Extra verification code to catch cases where intrinsics have been used
     but insufficient return registers are used.
  4. Modification to the adjustWritemask optimisation to account for TFE/LWE
     being enabled (requires extra registers to be maintained for error
     return value).
  5. An extra pass to zero initialize the error value return - this is
     because if the error does not occur, the register is not written and
     thus must be zeroed before use. Also added a new (on by default) option
     to ensure ALL return values are zero-initialized, which is required for
     sparse texture support.
  6. Disable the inst_combine optimization in the presence of tfe/lwe (later
     TODO for this to re-enable and handle correctly).

  There's an additional fix now to avoid a dmask=0. For an image intrinsic
  with tfe where all result channels except tfe were unused, I was getting an
  image instruction with dmask=0 and only a single vgpr result for tfe. That
  is incorrect because the hardware assumes there is at least one vgpr
  result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the
  desired two vgpr result with tfe in the second one.

  The TFE or LWE result is returned from the intrinsics using an aggregate
  type. Look in the test code provided to see how this works, but in essence
  IR code to invoke the intrinsic looks as follows:

    %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(
             i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
    %v.vec = extractvalue {<4 x float>, i32} %v, 0
    %v.err = extractvalue {<4 x float>, i32} %v, 1

  This re-submit of the change also includes a slight modification in
  SIISelLowering.cpp to work around a compiler bug for the powerpc_le
  platform that caused a buildbot failure on a previous submission.

  Differential revision: https://reviews.llvm.org/D48826

  Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
  Work around for ppcle compiler bug
  Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b

  llvm-svn: 351054
* [InstCombine] canonicalize another raw IR rotate pattern to funnel shift
  Sanjay Patel, 2019-01-08 (2 files, -3/+56)

  This is matching the equivalent of the DAG expansion, so it should never
  end up with worse perf than the original code, even if the target doesn't
  have a rotate instruction.

  llvm-svn: 350672
* [InstCombine] Relax cttz/ctlz with select on zero
  Nikita Popov, 2019-01-05 (1 file, -8/+15)

  The cttz/ctlz intrinsics have a parameter specifying whether the result is
  undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is
  known non-zero, and in fact such an optimization is already performed.
  However, this currently doesn't work if x is non-zero as a result of a
  select rather than an explicit branch. This patch adds handling for this
  case, thus allowing x != 0 ? cttz(x, false) : y to simplify to
  x != 0 ? cttz(x, true) : y.

  Differential Revision: https://reviews.llvm.org/D55786

  llvm-svn: 350463
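  In IR, the select-guarded pattern described above looks roughly like this
  (a hypothetical instance):

    %nz  = icmp ne i32 %x, 0
    %ctz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
    %r   = select i1 %nz, i32 %ctz, i32 %y
    ; --> the cttz result is only used when %x != 0, so:
    %ctz = call i32 @llvm.cttz.i32(i32 %x, i1 true)
    %r   = select i1 %nz, i32 %ctz, i32 %y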
* [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
  Sanjay Patel, 2019-01-04 (1 file, -16/+8)

  Similar to rL350199 - there are no known analysis/codegen holes for funnel
  shift intrinsics now, so we can canonicalize the 6+ regular instructions to
  funnel shift to improve vectorization, inlining, unrolling, etc.

  llvm-svn: 350419
* [InstCombine] canonicalize raw IR rotate patterns to funnel shift
  Sanjay Patel, 2019-01-01 (1 file, -13/+8)

  The final piece of IR-level analysis to allow this was committed with:
  rL350188

  Using the intrinsics should improve transforms based on cost models like
  vectorization and inlining.

  The backend should be prepared too, so we can now canonicalize more
  sequences of shift/logic to the intrinsics and know that the end result
  should be equal or better to the original code even if the target does not
  have an actual rotate instruction.

  llvm-svn: 350199
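  A typical raw rotate-left pattern and its canonical funnel-shift form (a
  hypothetical example with a constant shift amount of 11 on i32):

    %shl = shl i32 %x, 11
    %shr = lshr i32 %x, 21
    %r   = or i32 %shl, %shr
    ; -->
    %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 11)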
* [InstCombine] canonicalize MUL with NEG operand
  Chen Zheng, 2019-01-01 (1 file, -0/+5)

  -X * Y --> -(X * Y)
  X * -Y --> -(X * Y)

  Differential Revision: https://reviews.llvm.org/D55961

  llvm-svn: 350185
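  In IR, the first fold would look roughly like this (illustrative values):

    %negx = sub i32 0, %x
    %m    = mul i32 %negx, %y
    ; -->
    %mul = mul i32 %x, %y
    %m   = sub i32 0, %mul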
* [IR] Add Instruction::isLifetimeStartOrEnd, NFC
  Vedant Kumar, 2018-12-21 (1 file, -7/+4)

  Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
  llvm.lifetime.start or an llvm.lifetime.end intrinsic.

  This was suggested as a cleanup in D55967.

  Differential Revision: https://reviews.llvm.org/D56019

  llvm-svn: 349964
* [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic
  intrinsics (llvm)
  Simon Pilgrim, 2018-12-21 (1 file, -78/+0)

  This auto upgrades the signed SSE saturated math intrinsics to
  SADD_SAT/SSUB_SAT generic intrinsics.

  Clang counterpart: https://reviews.llvm.org/D55890

  Differential Revision: https://reviews.llvm.org/D55894

  llvm-svn: 349892
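  The upgrade maps the target intrinsics onto the generic saturating ops,
  roughly like this (the exact x86 intrinsic name/signature is an assumption
  here, not taken from the patch):

    %r = call <16 x i8> @llvm.x86.sse2.padds.b(<16 x i8> %a, <16 x i8> %b)
    ; -->
    %r = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> %a, <16 x i8> %b)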
* [InstCombine] Preserve access-group metadata.
  Michael Kruse, 2018-12-20 (1 file, -1/+1)

  Preserve llvm.access.group metadata when combining store instructions.
  This was forgotten in r349725.

  Fixes llvm.org/PR40117

  llvm-svn: 349774
* [InstCombine][AMDGPU] Handle more buffer intrinsics
  Piotr Sobczak, 2018-12-20 (1 file, -0/+4)

  Summary: Include the following intrinsics in the InstCombine
  simplification:
  * amdgcn_raw_buffer_load
  * amdgcn_raw_buffer_load_format
  * amdgcn_struct_buffer_load
  * amdgcn_struct_buffer_load_format

  Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

  Reviewers: arsenm

  Reviewed By: arsenm

  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard,
  tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55882

  llvm-svn: 349735
* Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
  Michael Kruse, 2018-12-20 (3 files, -0/+7)

  The current llvm.mem.parallel_loop_access metadata has a problem in that it
  uses LoopIDs. A LoopID unfortunately is not a loop identifier. It is
  neither unique (there's even a regression test assigning the same LoopID to
  multiple loops; it can otherwise happen if passes such as LoopVersioning
  make copies of entire loops) nor persistent (every time a property is
  removed/added from a LoopID's MDNode, it will also receive a new LoopID;
  this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most
  loop transformation passes change the loop attributes (even if just to mark
  that a loop should not be processed again, as llvm.loop.isvectorized does
  for the versioned and unversioned loop), the parallel access information is
  lost for any subsequent pass.

  This patch unlinks LoopIDs and parallel accesses. The
  llvm.mem.parallel_loop_access metadata on instructions is replaced by
  llvm.access.group metadata. llvm.access.group points to a distinct MDNode
  with no operands (avoiding the problem of ever needing to add/remove
  operands), called an "access group". Alternatively, it can point to a list
  of access groups. The LoopID then has an attribute
  llvm.loop.parallel_accesses with all the access groups that are parallel
  (no dependencies carried by this loop).

  This intentionally avoids any kind of "ID". Loops that are cloned or have
  their attributes modified retain the llvm.loop.parallel_accesses attribute.
  Access instructions that are cloned point to the same access group. It is
  not necessary for each access to have its own "ID" MDNode; memory access
  instructions with the same behavior can be grouped together.

  The behavior of llvm.mem.parallel_loop_access is not changed by this patch,
  but should be considered deprecated.

  Differential Revision: https://reviews.llvm.org/D52116

  llvm-svn: 349725
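  A minimal sketch of the new metadata shape described above (the node
  numbers are illustrative):

    %v = load i32, i32* %p, !llvm.access.group !2
    br i1 %cond, label %loop, label %exit, !llvm.loop !0

    !0 = distinct !{!0, !1}                        ; the LoopID
    !1 = !{!"llvm.loop.parallel_accesses", !2}     ; parallel access groups
    !2 = distinct !{}                              ; an access group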
* [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
  Nikita Popov, 2018-12-18 (1 file, -3/+22)

  Checking whether a number has a certain number of trailing / leading zeros
  means checking whether it is of the form XXXX1000 / 0001XXXX, which can be
  done with an and+icmp.

  Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next step,
  this can be extended to non-equality predicates.

  Differential Revision: https://reviews.llvm.org/D55745

  llvm-svn: 349530
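  For instance, "exactly three trailing zeros" becomes an and+icmp (a
  hypothetical instance of the fold):

    %tz = call i32 @llvm.cttz.i32(i32 %x, i1 true)
    %r  = icmp eq i32 %tz, 3
    ; --> %x must look like ...XXXX1000 in binary:
    %m = and i32 %x, 15
    %r = icmp eq i32 %m, 8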
* [InstCombine] refactor isCheapToScalarize(); NFC
  Sanjay Patel, 2018-12-18 (1 file, -33/+25)

  As the FIXME indicates, this has the potential to go overboard. So I'm not
  sure if it's even worth keeping this vs. iteratively doing simple matches,
  but we might as well clean it up.

  llvm-svn: 349523
* [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)
  Sanjay Patel, 2018-12-17 (2 files, -12/+8)

  The problem is shown specifically for a case with vector multiply here:
  https://bugs.llvm.org/show_bug.cgi?id=40032
  ...and this might mask the original backend bug for ARM shown in:
  https://bugs.llvm.org/show_bug.cgi?id=39967

  As the test diffs here show, we were (and probably still aren't) doing
  these kinds of transforms in a principled way. We are producing more or
  equal wide instructions than we started with in some cases, so we still
  need to restrict/correct other transforms from overstepping.

  If there are perf regressions from this change, we can either carve out
  exceptions to the general IR rules, or improve the backend to do these
  transforms when we know the transform is profitable. That's probably
  similar to a change like D55448.

  Differential Revision: https://reviews.llvm.org/D55744

  llvm-svn: 349389
* [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
  Nikita Popov, 2018-12-12 (1 file, -5/+3)

  This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

  The evaluateGEPOffsetExpression() function simplifies GEP offsets for use
  in comparisons against zero, basically by converting X*Scale+Offset==0 to
  X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
  Offset is masked down to the pointer size. This results in incorrect
  results for negative Offsets, because we basically end up dividing the
  32-bit offset *zero* extended to 64 bits (rather than sign extended).

  Fix this by explicitly sign extending the truncated value.

  Differential Revision: https://reviews.llvm.org/D55449

  llvm-svn: 348987
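  A worked instance of the problem (numbers chosen for illustration, assuming
  32-bit pointers and 64-bit host arithmetic):

    Scale = 4, Offset = -8
    truncated to 32 bits:       0xFFFFFFF8
    zero-extended to 64 bits:   4294967288 / 4 = 1073741822   (wrong)
    sign-extended to 64 bits:   -8 / 4 = -2                   (correct)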
* [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
  Sanjay Patel, 2018-12-11 (1 file, -20/+38)

  call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

  This has the potential to create less-than-8-bit scalar types as shown in
  some of the test diffs, but it looks like the backend knows how to deal
  with that in these patterns.

  This is the simple part of the fix suggested in:
  https://bugs.llvm.org/show_bug.cgi?id=39927

  Differential Revision: https://reviews.llvm.org/D55529

  llvm-svn: 348862
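  A concrete instance of the pattern above (the x86 intrinsic name/signature
  is an assumption here, not taken from the commit's tests):

    %s = sext <16 x i1> %x to <16 x i8>
    %r = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %s)
    ; -->
    %b = bitcast <16 x i1> %x to i16
    %r = zext i16 %b to i32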
* InstCombine: Scalarize single use icmp/fcmp
  Matt Arsenault, 2018-12-10 (1 file, -0/+12)

  llvm-svn: 348801
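  The title suggests a fold of roughly this shape (a hypothetical sketch; the
  commit message itself gives no example):

    %c = icmp eq <4 x i32> %x, %y
    %e = extractelement <4 x i1> %c, i32 1
    ; -->
    %x1 = extractelement <4 x i32> %x, i32 1
    %y1 = extractelement <4 x i32> %y, i32 1
    %e  = icmp eq i32 %x1, %y1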
* [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
  Roman Lebedev, 2018-12-06 (1 file, -0/+4)

  I was finally able to quantify what I thought was missing in the fix; it
  was vector constants. If we have a scalar (and %x, -1), it will be
  instsimplified before we reach this code, but if it is a vector, we may
  still have a -1 element.

  Thus, we want to avoid the fold if *at least one* element is -1. Or in
  other words, ignoring the undef elements, no sign bits should be set.
  Thus, m_NonNegative().

  A follow-up for rL348181
  https://bugs.llvm.org/show_bug.cgi?id=39861

  llvm-svn: 348462
* [InstCombine] remove dead code from visitExtractElement
  Sanjay Patel, 2018-12-05 (1 file, -6/+0)

  Extracting from a splat constant is always handled by InstSimplify. Move
  the test for this from InstCombine to InstSimplify to make sure that stays
  true.

  llvm-svn: 348423
* [InstCombine] reduce duplication in visitExtractElementInst; NFC
  Sanjay Patel, 2018-12-05 (1 file, -38/+32)

  llvm-svn: 348418
* [InstCombine] simplify icmps with same operands based on dominating cmp
  Sanjay Patel, 2018-12-05 (1 file, -0/+5)

  The tests here are based on the motivating cases from D54827.

  More background:
  1. We don't get these cases in general with SimplifyCFG because the root of
     the pattern match is an icmp, not a branch. I'm not sure how often we
     encounter this pattern vs. the seemingly more likely case with branches,
     but I don't see evidence to leave the minimal pattern unoptimized.
  2. This has a chance of increasing compile-time because we're using a
     ValueTracking call to handle the match. The motivating cases could be
     handled with a simpler pair of calls to isImpliedTrueByMatchingCmp /
     isImpliedFalseByMatchingCmp, but I saw that we have a more comprehensive
     wrapper around those, so we might as well use it here unless there's
     evidence that it's significantly slower.
  3. Ideally, we'd handle the fold to constants in InstSimplify, but as with
     the existing code here, we could extend this to handle cases where the
     result is not a constant, but a new combined predicate. That would mean
     splitting the logic across the 2 passes and possibly duplicating the
     pattern-matching cost.
  4. As mentioned in D54827, this seems like the kind of thing that should be
     handled in Correlated Value Propagation, but that pass is currently
     limited to dealing with instructions with constant operands, so
     extending this bit of InstCombine is the smallest/easiest way to get
     these patterns optimized.

  llvm-svn: 348367
* [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC
  Sanjay Patel, 2018-12-04 (1 file, -9/+9)

  The old function underspecified the return type, took an unused parameter,
  and had a misleading name.

  llvm-svn: 348292
* [InstCombine] rearrange foldICmpWithDominatingICmp; NFC
  Sanjay Patel, 2018-12-04 (1 file, -33/+45)

  Move it out from under the constant check, reorder predicates, add
  comments. This makes it easier to extend to handle the non-constant case.

  llvm-svn: 348284
* [InstCombine] add helper for icmp with dominator; NFC
  Sanjay Patel, 2018-12-04 (2 files, -20/+24)

  There's a potential small enhancement to this code that could solve the
  cases currently under proposal in D54827 via SimplifyCFG. Whether
  instcombine should be doing this kind of semi-non-local analysis in the
  first place is an open question, but separating the logic out can only
  help if/when we decide to move it to a different pass.

  AFAICT, any proposal to do this in SimplifyCFG could also be seen as an
  overreach + it would be incomplete to start the fold from a branch rather
  than an icmp.

  There's another question here about the code for processUGT_ADDCST_ADD().
  That part may be completely dead after rL234638?

  llvm-svn: 348273
* [InstCombine] fix undef propagation bug with shuffle+binop
  Sanjay Patel, 2018-12-03 (1 file, -0/+16)

  When we have a shuffle that extends a source vector with undefs and then do
  some binop on that, we must make sure that the extra elements remain undef
  with that binop if we reverse the order of the binop and shuffle. 'or' is
  probably the easiest example to show the bug because 'or C, undef --> -1'
  (not undef). But there are other opcode/constant combinations where this is
  true, as shown by the 'shl' test.

  llvm-svn: 348191
* [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
  Roman Lebedev, 2018-12-03 (1 file, -0/+4)

  These two folds are invalid for this non-constant pattern when the mask
  ends up being all-ones:
  https://rise4fun.com/Alive/9au
  https://rise4fun.com/Alive/UcQM

  Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

  llvm-svn: 348181
* [InstCombine] rearrange shuffle+binop fold; NFC
  Sanjay Patel, 2018-12-03 (1 file, -8/+9)

  This code has a bug dealing with undefs, so we need to add another escape
  hatch, so doing some cleanup ahead of that.

  llvm-svn: 348175
* [CmpInstAnalysis] fix formatting; NFC
  Sanjay Patel, 2018-12-03 (1 file, -5/+5)

  There are potential improvements to the structure of this API raised by
  D54994, but remove some cosmetic blemishes before making any functional
  changes.

  llvm-svn: 348149
* [InstCombine] Support ssub.sat canonicalization for non-splats
  Nikita Popov, 2018-12-01 (1 file, -5/+4)

  Extend the ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
  support non-splat vector constants. This is done by generalizing the
  implementation of the isNotMinSignedValue() helper to return true for
  constants that are non-splat, but don't contain any signed min elements.

  Differential Revision: https://reviews.llvm.org/D55011

  llvm-svn: 348072
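  A non-splat instance of the canonicalization (vectors chosen for
  illustration; no element may be the signed minimum):

    %r = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> %x,
                                             <2 x i32> <i32 1, i32 2>)
    ; -->
    %r = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> %x,
                                             <2 x i32> <i32 -1, i32 -2>)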
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
  David Stuttard, 2018-11-29 (2 files, -18/+3)

  Also revert fix r347876.

  One of the buildbots was reporting a failure in some relevant tests that I
  can't repro or explain at present, so reverting until I can isolate.

  llvm-svn: 347911
* [InstSimplify] fold select with implied condition
  Sanjay Patel, 2018-11-29 (1 file, -18/+0)

  This is an almost direct move of the functionality from InstCombine to
  InstSimplify. There's no reason not to do this in InstSimplify because we
  never create a new value with this transform. (There's a question of
  whether any dominance-based transform belongs in either of these passes,
  but that's a separate issue.)

  I've changed 1 of the conditions for the fold (1 of the blocks for the
  branch must be the block we started with) into an assert because I'm not
  sure how that could ever be false.

  We need 1 extra check to make sure that the instruction itself is in a
  basic block because passes other than InstCombine may be using InstSimplify
  as an analysis on values that are not wired up yet.

  The 3-way compare changes show that InstCombine has some kind of
  phase-ordering hole. Otherwise, we would have already gotten the intended
  final result that we now show here.

  llvm-svn: 347896
* Add support for TFE/LWE in image intrinsics
  David Stuttard, 2018-11-29 (2 files, -3/+18)

  TFE and LWE support requires extra result registers that are written in the
  event of a failure in order to detect that failure case. The specific
  use-case that initiated these changes is sparse texture support.

  This means that if image intrinsics are used with either option turned on,
  the programmer must ensure that the return type can contain all of the
  expected results. This can result in redundant registers since the vector
  size must be a power-of-2.

  This change takes roughly 6 parts:
  1. Modify the instruction defs in tablegen to add new instruction variants
     that can accommodate the extra return values.
  2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE
     or LWE (where the bulk of the work for these instruction types is now
     done)
  3. Extra verification code to catch cases where intrinsics have been used
     but insufficient return registers are used.
  4. Modification to the adjustWritemask optimisation to account for TFE/LWE
     being enabled (requires extra registers to be maintained for error
     return value).
  5. An extra pass to zero initialize the error value return - this is
     because if the error does not occur, the register is not written and
     thus must be zeroed before use. Also added a new (on by default) option
     to ensure ALL return values are zero-initialized, which is required for
     sparse texture support.
  6. Disable the inst_combine optimization in the presence of tfe/lwe (later
     TODO for this to re-enable and handle correctly).

  There's an additional fix now to avoid a dmask=0. For an image intrinsic
  with tfe where all result channels except tfe were unused, I was getting an
  image instruction with dmask=0 and only a single vgpr result for tfe. That
  is incorrect because the hardware assumes there is at least one vgpr
  result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the
  desired two vgpr result with tfe in the second one.

  The TFE or LWE result is returned from the intrinsics using an aggregate
  type. Look in the test code provided to see how this works, but in essence
  IR code to invoke the intrinsic looks as follows:

    %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(
             i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
    %v.vec = extractvalue {<4 x float>, i32} %v, 0
    %v.err = extractvalue {<4 x float>, i32} %v, 1

  Differential revision: https://reviews.llvm.org/D48826

  Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda

  llvm-svn: 347871
* [InstCombine] Combine saturating add/sub with constant operands
  Nikita Popov, 2018-11-28 (1 file, -0/+34)

  Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and
  sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches.

  In the unsigned case we can compute C1+C2 with saturating arithmetic, and
  InstSimplify will reduce this just to the saturation value. For the signed
  case, we cannot perform the simplification if the result of the addition
  overflows.

  This change is part of https://reviews.llvm.org/D54534.

  llvm-svn: 347773
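  An unsigned instance of the fold (the constants are hypothetical):

    %t = call i8 @llvm.uadd.sat.i8(i8 %x, i8 100)
    %r = call i8 @llvm.uadd.sat.i8(i8 %t, i8 100)
    ; -->
    %r = call i8 @llvm.uadd.sat.i8(i8 %x, i8 200)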
* [InstCombine] Canonicalize ssub.sat to sadd.sat
  Nikita Popov, 2018-11-28 (1 file, -0/+11)

  Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and not
  signed minimum. This will help further optimizations to apply.

  This change is part of https://reviews.llvm.org/D54534.

  llvm-svn: 347772
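  For example (a hypothetical scalar instance; C must not be the signed
  minimum):

    %r = call i32 @llvm.ssub.sat.i32(i32 %x, i32 7)
    ; -->
    %r = call i32 @llvm.sadd.sat.i32(i32 %x, i32 -7)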
* [InstCombine] Use known overflow information for saturating add/sub
  Nikita Popov, 2018-11-28 (1 file, -0/+38)

  If ValueTracking can determine that the add/sub can never overflow, replace
  it with the corresponding nuw/nsw add/sub.

  Additionally, for the unsigned case, if ValueTracking determines that the
  add/sub always overflows, replace the result with the saturation value.

  This change is part of https://reviews.llvm.org/D54534.

  llvm-svn: 347770
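  A sketch of the never-overflows case (assuming %x is known to be at most
  100 here, e.g. from a dominating mask or range check):

    %r = call i8 @llvm.uadd.sat.i8(i8 %x, i8 10)
    ; -->
    %r = add nuw i8 %x, 10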