bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[InstCombine] Relax cttz/ctlz with select on zero	Nikita Popov	2019-01-05	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cttz/ctlz intrinsics have a parameter specifying whether the result is undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is known non-zero, and in fact such an optimization is already performed. However, this currently doesn't work if x is non-zero as a result of a select rather than an explicit branch. This patch adds handling for this case, thus allowing x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y. Differential Revision: https://reviews.llvm.org/D55786 llvm-svn: 350463
*	[InstCombine] Add vector tests for select + ctlz/cttz; NFC	Nikita Popov	2019-01-05	1	-0/+50
\| \| \| \|	llvm-svn: 350462
*	[InstCombine] reduce raw IR narrowing rotate patterns to funnel shift	Sanjay Patel	2019-01-04	1	-77/+14
\| \| \| \| \| \| \| \| \|	Similar to rL350199 - there are no known analysis/codegen holes for funnel shift intrinsics now, so we can canonicalize the 6+ regular instructions to funnel shift to improve vectorization, inlining, unrolling, etc. llvm-svn: 350419
*	[InstCombine] canonicalize raw IR rotate patterns to funnel shift	Sanjay Patel	2019-01-01	1	-24/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199
*	[InstCombine] canonicalize MUL with NEG operand	Chen Zheng	2019-01-01	2	-18/+18
\| \| \| \| \| \| \| \| \|	-X * Y --> -(X * Y) X * -Y --> -(X * Y) Differential Revision: https://reviews.llvm.org/D55961 llvm-svn: 350185
*	[InstCombine] [NFC] update testcases for canonicalize MUL with NEG operand	Chen Zheng	2018-12-29	1	-5/+20
\| \| \| \|	llvm-svn: 350154
*	[InstCombine] [NFC] testcases for canonicalize MUL with NEG operand	Chen Zheng	2018-12-20	1	-0/+44
\| \| \| \|	llvm-svn: 349847
*	[ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC	Nikita Popov	2018-12-20	1	-19/+0
\| \| \| \| \| \| \|	Move constant folding tests into ConstantFolding/bitcount.ll and drop various tests in other places. Add coverage for undefs. llvm-svn: 349806
*	[InstCombine] Preserve access-group metadata.	Michael Kruse	2018-12-20	1	-2/+3
\| \| \| \| \| \| \| \| \|	Preserve llvm.access.group metadata when combining store instructions. This was forgotten in r349725. Fixes llvm.org/PR40117 llvm-svn: 349774
*	[InstCombine] Make x86 PADDS/PSUBS constant folding tests generic	Simon Pilgrim	2018-12-20	2	-399/+411
\| \| \| \| \| \| \| \|	As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder. PR40110 has been raised to fix the regression with constant folding vectors containing undef elements. llvm-svn: 349759
*	[InstCombine][AMDGPU] Handle more buffer intrinsics	Piotr Sobczak	2018-12-20	1	-0/+960
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Include the following intrinsics in the InsctCombine simplification: * amdgcn_raw_buffer_load * amdgcn_raw_buffer_load_format * amdgcn_struct_buffer_load * amdgcn_struct_buffer_load_format Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55882 llvm-svn: 349735
*	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.	Michael Kruse	2018-12-20	3	-13/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725
*	Regenerate test	Simon Pilgrim	2018-12-19	1	-48/+96
\| \| \| \|	llvm-svn: 349646
*	[InstCombine] add tests for extract of vector load; NFC	Sanjay Patel	2018-12-18	1	-0/+57
\| \| \| \| \| \| \| \| \| \|	There's a mismatch internally about how we are handling these patterns. We count loads as cheapToScalarize(), but then we don't actually scalarize them, so that can leave extra instructions compared to where we started when scalarizing other ops. If it's cheapToScalarize, then we should be scalarizing. llvm-svn: 349560
*	[InstCombine] auto-generate complete checks; NFC	Sanjay Patel	2018-12-18	1	-12/+25
\| \| \| \|	llvm-svn: 349548
*	[InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check	Nikita Popov	2018-12-18	2	-34/+46
\| \| \| \| \| \| \| \| \| \| \| \| \|	Checking whether a number has a certain number of trailing / leading zeros means checking whether it is of the form XXXX1000 / 0001XXXX, which can be done with an and+icmp. Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next step, this can be extended to non-equality predicates. Differential Revision: https://reviews.llvm.org/D55745 llvm-svn: 349530
*	[InstCombine] add tests for scalarization; NFC	Sanjay Patel	2018-12-18	1	-4/+59
\| \| \| \| \| \|	We miss pattern matching a splat constant if it has undef elements. llvm-svn: 349515
*	[InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)	Sanjay Patel	2018-12-17	3	-37/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem is shown specifically for a case with vector multiply here: https://bugs.llvm.org/show_bug.cgi?id=40032 ...and this might mask the original backend bug for ARM shown in: https://bugs.llvm.org/show_bug.cgi?id=39967 As the test diffs here show, we were (and probably still aren't) doing these kinds of transforms in a principled way. We are producing more or equal wide instructions than we started with in some cases, so we still need to restrict/correct other transforms from overstepping. If there are perf regressions from this change, we can either carve out exceptions to the general IR rules, or improve the backend to do these transforms when we know the transform is profitable. That's probably similar to a change like D55448. Differential Revision: https://reviews.llvm.org/D55744 llvm-svn: 349389
*	[InstCombine] Add cttz/ctlz + select non-bitwidth tests; NFC	Nikita Popov	2018-12-16	1	-0/+56
\| \| \| \|	llvm-svn: 349322
*	[InstCombine] Regenerate test checks; NFC	Nikita Popov	2018-12-16	1	-147/+150
\| \| \| \| \| \| \|	Also drop unnecessary entry blocks and avoid use of anonymous variables. llvm-svn: 349321
*	[InstCombine] Make cttz/ctlz knownbits tests more robust; NFC	Nikita Popov	2018-12-16	1	-20/+12
\| \| \| \| \| \| \|	Tests checking for the addition of !range metadata should be preserved if cttz/ctlz + icmp is optimized. llvm-svn: 349318
*	Revert "[InstCombine] Regenerate test checks; NFC"	Nikita Popov	2018-12-16	1	-166/+117
\| \| \| \| \| \| \| \|	This reverts commit r349311. Didn't check this carefully enough... llvm-svn: 349312
*	[InstCombine] Regenerate test checks; NFC	Nikita Popov	2018-12-16	1	-117/+166
\| \| \| \|	llvm-svn: 349311
*	[InstCombined] Add more tests for cttz/ctlz + icmp; NFC	Nikita Popov	2018-12-16	1	-11/+144
\| \| \| \| \| \|	Test cases other than icmp with the bitwidth. llvm-svn: 349310
*	[InstCombine] regenerate test checks; NFC	Sanjay Patel	2018-12-16	2	-448/+439
\| \| \| \|	llvm-svn: 349307
*	[InstCombine] add tests for vector widening transforms (PR40032); NFC	Sanjay Patel	2018-12-16	1	-0/+32
\| \| \| \|	llvm-svn: 349306
*	[InstCombine] Fix negative GEP offset evaluation for 32-bit pointers	Nikita Popov	2018-12-12	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=39908. The evaluateGEPOffsetExpression() function simplifies GEP offsets for use in comparisons against zero, basically by converting XScale+Offset==0 to X+Offset/Scale==0 if Scale divides Offset. However, before this is done, Offset is masked down to the pointer size. This results in incorrect results for negative Offsets, because we basically end up dividing the 32-bit offset zero* extended to 64-bit bits (rather than sign extended). Fix this by explicitly sign extending the truncated value. Differential Revision: https://reviews.llvm.org/D55449 llvm-svn: 348987
*	[InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)	Sanjay Patel	2018-12-11	1	-22/+63
\| \| \| \| \| \| \| \| \| \| \| \| \|	call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM This has the potential to create less-than-8-bit scalar types as shown in some of the test diffs, but it looks like the backend knows how to deal with that in these patterns. This is the simple part of the fix suggested in: https://bugs.llvm.org/show_bug.cgi?id=39927 Differential Revision: https://reviews.llvm.org/D55529 llvm-svn: 348862
*	InstCombine: Scalarize single use icmp/fcmp	Matt Arsenault	2018-12-10	1	-16/+16
\| \| \| \|	llvm-svn: 348801
*	[InstCombine] add tests for movmsk (PR39927) NFC	Sanjay Patel	2018-12-10	1	-6/+80
\| \| \| \|	llvm-svn: 348800
*	[InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts	Roman Lebedev	2018-12-06	2	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was finally able to quantify what i thought was missing in the fix, it was vector constants. If we have a scalar (and %x, -1), it will be instsimplified before we reach this code, but if it is a vector, we may still have a -1 element. Thus, we want to avoid the fold if at least one element is -1. Or in other words, ignoring the undef elements, no sign bits should be set. Thus, m_NonNegative(). A follow-up for rL348181 https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348462
*	[NFC][InstCombine] Add more miscompile tests for foldICmpWithLowBitMaskedVal()	Roman Lebedev	2018-12-06	2	-26/+74
\| \| \| \| \| \| \|	We also have to me aware of vector constants. If at least one element is -1, we can't transform. llvm-svn: 348461
*	InstCombine: Add some missing tests for scalarization	Matt Arsenault	2018-12-06	1	-0/+115
\| \| \| \|	llvm-svn: 348456
*	[InstCombine] remove dead code from visitExtractElement	Sanjay Patel	2018-12-05	1	-8/+0
\| \| \| \| \| \| \| \|	Extracting from a splat constant is always handled by InstSimplify. Move the test for this from InstCombine to InstSimplify to make sure that stays true. llvm-svn: 348423
*	[InstCombine] add/move tests for extractelement; NFC	Sanjay Patel	2018-12-05	2	-14/+34
\| \| \| \|	llvm-svn: 348417
*	[InstCombine] simplify icmps with same operands based on dominating cmp	Sanjay Patel	2018-12-05	1	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The tests here are based on the motivating cases from D54827. More background: 1. We don't get these cases in general with SimplifyCFG because the root of the pattern match is an icmp, not a branch. I'm not sure how often we encounter this pattern vs. the seemingly more likely case with branches, but I don't see evidence to leave the minimal pattern unoptimized. 2. This has a chance of increasing compile-time because we're using a ValueTracking call to handle the match. The motivating cases could be handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/ isImpliedFalseByMatchingCmp, but I saw that we have a more comprehensive wrapper around those, so we might as well use it here unless there's evidence that it's significantly slower. 3. Ideally, we'd handle the fold to constants in InstSimplify, but as with the existing code here, we could extend this to handle cases where the result is not a constant, but a new combined predicate. That would mean splitting the logic across the 2 passes and possibly duplicating the pattern-matching cost. 4. As mentioned in D54827, this seems like the kind of thing that should be handled in Correlated Value Propagation, but that pass is currently limited to dealing with instructions with constant operands, so extending this bit of InstCombine is the smallest/easiest way to get these patterns optimized. llvm-svn: 348367
*	[InstCombine] add tests for implied simplifications; NFC	Sanjay Patel	2018-12-04	1	-0/+168
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ideally, we would fold all of these in InstSimplify in a similar way to rL347896, but this is a bit awkward when we're trying to simplify a compare directly because the ValueTracking API expects the compare as an input, but in InstSimplify, we just have the operands of the compare. Given that we can do transforms besides just simplifications, we might as well just extend the code in InstCombine (which already does simplifications with constant operands). llvm-svn: 348312
*	[InstCombine] auto-generate full checks for icmp overflow tests; NFC	Sanjay Patel	2018-12-04	1	-23/+74
\| \| \| \|	llvm-svn: 348274
*	[InstCombine] auto-generate full checks for icmp dominator tests; NFC	Sanjay Patel	2018-12-04	2	-119/+162
\| \| \| \|	llvm-svn: 348270
*	[InstCombine] fix undef propagation bug with shuffle+binop	Sanjay Patel	2018-12-03	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test. llvm-svn: 348191
*	[InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.	Roman Lebedev	2018-12-03	2	-8/+12
\| \| \| \| \| \| \| \| \| \| \|	These two folds are invalid for this non-constant pattern when the mask ends up being all-ones: https://rise4fun.com/Alive/9au https://rise4fun.com/Alive/UcQM Fixes https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348181
*	[InstCombine] add tests for shuffle+binop fold; NFC	Sanjay Patel	2018-12-03	1	-2/+58
\| \| \| \|	llvm-svn: 348173
*	[InstCombine] Support ssub.sat canonicalization for non-splats	Nikita Popov	2018-12-01	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision: https://reviews.llvm.org/D55011 llvm-svn: 348072
*	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"	David Stuttard	2018-11-29	1	-23/+0
\| \| \| \| \| \| \| \| \|	Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
*	[InstSimplify] fold select with implied condition	Sanjay Patel	2018-11-29	2	-288/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here. llvm-svn: 347896
*	[InstCombine] auto-generate complete checks; NFC	Sanjay Patel	2018-11-29	1	-42/+116
\| \| \| \|	llvm-svn: 347881
*	Add support for TFE/LWE in image intrinsics	David Stuttard	2018-11-29	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
*	[InstCombine] Combine saturating add/sub with constant operands	Nikita Popov	2018-11-28	1	-52/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773
*	[InstCombine] Canonicalize ssub.sat to sadd.sat	Nikita Popov	2018-11-28	1	-28/+28
\| \| \| \| \| \| \| \| \|	Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772
*	[ValueTracking] Determine always-overflow condition for unsigned sub	Nikita Popov	2018-11-28	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771