path: root/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
Commit message | Author | Age | Files | Lines
* [InstCombine] Make combineLoadToNewType a method; NFC | Nikita Popov | 2020-01-14 | 1 | -0/+3
| | | | | So it can be reused as part of other combines. In particular for D71164.
* [InstCombine] Preserve nuw on sub of geps (PR44419) | Nikita Popov | 2020-01-11 | 1 | -1/+2
Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the nuw on sub of geps. We only do this if the offset has a multiplication as the final operation, as we can't be sure the operations are nuw in the other cases without more thorough analysis. Differential Revision: https://reviews.llvm.org/D72048
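A minimal IR sketch of the shape involved, with made-up names and an i16 element type chosen for illustration (so the gep offset is `%idx * 2`); this is not taken from the commit's tests:
```
define i64 @gep_diff(i16* %p, i64 %idx) {
  %gep = getelementptr inbounds i16, i16* %p, i64 %idx
  %pi  = ptrtoint i16* %gep to i64
  %pb  = ptrtoint i16* %p to i64
  %d   = sub nuw i64 %pi, %pb
  ret i64 %d
}
; The pointer difference is rewritten to the gep's offset, %idx * 2. Because the
; offset's final operation is that multiply and the original sub was nuw, the
; multiply can keep the flag: roughly %d = mul nuw i64 %idx, 2.
```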
* Revert "[InstCombine] Fold PHIs with equal incoming pointers"Daniil Suchkov2019-11-141-5/+0
| | | | | This reverts commit a2f6ae9abffcba260c22bb235879f0576bf3b783. It is reverted due to clang-cmake-armv7-selfhost buildbot failure.
* [InstCombine] Fold PHIs with equal incoming pointers | Daniil Suchkov | 2019-11-14 | 1 | -0/+5
This is a resubmission of bbb29738b58aaf6f6518269abdcf8f64131665a9 that was reverted due to clang test failures. It includes the fix and additional IR tests for the missed case. Summary: In case when all incoming values of a PHI are equal pointers, this transformation inserts a definition of such a pointer right after definition of the base pointer and replaces with this value both PHI and all its incoming pointers. Primary goal of this transformation is canonicalization of this pattern in order to enable optimizations that can't handle PHIs. Non-inbounds pointers aren't currently supported. Reviewers: spatel, RKSimon, lebedev.ri, apilipenko Reviewed By: apilipenko Tags: #llvm Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D68128
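A minimal sketch of the pattern (hypothetical function, with identical inbounds geps feeding the PHI):
```
define i32 @load_via_phi(i32* %base, i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  %p1 = getelementptr inbounds i32, i32* %base, i64 4
  br label %exit
b:
  %p2 = getelementptr inbounds i32, i32* %base, i64 4
  br label %exit
exit:
  %p = phi i32* [ %p1, %a ], [ %p2, %b ]
  %v = load i32, i32* %p
  ret i32 %v
}
; All incoming values compute the same pointer, so a single gep can be emitted
; right after the definition of %base (here, at the top of entry) and used in
; place of the PHI and of both incoming geps.
```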
* Temporarily revert "[InstCombine] Fold PHIs with equal incoming pointers" | Daniil Suchkov | 2019-11-13 | 1 | -5/+0
| | | | | | Revert due to sanitizer-windows buildbot failure. This reverts commit bbb29738b58aaf6f6518269abdcf8f64131665a9.
* [InstCombine] Fold PHIs with equal incoming pointers | Daniil Suchkov | 2019-11-13 | 1 | -0/+5
In case when all incoming values of a PHI are equal pointers, this transformation inserts a definition of such a pointer right after definition of the base pointer and replaces with this value both PHI and all its incoming pointers. Primary goal of this transformation is canonicalization of this pattern in order to enable optimizations that can't handle PHIs. Non-inbounds pointers aren't currently supported. Reviewers: spatel, RKSimon, lebedev.ri, apilipenko Reviewed By: apilipenko Tags: #llvm Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D68128
* Add InstCombine/InstructionSimplify support for Freeze Instruction | aqjune | 2019-11-12 | 1 | -0/+1
| | | | | | | | | | | | | | | Summary: - Add llvm::SimplifyFreezeInst - Add InstCombiner::visitFreeze - Add llvm tests Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel Reviewed By: reames, lebedev.ri Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits Differential Revision: https://reviews.llvm.org/D29013
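A tiny illustration of the kind of case SimplifyFreezeInst can fold (a well-defined constant is already "frozen"):
```
define i32 @f() {
  %fr = freeze i32 42
  ; %fr simplifies to plain 42, since 42 is neither undef nor poison
  ret i32 %fr
}
```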
* Wrong debug info generated at -O2 (-O0 is correct) | Vedant Kumar | 2019-11-07 | 1 | -1/+1
The InstCombine pass was erasing trivially dead instructions without updating the llvm.dbg.value uses that depended on them, so the debugger could not show the programmer the current state of those variables. As part of this fix, before deleting a trivially dead instruction, we iterate through all of its llvm.dbg users and set each of them to undef; the user will now see "optimized out" when trying to print those variables. This fixes https://bugs.llvm.org/show_bug.cgi?id=43893 This is my first fix to llvm. Patch by kamlesh kumar! Differential Revision: https://reviews.llvm.org/D69809
* [InstCombine] Signed saturation patterns | David Green | 2019-10-22 | 1 | -0/+1
| | | | | | | | | | | | This adds an instcombine matcher for code that attempts to perform signed saturating arithmetic by casting to a higher type. Unsigned cases are already matched, this adds extra matches for the more complex signed cases, which involves matching the min(max(add a b)) nodes with proper extends to ensure legality. Differential Revision: https://reviews.llvm.org/D68651 llvm-svn: 375505
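A sketch of the widen-add-clamp-narrow shape being targeted, written with explicit icmp/select clamps and made-up names (the exact IR matched by the combine may differ):
```
define i16 @saturating_add(i16 %a, i16 %b) {
  %wa  = sext i16 %a to i32
  %wb  = sext i16 %b to i32
  %add = add i32 %wa, %wb
  %c1  = icmp sgt i32 %add, 32767
  %hi  = select i1 %c1, i32 32767, i32 %add      ; min(add, INT16_MAX)
  %c2  = icmp slt i32 %hi, -32768
  %lo  = select i1 %c2, i32 -32768, i32 %hi      ; max(..., INT16_MIN)
  %r   = trunc i32 %lo to i16
  ret i16 %r
  ; the whole sequence is equivalent to: call i16 @llvm.sadd.sat.i16(i16 %a, i16 %b)
}
```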
* [InstCombine] Allow values with multiple users in SimplifyDemandedVectorElts | Piotr Sobczak | 2019-10-21 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Allow for ignoring the check for a single use in SimplifyDemandedVectorElts to be able to simplify operands if DemandedElts is known to contain the union of elements used by all users. It is a responsibility of a caller of SimplifyDemandedVectorElts to supply correct DemandedElts. Simplify a series of extractelement instructions if only a subset of elements is used. Reviewers: reames, arsenm, majnemer, nhaehnle Reviewed By: nhaehnle Subscribers: wdng, jvesely, nhaehnle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67345 llvm-svn: 375395
* [InstCombine] conditional sign-extend of high-bit-extract: 'or' pattern. | Roman Lebedev | 2019-10-20 | 1 | -0/+2
| | | | | | | | | | | | | | In this pattern, all the "magic" bits that we'd `add` are all high sign bits, and in the value we'd be adding to they are all unset, not unexpectedly, so we can have an `or` there: https://rise4fun.com/Alive/ups It is possible that `haveNoCommonBitsSet()` should be taught about this pattern so that we never have an `add` variant, but the reasoning would need to be recursive (because of that `select`), so i'm not really sure that would be worth it just yet. llvm-svn: 375378
* [InstCombine] Shift amount reassociation in shifty sign bit test (PR43595) | Roman Lebedev | 2019-10-20 | 1 | -0/+4
Summary: This problem consists of several parts:
* Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`. This is trivial, and easy to do, we have a fold for it.
* Shift amount reassociation - if we have two identical shifts, and we can simplify-add their shift amounts together, then we likely can just perform them as a single shift. But this is finicky, has one-use restrictions, and shift opcodes must be identical.
But there is a super-pattern where both of these work together to produce a sign bit test from two shifts + comparison. We do indeed already handle this in most cases. But since we get that fold transitively, it has one-use restrictions. And what's worse, in this case the right-shifts aren't required to be identical, and we can't handle that transitively: if the total shift amount is bitwidth-1, only a sign bit will remain in the output value. But if we look at this from the perspective of two shifts, we can't fold - we can't possibly know what bit pattern we'd produce via two shifts, it will be *some* kind of a mask produced from the original sign bit, but we just can't tell its shape: https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN But it will *only* contain the sign bit and zeros. So from the perspective of the sign bit test, we're good: https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU Superb! So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a pseudo-analysis mode that will ignore extra uses, and will only check whether a) those are two right shifts and b) they end up with a bitwidth(x)-1 shift amount, and return either the original value that we are sign-checking, or null. This does not have any functionality change for the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`. All that being said, as discussed in the review, this yet again increases usage of instsimplify in instcombine as a utility. Some day that may need to be reevaluated. https://bugs.llvm.org/show_bug.cgi?id=43595 Reviewers: spatel, efriedma, vsk Reviewed By: spatel Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68930 llvm-svn: 375371
* [InstCombine] Move isSignBitCheck(), handle rest of the predicates | Roman Lebedev | 2019-10-07 | 1 | -0/+39
| | | | | | | | | True, no test coverage is being added here. But those non-canonical predicates that are already handled here already have no test coverage as far as i can tell. I tried to add tests for them, but all the patterns already get handled elsewhere. llvm-svn: 373962
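For reference, a few equivalent spellings of a sign-bit check on an i8 value (illustrative fragments; the last two use non-canonical predicates):
```
define void @sign_bit_checks(i8 %x) {
  %a = icmp slt i8 %x, 0      ; sign bit set (canonical form)
  %b = icmp sgt i8 %x, -1     ; sign bit clear (the inverted check)
  %c = icmp ugt i8 %x, 127    ; sign bit set, non-canonical spelling
  %d = icmp uge i8 %x, -128   ; sign bit set (-128 is 128 unsigned), non-canonical predicate
  ret void
}
```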
* [InstCombine] Bypass high bit extract before variable sign-extension (PR43523) | Roman Lebedev | 2019-10-02 | 1 | -0/+2
https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
https://rise4fun.com/Alive/7jk - the variable sext can be redundant
https://rise4fun.com/Alive/Qslu - 'exact'-ness of the first shift can be preserved
https://rise4fun.com/Alive/IF63 - without trunc we could view this as the more general "drop redundant mask before right-shift", but let's handle it here for now
https://rise4fun.com/Alive/iip - likewise, without trunc, the variable sext can be redundant.
There are more patterns for sure - e.g. we can have 'lshr' as the final shift, but that might be best handled by some more generic transform, e.g. "drop redundant masking before right-shift" (PR42456). I'm singling-out this sext patch because you can only extract high bits with `*shr` (unlike abstract bit masking), and i *know* this fold is wanted by existing code. I don't believe there is much to review here, so i'm gonna opt into post-review mode here. https://bugs.llvm.org/show_bug.cgi?id=43523 llvm-svn: 373542
* [InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0 | Roman Lebedev | 2019-09-25 | 1 | -1/+1
| | | | | | | | | | | | | https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, therefore i delete now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912
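A minimal sketch, assuming %b is known to be non-zero (names are made up):
```
define i1 @sub_wraps(i32 %a, i32 %b) {
  ; given %b != 0:
  %sub = sub i32 %a, %b
  %cmp = icmp uge i32 %sub, %a   ; "did the subtraction wrap around?"
  ret i1 %cmp
  ; equivalent to: icmp ugt i32 %b, %a
}
```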
* [InstCombine] fold sign-bit compares of srem | Sanjay Patel | 2019-09-11 | 1 | -0/+2
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking off the sign bit and the modulo (low) bits: https://rise4fun.com/Alive/jSO A '2' divisor allows slightly more folding: https://rise4fun.com/Alive/tDBM Any chance to remove an 'srem' use is probably worthwhile, but this is limited to the one-use improvement case because doing more may expose other missing folds. That means it does nothing for PR21929 yet: https://bugs.llvm.org/show_bug.cgi?id=21929 Differential Revision: https://reviews.llvm.org/D67334 llvm-svn: 371610
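A sketch of the sgt case with a constant divisor of 4; the masked-compare form in the trailing comment is one correct way to express the reduction, not necessarily the exact output of the fold:
```
define i1 @srem_positive(i32 %x) {
  %rem = srem i32 %x, 4
  %cmp = icmp sgt i32 %rem, 0
  ret i1 %cmp
}
; mask = sign bit | (4 - 1) = 0x80000003 (-2147483645 as a signed i32):
;   %m   = and i32 %x, -2147483645
;   %cmp = icmp sgt i32 %m, 0
```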
* [InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction | Roman Lebedev | 2019-08-29 | 1 | -0/+2
Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could:
```
$ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll
Processing /home/lebedevri/llvm-patch1.ll..
----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp ne i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %r = extractvalue {i4, i1} %n0, 1
  ret %r
Done: 1
Optimization is correct!
----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp eq i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1
  ret i1 %r
Done: 1
Optimization is correct!
```
Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65144
llvm-svn: 370348
* Fix uninitialized variable warning in cppcheck. NFCI. | Simon Pilgrim | 2019-08-28 | 1 | -1/+1
| | | | | | InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value. llvm-svn: 370220
* [InstCombine] improve readability for icmp with cast folds; NFC | Sanjay Patel | 2019-08-20 | 1 | -1/+1
| | | | | | | | 1. Update function name and stale code comments. 2. Use variable names that are less ambiguous. 3. Move operand checks into the function as early exits. llvm-svn: 369390
* [InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI | Roman Lebedev | 2019-08-14 | 1 | -0/+3
I'd like to use it elsewhere, hopefully without reinventing the wheel. No functional change intended so far. llvm-svn: 368820
* [InstCombine][NFC] Rename IsFreeToInvert() -> isFreeToInvert() for consistency | Roman Lebedev | 2019-08-13 | 1 | -2/+2
| | | | | | As per https://reviews.llvm.org/D65530#inline-592325 llvm-svn: 368686
* [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible | Roman Lebedev | 2019-08-13 | 1 | -1/+29
Summary: This is rather unconventional.. As the comment there says, we don't have much folds for xor-of-icmps, we try to turn them into an and-of-icmps, for which we have plenty of folds. But if the ICmp we need to invert is not single-use - we give up. As discussed in https://reviews.llvm.org/D65148#1603922, we may have a non-canonical CLAMP pattern, with bit match and select-of-threshold that we'll potentially clamp. As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`, out of all 8 variations of the pattern, only two are **not** canonicalized into the variant with and+icmp instead of bit math. The reason is because the ICmp we need to invert is not single-use - we give up. We indeed can't perform this fold at will, the general rule is that we should not increase instruction count in InstCombine, But we wouldn't end up increasing instruction count if we can adapt every other user to the inverted value. This way the `not` we create **will** get folded, and in the end the instruction count did not increase. For that, of course, we need to look at the users of a Value, which is again rather unconventional for InstCombine :S Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`. The alternatives would be to not create that `not`, but add duplicate code to manually invert all users; or to add some even less general combine to handle some more specific pattern[s]. Reviewers: spatel, nikic, RKSimon, craig.topper Reviewed By: spatel Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65530 llvm-svn: 368685
* [InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is ↵Roman Lebedev2019-07-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | power-of-two Summary: I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling. While we did happen to handle this fold in most of the cases, the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two), or first turn `x s% -y` to `x u% y`; that does handle most of the cases. But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`, and thus we end up being stuck with `(x s% INT_MIN) == 0`. There is no such restriction for the more general fold: https://rise4fun.com/Alive/IIeS To be noted, the fold does not enforce that `y` is a constant, so it may indeed increase instruction count. This is consistent with what `x u% y`->`x & (y-1)` already does. I think it makes sense, it's at most one (simple) extra instruction, while `rem`ainder is really much more un-simple (and likely **very** costly). Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65046 llvm-svn: 367322
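A small sketch of the INT_MIN case called out above, which the indirect folds could not reach:
```
define i1 @srem_by_int_min_is_zero(i32 %x) {
  %r = srem i32 %x, -2147483648   ; INT_MIN, a power of two
  %c = icmp eq i32 %r, 0
  ret i1 %c
}
; the general fold turns this into a mask test (y - 1 wraps to INT_MAX):
;   %m = and i32 %x, 2147483647
;   %c = icmp eq i32 %m, 0
```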
* [NFC][PatternMatch] Refactor code into a proper "matcher for any integral constant" | Roman Lebedev | 2019-07-22 | 1 | -18/+1
Having it as a proper matcher is better for reusability elsewhere (in a follow-up patch.) llvm-svn: 366752
* [InstCombine] Refactor OptimizeOverflowCheck; NFCI | Nikita Popov | 2019-05-26 | 1 | -1/+5
| | | | | | | | | Extract method to compute overflow based on binop and signedness, and then make the result handling code generic. This extends the always-overflow handling to signed muls, but has currently no effect, as we don't compute always overflow for them (thus NFC). llvm-svn: 361721
* [InstCombine] Remove OverflowCheckFlavor; NFC | Nikita Popov | 2019-05-26 | 1 | -35/+2
| | | | | | | Instead pass binary op and signedness. The extra enum only makes things more complicated in this case. llvm-svn: 361720
* Add InstCombine::visitFNeg(...) | Cameron McInally | 2019-05-10 | 1 | -0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D61784 llvm-svn: 360461
* [InstCombine] Be consistent w/handling of masked intrinsics style wise [NFC] | Philip Reames | 2019-04-25 | 1 | -0/+2
| | | | llvm-svn: 359160
* [InstCombine] Factor out unreachable inst idiom creation [NFC] | Philip Reames | 2019-04-17 | 1 | -0/+10
In InstCombine, we use an idiom of "store i1 true, i1* undef" to indicate we've found a path which we've proven unreachable. We can't actually insert the unreachable instruction since that would require changing the CFG. We leave that to simplifycfg later. This just factors out that idiom creation so we don't duplicate the same mostly undocumented idiom creation in multiple places. llvm-svn: 358600
* [PGO] Profile guided code size optimization. | Hiroshi Yamauchi | 2019-04-15 | 1 | -3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Enable some of the existing size optimizations for cold code under PGO. A ~5% code size saving in big internal app under PGO. The way it gets BFI/PSI is discussed in the RFC thread http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html Note it doesn't currently touch loop passes. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59514 llvm-svn: 358422
* Simplify operands of masked stores and scatters based on demanded elements | Philip Reames | 2019-03-20 | 1 | -0/+3
| | | | | | | | If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems. Differential Revision: https://reviews.llvm.org/D57247 llvm-svn: 356590
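A sketch of the masked-store side (hypothetical example; the scatter case is analogous):
```
declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32, <4 x i1>)

define void @store_three_lanes(<4 x i32> %val, <4 x i32>* %p) {
  ; lane 3 of the mask is false, so lane 3 of %val is never stored and whatever
  ; computes that lane does not need to be materialized.
  call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %val, <4 x i32>* %p, i32 4,
                                             <4 x i1> <i1 true, i1 true, i1 true, i1 false>)
  ret void
}
```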
* [InstCombine] Fold add nsw + sadd.with.overflow | Nikita Popov | 2019-03-06 | 1 | -0/+2
| | | | | | | | | | | | | Fold `add nsw` and `sadd.with.overflow` with constants if the addition does not overflow. Part of https://bugs.llvm.org/show_bug.cgi?id=38146. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D58881 llvm-svn: 355530
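A minimal sketch of the fold, assuming the two constants sum without signed overflow (as required above); names are made up:
```
declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)

define { i32, i1 } @combine_constants(i32 %x) {
  %a = add nsw i32 %x, 100
  %r = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 23)
  ret { i32, i1 } %r
  ; since 100 + 23 does not overflow and the inner add is nsw, this can become:
  ;   %r = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 123)
}
```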
* Implementation of asm-goto support in LLVM | Craig Topper | 2019-02-08 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
* [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible | Quentin Colombet | 2019-02-07 | 1 | -0/+1
| | | | | | | | | | | | | This commit teaches InstCombine how to replace an atomicrmw operation into a simple load atomic. For a given `atomicrmw <op>`, this is possible when: 1. The ordering of that operation is compatible with a load (i.e., anything that doesn't have a release semantic). 2. <op> does not modify the value being stored Differential Revision: https://reviews.llvm.org/D57854 llvm-svn: 353471
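A sketch of the two conditions in action (illustrative, typed-pointer syntax of that era):
```
define i32 @read_value(i32* %p) {
  ; "or 0" does not change the stored value, and monotonic ordering is
  ; compatible with a load, so this can be rewritten as roughly:
  ;   %old = load atomic i32, i32* %p monotonic, align 4
  %old = atomicrmw or i32* %p, i32 0 monotonic
  ret i32 %old
}
```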
* [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI | Sanjay Patel | 2019-02-07 | 1 | -2/+0
| | | | llvm-svn: 353462
* [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded | Nicolai Haehnle | 2019-02-04 | 1 | -2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The fix added in r352904 is not quite correct, or rather misleading: 1. When the texfailctrl (TFC) argument was non-constant, the fix assumed non-TFE/LWE, which is incorrect. 2. Regardless, this code path cannot even be hit for correct TFE/LWE-enabled calls, because those return a struct. Added a test case for those for completeness. Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf Reviewers: hliao, dstuttard, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57681 llvm-svn: 353097
* [CallSite removal] Remove CallSite uses from InstCombine. | Craig Topper | 2019-01-31 | 1 | -5/+4
| | | | | | | | | | | | Reviewers: chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57494 llvm-svn: 352771
* [InstCombine] Simplify cttz/ctlz + icmp ugt/ult | Nikita Popov | 2019-01-19 | 1 | -1/+4
| | | | | | | | | | | | Followup to D55745, this time handling comparisons with ugt and ult predicates (which are the canonical forms for non-equality predicates). For ctlz we can convert into a simple icmp, for cttz we can convert into a mask check. Differential Revision: https://reviews.llvm.org/D56355 llvm-svn: 351645
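Sketches of the two cases, with constants chosen for illustration:
```
declare i32 @llvm.ctlz.i32(i32, i1)
declare i32 @llvm.cttz.i32(i32, i1)

define i1 @ctlz_ugt(i32 %x) {
  %lz = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
  %c  = icmp ugt i32 %lz, 28          ; at least 29 leading zeros...
  ret i1 %c                           ; ...equivalent to: icmp ult i32 %x, 8
}

define i1 @cttz_ult(i32 %x) {
  %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
  %c  = icmp ult i32 %tz, 3           ; fewer than 3 trailing zeros...
  ret i1 %c                           ; ...equivalent to a mask check: (%x & 7) != 0
}
```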
* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 1 | -4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try | David Stuttard | 2019-01-14 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [InstCombine] add helper for icmp with dominator; NFC | Sanjay Patel | 2018-12-04 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | There's a potential small enhancement to this code that could solve the cases currently under proposal in D54827 via SimplifyCFG. Whether instcombine should be doing this kind of semi-non-local analysis in the first place is an open question, but separating the logic out can only help if/when we decide to move it to a different pass. AFAICT, any proposal to do this in SimplifyCFG could also be seen as an overreach + it would be incomplete to start the fold from a branch rather than an icmp. There's another question here about the code for processUGT_ADDCST_ADD(). That part may be completely dead after rL234638 ? llvm-svn: 348273
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" | David Stuttard | 2018-11-29 | 1 | -2/+1
| | | | | | | | | Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
* Add support for TFE/LWE in image intrinsics | David Stuttard | 2018-11-29 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* [InstCombine] fix formatting for matchBSwap(); NFC | Sanjay Patel | 2018-11-14 | 1 | -1/+4
| | | | | | | We should have a similar function for matching rotate and/or funnel shift, so tidy up the related existing call. llvm-svn: 346871
* [InstCombine] simplify code for merging stores; NFCI | Sanjay Patel | 2018-11-10 | 1 | -1/+1
| | | | llvm-svn: 346596
* [InstCombine] try harder to form select from logic ops (2nd try) | Sanjay Patel | 2018-10-24 | 1 | -0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original patch was committed here: rL344609 ...and reverted: rL344612 ...because it did not properly check/test data types before calling ComputeNumSignBits(). The tests that caused bot failures for the previous commit are over-reaching front-end tests that run the entire -O optimizer pipeline: Clang :: CodeGen/builtins-systemz-zvector.c Clang :: CodeGen/builtins-systemz-zvector2.c I've added a negative test here to ensure coverage for that case. The new early exit check also tests the type of the 'B' parameter, so we don't waste time on matching if either value is unsuitable. Original commit message: This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 345149
* [InstCombine] use 'match' to simplify code | Sanjay Patel | 2018-10-23 | 1 | -1/+1
| | | | | | | | | | | | There's probably some vector-with-undef-element pattern that shows an improvement, so this is probably not quite 'NFC'. This is the last step towards removing the fake binop queries for not/neg. Ie, there are no more uses of those functions in trunk. Fneg should follow. llvm-svn: 345050
* revert rL344609: [InstCombine] try harder to form select from logic ops | Sanjay Patel | 2018-10-16 | 1 | -3/+0
| | | | | | | | I noticed a missing check and added it at rL344610, but there actually are codegen tests that will fail without that, so I'll edit those and submit a fixed patch with more tests. llvm-svn: 344612
* [InstCombine] try harder to form select from logic ops | Sanjay Patel | 2018-10-16 | 1 | -0/+3
| | | | | | | | | | | | | | | This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 344609
* [InstCombine] fix complexity canonicalization with fake unary vector ops | Sanjay Patel | 2018-10-13 | 1 | -2/+2
| | | | | | | This is a preliminary step to avoid regressions when we add an actual 'fneg' instruction to IR. See D52934 and D53205. llvm-svn: 344458