path: root/llvm/test/Transforms
Commit message (Author, Date, Files changed, Lines -/+)
* Revert "[AtomicExpand] Allow libcall expansion for non-zero address spaces" for buildbot failures. (Mitch Phillips, 2019-03-06, 1 file, -36/+0)
  llvm-svn: 355461
* [ARM] Sink zext/sext operands for add and sub to enable vsubl generation. (Florian Hahn, 2019-03-06, 1 file, -0/+232)
  This uses the infrastructure added in rL353152 to sink zexts and sexts to their sub/add users, enabling vsubl/vaddl generation when NEON is available. See https://bugs.llvm.org/show_bug.cgi?id=40025.
  Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma
  Reviewed By: samparker
  Differential Revision: https://reviews.llvm.org/D58063
  llvm-svn: 355460
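  A minimal sketch of the kind of IR this sinking targets (illustrative, not taken from the committed test): the zexts are defined in a different block than the sub, and sinking copies of them next to the user keeps the extension visible to instruction selection so a widening vsubl can be formed.
    define <8 x i16> @widening_sub(<8 x i8> %a, <8 x i8> %b) {
    entry:
      %za = zext <8 x i8> %a to <8 x i16>
      %zb = zext <8 x i8> %b to <8 x i16>
      br label %use
    use:
      ; after sinking, the zexts sit next to this sub and ISel can match vsubl.u8
      %s = sub <8 x i16> %za, %zb
      ret <8 x i16> %s
    }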
* [AtomicExpand] Allow libcall expansion for non-zero address spaces (Philip Reames, 2019-03-05, 1 file, -0/+36)
  Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
  Differential Revision: https://reviews.llvm.org/D58760
  llvm-svn: 355453
* [SLP] Fix invalid triple in X86 tests (Florian Hahn, 2019-03-05, 2 files, -30/+37)
  x86-64 is an invalid architecture in triples. Changing it to the correct triple (x86_64) changes some tests, because SLP is not deemed profitable any more.
  Reviewers: ABataev, RKSimon, spatel
  Reviewed By: RKSimon
  Differential Revision: https://reviews.llvm.org/D58931
  llvm-svn: 355420
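  For reference, a correctly spelled triple in a test RUN line looks like this (an illustrative line, not one of the updated tests); the architecture component is x86_64, not x86-64:
    ; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -S | FileCheck %s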
* [SCEV] Ensure that isHighCostExpansion takes into account what is being divided (David Green, 2019-03-05, 2 files, -57/+17)
  A SCEV is not low-cost just because you can divide it by a power of 2. We also need to check what we are dividing, to make sure it, too, is not a high-cost expansion. This helps to avoid expanding the exit value of certain loops, so as not to bloat the code.
  The change in no-iv-rewrite.ll reverts to what it was testing before rL194116, and looks a lot like the other tests in replace-loop-exit-folds.ll.
  Differential Revision: https://reviews.llvm.org/D58435
  llvm-svn: 355393
* [SCEV] Add some extra tests for IndVarSimplify's loop exit values. NFC. (David Green, 2019-03-05, 1 file, -0/+232)
  Add some tests for various loops of the form:
    while (S >= 32) {
      S -= 32;
      something();
    };
    return S;
  llvm-svn: 355389
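  One way such a loop can look in IR (a hand-written sketch; the committed tests are generated and differ in detail):
    declare void @something()

    define i32 @remainder_by_32(i32 %S) {
    entry:
      br label %while.cond
    while.cond:
      %s = phi i32 [ %S, %entry ], [ %sub, %while.body ]
      %cmp = icmp uge i32 %s, 32
      br i1 %cmp, label %while.body, label %exit
    while.body:
      %sub = sub i32 %s, 32
      call void @something()
      br label %while.cond
    exit:
      ; IndVarSimplify may rewrite this exit value; the tests check when that expansion is worthwhile
      ret i32 %s
    }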
* Fix invalid target triples in tests. (NFC) (Florian Hahn, 2019-03-04, 3 files, -4/+4)
  llvm-svn: 355349
* [CodeGenPrepare] avoid crashing on non-canonical/degenerate code (Sanjay Patel, 2019-03-04, 1 file, -0/+25)
  The test is reduced from an example in the post-commit thread for rL354746: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html
  While we must avoid dying here, the real question should be: why is non-canonical and/or degenerate code making it to CGP when using the new pass manager?
  llvm-svn: 355345
* [ConstantHoisting] avoid hang/crash from unreachable blocks (PR40930) (Sanjay Patel, 2019-03-04, 1 file, -6/+86)
  I'm not too familiar with this pass, so there might be a better solution, but this appears to fix the degenerate cases in PR40930, PR40931, PR40932, and PR40934 without affecting any real-world code.
  As we've seen in several other passes, when we have unreachable blocks, they can contain semi-bogus IR and/or cause unexpected conditions. We would not typically expect these patterns to make it this far, but we have to guard against them anyway.
  llvm-svn: 355337
* [InstCombine] Add tests for add nsw + sadd.with.overflow; NFC (Nikita Popov, 2019-03-04, 1 file, -0/+96)
  Baseline tests for D58881, which fixes part of PR38146.
  Patch by Dan Robertson.
  llvm-svn: 355328
* [InstCombine] Mark debug values as unavailable after DCE. (Davide Italiano, 2019-03-04, 1 file, -0/+81)
  Fixes PR40838.
  llvm-svn: 355301
* [InstCombine] remove stale FIXME comment from test; NFC (Sanjay Patel, 2019-03-03, 1 file, -1/+0)
  llvm-svn: 355293
* [ValueTracking] do not try to peek through bitcasts in computeKnownBitsFromAssume() (Sanjay Patel, 2019-03-03, 1 file, -0/+18)
  There are no tests for this case, and I'm not sure how it could ever work, so I'm just removing this option from the matcher. This should fix PR40940: https://bugs.llvm.org/show_bug.cgi?id=40940
  llvm-svn: 355292
* Add extra ops in add to sub transform test in order to enforce proper operand ordering. NFC (Amaury Sechet, 2019-03-03, 1 file, -4/+8)
  llvm-svn: 355291
* Add test case for add to sub transformation. NFC (Amaury Sechet, 2019-03-02, 1 file, -0/+27)
  llvm-svn: 355277
* [InstCombine] move add after smin/smax (Sanjay Patel, 2019-03-02, 1 file, -36/+28)
  Follow-up to rL355221. This isn't specifically called for within PR14613, but we'll get there eventually if it's not already requested in some other bug report.
  https://rise4fun.com/Alive/5b0

    Name: smax
    Pre: WillNotOverflowSignedSub(C1,C0)
    %a = add nsw i8 %x, C0
    %cond = icmp sgt i8 %a, C1
    %r = select i1 %cond, i8 %a, i8 C1
    =>
    %c2 = icmp sgt i8 %x, C1-C0
    %u2 = select i1 %c2, i8 %x, i8 C1-C0
    %r = add nsw i8 %u2, C0

    Name: smin
    Pre: WillNotOverflowSignedSub(C1,C0)
    %a = add nsw i32 %x, C0
    %cond = icmp slt i32 %a, C1
    %r = select i1 %cond, i32 %a, i32 C1
    =>
    %c2 = icmp slt i32 %x, C1-C0
    %u2 = select i1 %c2, i32 %x, i32 C1-C0
    %r = add nsw i32 %u2, C0

  llvm-svn: 355272
* [InstCombine] add tests for add+smin/smax; NFC (Sanjay Patel, 2019-03-02, 1 file, -0/+266)
  llvm-svn: 355271
* [Transforms] fix typo in test case. NFC. (Xing GUO, 2019-03-02, 1 file, -1/+1)
  llvm-svn: 355265
* [SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR. (Florian Hahn, 2019-03-02, 1 file, -20/+0)
  In some cases, MaxBECount can be less precise than ExactBECount for AND and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are undef, but the MaxBECounts are different, so we hit the assertion below. This patch uses the same solution the AND case already uses.
    Assertion failed: ((isa<SCEVCouldNotCompute>(ExactNotTaken) || !isa<SCEVCouldNotCompute>(MaxNotTaken)) &&
                       "Exact is not allowed to be less precise than Max"), function ExitLimit
  This patch also consolidates test cases for both AND and OR in a single test case.
  Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245
  Reviewers: sanjoy, efriedma, mkazantsev
  Reviewed By: sanjoy
  Differential Revision: https://reviews.llvm.org/D58853
  llvm-svn: 355259
* [X86] Remove IntrArgMemOnly from target specific gather/scatter intrinsics (Craig Topper, 2019-03-01, 1 file, -1/+2)
  IntrArgMemOnly implies that only memory pointed to by pointer typed arguments will be accessed. But these intrinsics allow you to pass null to the pointer argument and put the full address into the index argument. Other passes won't be able to understand this.
  A colleague found that ISPC was creating gathers like this and then dead store elimination removed some stores because it didn't understand what the gather was doing since the pointer argument was null.
  Differential Revision: https://reviews.llvm.org/D58805
  llvm-svn: 355228
* [X86] Add test case for D58805. NFC (Craig Topper, 2019-03-01, 1 file, -0/+20)
  This demonstrates dead store elimination removing a store that may alias a gather that uses null as its base.
  llvm-svn: 355227
* [InstCombine] Extend saturating idempotent atomicrmw transform to FP (Philip Reames, 2019-03-01, 1 file, -3/+3)
  I'm assuming that the NaN propagation logic in InstructionSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.
  Differential Revision: https://reviews.llvm.org/D58836
  llvm-svn: 355222
* [InstCombine] move add after umin/umax (Sanjay Patel, 2019-03-01, 1 file, -33/+26)
  In the motivating cases from PR14613 (https://bugs.llvm.org/show_bug.cgi?id=14613), moving the add enables us to narrow the min/max, which eliminates zext/trunc, which enables significantly better vectorization. But that bug is still not completely fixed.
  https://rise4fun.com/Alive/5KQ

    Name: umax
    Pre: C1 u>= C0
    %a = add nuw i8 %x, C0
    %cond = icmp ugt i8 %a, C1
    %r = select i1 %cond, i8 %a, i8 C1
    =>
    %c2 = icmp ugt i8 %x, C1-C0
    %u2 = select i1 %c2, i8 %x, i8 C1-C0
    %r = add nuw i8 %u2, C0

    Name: umin
    Pre: C1 u>= C0
    %a = add nuw i32 %x, C0
    %cond = icmp ult i32 %a, C1
    %r = select i1 %cond, i32 %a, i32 C1
    =>
    %c2 = icmp ult i32 %x, C1-C0
    %u2 = select i1 %c2, i32 %x, i32 C1-C0
    %r = add nuw i32 %u2, C0

  llvm-svn: 355221
* [InstCombine] add tests for umin/umax narrowing (PR14613); NFC (Sanjay Patel, 2019-03-01, 1 file, -0/+34)
  llvm-svn: 355220
* [LICM] Infer proper alignment from loads during scalar promotion (Philip Reames, 2019-03-01, 2 files, -12/+10)
  This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load.
  For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an *incredibly* rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (e.g. i32 align 1). If such an instruction makes it all the way to codegen, we *may* fail to codegen the operation, or we may simply generate a slow call to a library function.
  The part that makes this super hard to see in practice is that the memory location actually *is* well aligned, and instcombine knows that. So, to see a failure, you have to a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit.
  Differential Revision: https://reviews.llvm.org/D58809
  llvm-svn: 355217
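  A rough sketch of the shape being fixed (hypothetical function, not the committed test): the store is conditional and therefore not guaranteed to execute, but the load is guaranteed and speculatable, so the promoted load/store created in the preheader and exit can carry the load's align 4 instead of a conservatively small alignment.
    define i32 @promote_with_alignment(i32* dereferenceable(4) align 4 %p, i32 %n) {
    entry:
      br label %loop
    loop:
      %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
      %v = load i32, i32* %p, align 4          ; executes every iteration, align 4 is known valid
      %cmp = icmp sgt i32 %v, 0
      br i1 %cmp, label %then, label %latch
    then:
      store i32 %iv, i32* %p, align 4          ; conditional, not guaranteed to execute
      br label %latch
    latch:
      %iv.next = add i32 %iv, 1
      %done = icmp eq i32 %iv.next, %n
      br i1 %done, label %exit, label %loop
    exit:
      ret i32 %v
    }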
* [Tests] More missing atomicrmw combines (Philip Reames, 2019-03-01, 1 file, -0/+25)
  llvm-svn: 355215
* [Tests] Add tests for missed optimizations of saturating and idempotent FP atomicrmws (Philip Reames, 2019-03-01, 1 file, -0/+23)
  llvm-svn: 355212
* [InstCombine] Extend "idempotent" atomicrmw optimizations to floating point (Philip Reames, 2019-03-01, 1 file, -0/+23)
  An idempotent atomicrmw is one that does not change memory in the process of execution. We have already added handling for the various integer operations; this patch extends the same handling to floating point operations, which were recently added to IR.
  Note: at the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load. As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.
  Differential Revision: https://reviews.llvm.org/D58251
  llvm-svn: 355210
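  For example (a hand-written sketch, not the committed test): an atomicrmw fadd of -0.0 never changes the stored value, so when the ordering permits it the operation can be rewritten as a plain atomic load.
    define float @idempotent_fadd(float* %p) {
      %r = atomicrmw fadd float* %p, float -0.000000e+00 monotonic
      ret float %r
    }
    ; can be simplified to something like:
    define float @idempotent_fadd_simplified(float* %p) {
      %r = load atomic float, float* %p monotonic, align 4
      ret float %r
    }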
* [InstCombine] add tests for add+umin/umax canonicalization; NFC (Sanjay Patel, 2019-03-01, 1 file, -0/+227)
  Fixing this should solve the biggest part of the vector problems seen in https://bugs.llvm.org/show_bug.cgi?id=14613.
  llvm-svn: 355206
* [ConstantHoisting] Call cleanup() in ConstantHoistingPass::runImpl to avoid dangling elements in ConstIntInfoVec for new PM (Fangrui Song, 2019-03-01, 1 file, -0/+1)
  Summary: ConstIntInfoVec contains elements extracted from the previous function. In the new PM, releaseMemory() is not called, and the dangling elements can cause a segfault in findConstantInsertionPoint. Rename releaseMemory() to cleanup() to convey that calling it is mandatory, and call cleanup() in ConstantHoistingPass::runImpl to fix this.
  Reviewers: ormris, zzheng, dmgreen, wmi
  Reviewed By: ormris, wmi
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58589
  llvm-svn: 355174
* [InstCombine] fold adds of constants separated by sext/zext (Sanjay Patel, 2019-02-28, 1 file, -10/+12)
  This is part of a transform that may be done in the backend (D13757), but it should always be beneficial to fold this sooner in IR for all targets.
  https://rise4fun.com/Alive/vaiW

    Name: sext add nsw
    %add = add nsw i8 %i, C0
    %ext = sext i8 %add to i32
    %r = add i32 %ext, C1
    =>
    %s = sext i8 %i to i32
    %r = add i32 %s, sext(C0)+C1

    Name: zext add nuw
    %add = add nuw i8 %i, C0
    %ext = zext i8 %add to i16
    %r = add i16 %ext, C1
    =>
    %s = zext i8 %i to i16
    %r = add i16 %s, zext(C0)+C1

  llvm-svn: 355118
* [Tests] Strengthen LICM test corpus to show alignment stripping. (part 2) (Philip Reames, 2019-02-28, 1 file, -0/+31)
  This should have been part of r355110, but my brain isn't quite awake yet, despite the coffee. Per the original submit comment: doing scalar promotion without being able to prove the alignment of the hoisted load or sunk store is a bug. Update tests to actually show the alignment so that the impact of the patch which fixes this can be seen.
  llvm-svn: 355111
* [Tests] Strengthen LICM test corpus to show alignment stripping (Philip Reames, 2019-02-28, 2 files, -0/+30)
  Doing scalar promotion without being able to prove the alignment of the hoisted load or sunk store is a bug. Update tests to actually show the alignment so that the impact of the patch which fixes this can be seen.
  llvm-svn: 355110
* [ValueTracking] More accurate unsigned sub overflow detection (Nikita Popov, 2019-02-28, 8 files, -19/+15)
  Second part of D58593. Compute precise overflow conditions based on all known bits, rather than just the sign bits. Unsigned a - b overflows iff a < b, and we can determine whether this always/never happens based on the minimal and maximal values achievable for a and b subject to the known bits constraint.
  llvm-svn: 355109
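  A worked example of what the finer-grained analysis can prove (hypothetical test, written for illustration): from the known bits, %a is at most 15 and %b is at least 32, so %a is always unsigned-less-than %b and the subtraction always overflows; the previous sign-bit-only check could not conclude this because %b's sign bit is unknown.
    declare { i8, i1 } @llvm.usub.with.overflow.i8(i8, i8)

    define i1 @usub_always_overflows(i8 %x, i8 %y) {
      %a = and i8 %x, 15                       ; known range [0, 15]
      %b = or i8 %y, 32                        ; known range [32, 255]
      %res = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 %a, i8 %b)
      %ov = extractvalue { i8, i1 } %res, 1    ; provably true
      ret i1 %ov
    }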
* [ValueTracking] More accurate unsigned add overflow detection (Nikita Popov, 2019-02-28, 4 files, -14/+10)
  Part of D58593. Compute precise overflow conditions based on all known bits, rather than just the sign bits. Unsigned a + b overflows iff a > ~b, and we can determine whether this always/never happens based on the minimal and maximal values achievable for a and ~b subject to the known bits constraint.
  llvm-svn: 355072
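  The add case, sketched the same way (hypothetical test): %a is at least 192 and %b is at least 64, so %a + %b always exceeds 255 and the addition always overflows; sign bits alone cannot show this because %b's sign bit is unknown.
    declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

    define i1 @uadd_always_overflows(i8 %x, i8 %y) {
      %a = or i8 %x, 192                       ; known range [192, 255]
      %b = or i8 %y, 64                        ; known range [64, 255]
      %res = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %a, i8 %b)
      %ov = extractvalue { i8, i1 } %res, 1    ; provably true
      ret i1 %ov
    }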
* Temporarily revert "ArgumentPromotion should copy all metadata to new Function" and the dependent patch "Refine ArgPromotion metadata handling" as they're causing segfaults in argument promotion. (Eric Christopher, 2019-02-28, 2 files, -54/+5)
  This reverts commits r354032 and r353537.
  llvm-svn: 355060
* [InstCombine] add tests for add+ext+add; NFC (Sanjay Patel, 2019-02-27, 1 file, -0/+86)
  llvm-svn: 355020
* [InstCombine] Add additional add.sat overflow tests; NFC (Nikita Popov, 2019-02-27, 1 file, -0/+88)
  Baseline for D58593.
  llvm-svn: 354996
* [InstCombine] regenerate complete checks; NFC (Sanjay Patel, 2019-02-27, 1 file, -47/+119)
  llvm-svn: 354993
* [HotColdSplit] Disable splitting for sanitized functions (Vedant Kumar, 2019-02-26, 1 file, -0/+72)
  Splitting can make sanitizer errors harder to understand, as the trapping instruction may not be in the function where the bug was detected.
  rdar://48142697
  llvm-svn: 354931
* [InstSimplify] remove zero-shift-guard fold for general funnel shift (Sanjay Patel, 2019-02-26, 1 file, -17/+25)
  As discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html), we can't remove the compare+select in the general case because we are treating funnel shift like a standard instruction (as opposed to a special instruction like select/phi). That means that if one of the operands of the funnel shift is poison, the result is poison regardless of whether we know that the operand is actually unused based on the instruction's particular semantics.
  The motivating case for this transform is the more specific rotate op (rather than funnel shift), and we are preserving the fold for that case because there is no chance of introducing extra poison when there is no anonymous extra operand to the funnel shift.
  llvm-svn: 354905
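  The two shapes in question, sketched with invented function names: when both funnel-shift operands are the same value (a rotate), the zero-shift guard is still removable; when they differ, %y may be poison on the %sh == 0 path even though its value is unused there, so the guard must stay.
    declare i32 @llvm.fshl.i32(i32, i32, i32)

    define i32 @guarded_rotate(i32 %x, i32 %sh) {
      %fsh = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %sh)
      %cmp = icmp eq i32 %sh, 0
      %r = select i1 %cmp, i32 %x, i32 %fsh    ; still simplifies to %fsh
      ret i32 %r
    }

    define i32 @guarded_funnel(i32 %x, i32 %y, i32 %sh) {
      %fsh = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %sh)
      %cmp = icmp eq i32 %sh, 0
      %r = select i1 %cmp, i32 %x, i32 %fsh    ; no longer simplified: %y might be poison
      ret i32 %r
    }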
* [InstSimplify] add tests for rotate; NFC (Sanjay Patel, 2019-02-26, 1 file, -0/+100)
  Rotate is a special-case of funnel shift that has different poison constraints than the general case. That's not visible yet in the existing tests, but it needs to be corrected.
  llvm-svn: 354894
* [InstCombine] remove duplicate (but not updated) tests; NFC (Sanjay Patel, 2019-02-26, 1 file, -134/+0)
  Not sure how it happened, but rL354886 was a duplicate of rL354881, but not updated with rL354887.
  llvm-svn: 354889
* [InstCombine] canonicalize more unsigned saturated add with 'not' (Sanjay Patel, 2019-02-26, 1 file, -24/+24)
  Yet another pattern variation suggested by https://bugs.llvm.org/show_bug.cgi?id=14613. There are 8 more potential commuted patterns here on top of the 8 that were already handled (rL354221, rL354276, rL354393). We have the obvious commute of the 'add' + commute of the cmp predicate/operands (ugt/ult) + commute of the select operands:

    Name: base
    %notx = xor i32 %x, -1
    %a = add i32 %notx, %y
    %c = icmp ult i32 %x, %y
    %r = select i1 %c, i32 -1, i32 %a
    =>
    %c2 = icmp ult i32 %a, %y
    %r = select i1 %c2, i32 -1, i32 %a

    Name: ugt
    %notx = xor i32 %x, -1
    %a = add i32 %notx, %y
    %c = icmp ugt i32 %y, %x
    %r = select i1 %c, i32 -1, i32 %a
    =>
    %c2 = icmp ult i32 %a, %y
    %r = select i1 %c2, i32 -1, i32 %a

    Name: commute select
    %notx = xor i32 %x, -1
    %a = add i32 %notx, %y
    %c = icmp ult i32 %y, %x
    %r = select i1 %c, i32 %a, i32 -1
    =>
    %c2 = icmp ult i32 %a, %y
    %r = select i1 %c2, i32 -1, i32 %a

    Name: ugt + commute select
    %notx = xor i32 %x, -1
    %a = add i32 %notx, %y
    %c = icmp ugt i32 %x, %y
    %r = select i1 %c, i32 %a, i32 -1
    =>
    %c2 = icmp ult i32 %a, %y
    %r = select i1 %c2, i32 -1, i32 %a

  https://rise4fun.com/Alive/den
  llvm-svn: 354887
* [InstCombine] add more tests for saturated add; NFC (Sanjay Patel, 2019-02-26, 1 file, -0/+134)
  llvm-svn: 354886
* [InstCombine] add more tests for saturated add; NFC (Sanjay Patel, 2019-02-26, 1 file, -0/+134)
  llvm-svn: 354881
* AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics (Matt Arsenault, 2019-02-25, 2 files, -0/+48)
  EarlyCSE with MemorySSA was able to use this to merge multiple calls with no intervening store.
  llvm-svn: 354814
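  The problematic shape (a hand-written sketch; the intrinsic is shown as I understand it, an argument-less counter read returning i64): with IntrReadMem the two calls looked like identical loads with no store in between, so EarlyCSE with MemorySSA would replace %t1 with %t0 and the measured interval would fold to zero.
    declare i64 @llvm.amdgcn.s.memtime()

    define i64 @time_it(i32 %x, i32* %out) {
      %t0 = call i64 @llvm.amdgcn.s.memtime()
      %v = mul i32 %x, %x                      ; work being timed, no store in between
      %t1 = call i64 @llvm.amdgcn.s.memtime()  ; must not be CSE'd into %t0
      store i32 %v, i32* %out
      %dt = sub i64 %t1, %t0
      ret i64 %dt
    }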
* [Vectorizer] Add vectorization support for fixed smul/umul intrinsics (Simon Pilgrim, 2019-02-25, 1 file, -972/+730)
  This requires a couple of tweaks to existing vectorization functions, as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument.
  Differential Revision: https://reviews.llvm.org/D58616
  llvm-svn: 354790
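  For reference, the relevant intrinsic shapes (declarations only, written from the LangRef description; the vector width is illustrative): the third operand is the fixed-point scale, and it stays scalar when the data operands are vectorized.
    ; scalar form: two data operands, then the scale
    declare i32 @llvm.smul.fix.i32(i32, i32, i32)
    ; vectorized form: data operands become vectors, the scale remains a scalar
    declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32>, <4 x i32>, i32)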
* [SLPVectorizer][X86] Add fixed smul/umul tests (Simon Pilgrim, 2019-02-25, 1 file, -0/+2007)
  Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet.
  llvm-svn: 354783
* [InstCombine] Add tests for PR40846; NFC (Nikita Popov, 2019-02-24, 1 file, -0/+123)
  The icmps are the same as the overflow result of the intrinsic.
  llvm-svn: 354760
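  The pattern in these tests looks roughly like this (an illustrative function): the icmp recomputes exactly what the intrinsic's overflow bit already says, so it can be replaced by that overflow bit.
    declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

    define i1 @icmp_matches_overflow_bit(i8 %x, i8 %y) {
      %m = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
      %sum = extractvalue { i8, i1 } %m, 0
      %ov = extractvalue { i8, i1 } %m, 1
      %cmp = icmp ult i8 %sum, %x              ; true exactly when the add wrapped, i.e. equal to %ov
      %both = and i1 %cmp, %ov
      ret i1 %both
    }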