summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink ↵David L. Jones2019-04-041-44/+0
| | | | | | | | | | function calls without used results (PR41259)' This revision causes tests to fail under ASAN. Since the cause of the failures is not clear (could be ASAN, could be a Clang bug, could be a bug in this revision), the safest course of action seems to be to revert while investigating. llvm-svn: 357667
* [ProfileSummary] Count callsite samples when computing total samples.Taewook Oh2019-04-034-2/+18
| | | | | | | | | | | | | | Summary: Currently ProfileSummaryBuilder doesn't count into callsite samples when computing total samples. Considering that ProfileSummaryInfo is used to checked the hotness of not only body samples but also callsite samples (from SampleProfileLoader), I think the callsite sample counts should be considered when computing total samples. Reviewers: eraman, danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59835 llvm-svn: 357627
* [InstCombine] Simplify ctpop with bitreverse/bswapDavid Bolvansky2019-04-031-8/+4
| | | | | | | | | | | | | | | | Summary: Fixes PR41337 Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60148 llvm-svn: 357564
* AMDGPU: Assume ECC is enabled by default if supportedMatt Arsenault2019-04-031-0/+70
| | | | | | | | | | The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558
* InstSimplify: Fold round intrinsics from sitofp/uitofpMatt Arsenault2019-04-031-24/+12
| | | | | | https://godbolt.org/z/gEMRZb llvm-svn: 357549
* [InstCombine] Added tests for PR41337David Bolvansky2019-04-021-0/+53
| | | | llvm-svn: 357522
* [InstCombine] Simplify ctlz/cttz with bitreverseDavid Bolvansky2019-04-021-18/+12
| | | | | | | | | | | | | | | | Summary: Fixes PR41273 Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60096 llvm-svn: 357521
* [InstCombine] Added tests for PR41273David Bolvansky2019-04-021-0/+75
| | | | llvm-svn: 357508
* [ArgPromotion] Set debug location at updated callsitesVedant Kumar2019-04-021-3/+23
| | | | | | | | | | | | Set the correct debug location on instructions which load arguments in preparation for a call to an arg-promoted function. This prevents location cascade from misattributing the line/scope of one of these loads to the location of the instruction preceding the call. Differential Revision: https://reviews.llvm.org/D60113 llvm-svn: 357500
* [DebugInfo] Fix pr41180 : Loop Vectorization Debugify FailureVedant Kumar2019-04-021-0/+110
| | | | | | | | | | | | | | | | | | | | | Bug: https://bugs.llvm.org/show_bug.cgi?id=41180 In the bug test case the debug location was missing for the cmp instruction in the "middle block" BB. This patch fixes the bug by copying the debug location from the cmp of the scalar loop's terminator branch, if it exists. The patch also fixes the debug location on the subsequent branch instruction. It was previously using the location of the of the original loop's pre-header block terminator. Both of these instructions will now map to the source line of the conditional branch in the original loop. A regression test has been added that covers these issues. Patch by Orlando Cazalet-Hyams! Differential Revision: https://reviews.llvm.org/D59944 llvm-svn: 357499
* [WideableCond] Fix a nasty bug in detection of "explicit guards"Philip Reames2019-04-021-0/+59
| | | | | | | | The code was failing to actually check for the presence of the call to widenable_condition. The whole point of specifying the widenable_condition intrinsic was allowing widening transforms. A normal branch is not widenable. A normal branch leading to a deopt is not widenable (in general). I added a test case via LoopPredication, but GuardWidening has an analogous bug. Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111. llvm-svn: 357493
* [SimplifyCFG] Don't split musttail call from retJoseph Tremoulet2019-04-021-2/+20
| | | | | | | | | | | | | | | | | | | Summary: When inserting an `unreachable` after a noreturn call, we must ensure that it's not a musttail call to avoid breaking the IR invariants for musttail calls. Reviewers: fedor.sergeev, majnemer Reviewed By: majnemer Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60080 llvm-svn: 357485
* [SampleProfile] Repeat indirect call promotion only when the target is ↵Taewook Oh2019-04-022-0/+37
| | | | | | | | | | | | | | | | actually hot. Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets. Reviewers: danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59940 llvm-svn: 357484
* [PruneEH] Don't split musttail call from retJoseph Tremoulet2019-04-021-0/+15
| | | | | | | | | | | | | | | | | | | Summary: When inserting an `unreachable` after a noreturn call, we must ensure that it's not a musttail call to avoid breaking the IR invariants for musttail calls. Reviewers: fedor.sergeev, majnemer Reviewed By: majnemer Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60079 llvm-svn: 357483
* SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without ↵Hans Wennborg2019-04-021-0/+44
| | | | | | | | | | | | | | | | used results (PR41259) The code was previously checking that candidates for sinking had exactly one use or were a store instruction (which can't have uses). This meant we could sink call instructions only if they had a use. That limitation seemed a bit arbitrary, so this patch changes it to "instruction has zero or one use" which seems more natural and removes the need to special-case stores. Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 357452
* InstSimplify: Add missing case from r357386Matt Arsenault2019-04-021-0/+23
| | | | llvm-svn: 357443
* AMDGPU: Fix test filenameMatt Arsenault2019-04-021-0/+0
| | | | llvm-svn: 357441
* [LoopPred] Be uniform about proving generated conditionsPhilip Reames2019-04-011-0/+49
| | | | | | We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful. llvm-svn: 357408
* [LoopPred] Delete the old condition expressions if unusedPhilip Reames2019-04-018-202/+2
| | | | | | LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing. llvm-svn: 357406
* [Tests] Autogen all the LoopPredication testsPhilip Reames2019-04-017-388/+1912
| | | | | | I'm about to make some changes to the pass which cause widespread - but uninteresting - test diffs. Prepare the tests for easy updating. llvm-svn: 357404
* [Tests] Add tests for a possible loop predication transform variantPhilip Reames2019-04-011-0/+247
| | | | | | As highlighted by tests, if one of the operands is loop variant, but guaranteed to have the same value on all iterations, we have a missed oppurtunity. llvm-svn: 357403
* [InstCombine] Handle vector gep with scalar argument in ↵Mikael Holmen2019-04-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | evaluateInDifferentElementOrder Summary: This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector. Reviewers: reames, spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60058 llvm-svn: 357389
* Revert "[InstCombine] Handle vector gep with scalar argument in ↵Mikael Holmen2019-04-011-16/+0
| | | | | | | | | | | evaluateInDifferentElementOrder" This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5. I'll recommit with a better commit message with reference to the phabricator review. llvm-svn: 357387
* InstSimplify: Add baseline test for upcoming changeMatt Arsenault2019-04-011-0/+120
| | | | llvm-svn: 357386
* [InstCombine] Handle vector gep with scalar argument in ↵Mikael Holmen2019-04-011-0/+16
| | | | | | | | | | | | evaluateInDifferentElementOrder This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector. llvm-svn: 357385
* [InstCombine] eliminate commuted select-shuffles + binop (PR41304)Sanjay Patel2019-04-011-20/+51
| | | | | | | | | | | | | | | | | | | | | | | If we have a commutable vector binop with inverted select-shuffles, we don't care about the order of the operands in each vector lane: LHS = shuffle V1, V2, <0, 5, 6, 3> RHS = shuffle V2, V1, <0, 5, 6, 3> LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2 PR41304: https://bugs.llvm.org/show_bug.cgi?id=41304 ...is currently titled as an SLP enhancement, but at least for the given example, we can reduce that in instcombine because we are just eliminating shuffles. As noted in the TODO, this could be generalized, but I haven't thought through those patterns completely, so this is limited to what appears to be always safe. Differential Revision: https://reviews.llvm.org/D60048 llvm-svn: 357382
* [InstCombine] add tests for inverted select-shuffles + binop (PR41304); NFCSanjay Patel2019-03-311-0/+244
| | | | llvm-svn: 357368
* [InstCombine] canonicalize select shuffles by commutingSanjay Patel2019-03-319-30/+30
| | | | | | | | | | | | | | | | | | | | In PR41304: https://bugs.llvm.org/show_bug.cgi?id=41304 ...we have a case where we want to fold a binop of select-shuffle (blended) values. Rather than try to match commuted variants of the pattern, we can canonicalize the shuffles and check for mask equality with commuted operands. We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a special case that the backend is required to handle because we already canonicalize vector select to this shuffle form. So there should be no codegen difference from this change. It's possible that this improves CSE in IR though. Differential Revision: https://reviews.llvm.org/D60016 llvm-svn: 357366
* [NFC][InstCombine] Add tests for combining icmp of no-wrap sub w/ constant.Luqman Aden2019-03-311-0/+90
| | | | llvm-svn: 357360
* AMDGPU: Remove dx10-clamp from subtarget featuresMatt Arsenault2019-03-292-0/+197
| | | | | | | | | | | | | | | | | | Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
* [InstCombine] autogenerate complete checks; NFCSanjay Patel2019-03-291-31/+50
| | | | llvm-svn: 357291
* [InstCombine] regenerate test checks; NFCSanjay Patel2019-03-292-50/+36
| | | | llvm-svn: 357288
* [SLP] Add support for commutative icmp/fcmp predicatesSimon Pilgrim2019-03-291-80/+16
| | | | | | | | | | For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands. This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?). Differential Revision: https://reviews.llvm.org/D59992 llvm-svn: 357266
* [SLP] Add support for swapping icmp/fcmp predicates to permit vectorizationSimon Pilgrim2019-03-291-60/+12
| | | | | | | | We should be able to match elements with the swapped predicate as well - as long as we commute the source operands. Differential Revision: https://reviews.llvm.org/D59956 llvm-svn: 357243
* [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.Florian Hahn2019-03-281-0/+39
| | | | | | | | | | | | | | | For the attached test case, unchecked addition of immediate starts and ends overflows, as they can be arbitrary i64 constants. Proof: https://rise4fun.com/Alive/Plqc Reviewers: qcolombet, gilr, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59218 llvm-svn: 357217
* [InterleavedAccessPass] Don't increase the number of bytes loaded.Eli Friedman2019-03-281-6/+19
| | | | | | | | | | | | | | | | | Even if the interleaving transform would otherwise be legal, we shouldn't introduce an interleaved load that is wider than the original load: it might have undefined behavior. It might be possible to perform some sort of mask-narrowing transform in some cases (using a narrower interleaved load, then extending the results using shufflevectors). But I haven't tried to implement that, at least for now. Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 . Differential Revision: https://reviews.llvm.org/D59954 llvm-svn: 357212
* [SLP][X86] Add tests showing failure to commute icmp/fcmp by swapping predicateSimon Pilgrim2019-03-281-0/+196
| | | | | | By swapping icmp/fcmp predicates we can commute their operands to improve vectorization llvm-svn: 357204
* [SLP][X86] Add tests showing failure to commute icmp/fcmp operandsSimon Pilgrim2019-03-281-0/+199
| | | | | | Some predicates are fully commutative - we should be able to easily commute their operands to improve vectorization llvm-svn: 357202
* [X86MacroFusion] Handle branch fusion (AMD CPUs).Clement Courbet2019-03-281-2/+3
| | | | | | | | | | | | | | | | | | Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171
* [VPlan] Determine Vector Width programmatically.Florian Hahn2019-03-283-7/+202
| | | | | | | | | | | | | | With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 llvm-svn: 357156
* [NewPM] Fix a nasty bug with analysis invalidation in the new PM.Chandler Carruth2019-03-282-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue here is that we actually allow CGSCC passes to mutate IR (and therefore invalidate analyses) outside of the current SCC. At a minimum, we need to support mutating parent and ancestor SCCs to support the ArgumentPromotion pass which rewrites all calls to a function. However, the analysis invalidation infrastructure is heavily based around not needing to invalidate the same IR-unit at multiple levels. With Loop passes for example, they don't invalidate other Loops. So we need to customize how we handle CGSCC invalidation. Doing this without gratuitously re-running analyses is even harder. I've avoided most of these by using an out-of-band preserved set to accumulate the cross-SCC invalidation, but it still isn't perfect in the case of re-visiting the same SCC repeatedly *but* it coming off the worklist. Unclear how important this use case really is, but I wanted to call it out. Another wrinkle is that in order for this to successfully propagate to function analyses, we have to make sure we have a proxy from the SCC to the Function level. That requires pre-creating the necessary proxy. The motivating test case now works cleanly and is added for ArgumentPromotion. Thanks for the review from Philip and Wei! Differential Revision: https://reviews.llvm.org/D59869 llvm-svn: 357137
* [InstCombine] Use uadd.sat and usub.sat for canonicalizationNikita Popov2019-03-272-126/+72
| | | | | | | | | | | | | Start using the uadd.sat and usub.sat intrinsics for the existing canonicalizations. These intrinsics should optimize better than expanded IR, have better handling in the X86 backend and should be no worse than expanded IR in other backends, as far as we know. rL357012 already introduced use of uadd.sat for the add+umin pattern. Differential Revision: https://reviews.llvm.org/D58872 llvm-svn: 357103
* [X86MacroFusion][NFC] Add a bulldozer test.Clement Courbet2019-03-271-1/+23
| | | | llvm-svn: 357099
* [InstCombine] Add tests for ssubo X, C -> saddo X, -C; NFCNikita Popov2019-03-262-0/+177
| | | | | | | | | | | Add baseline tests for canonicalization of ssubo X, C -> saddo X, -C. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D59653 llvm-svn: 357013
* [InstCombine] form uaddsat from add+umin (PR14613)Sanjay Patel2019-03-262-22/+69
| | | | | | | | | | | | | | | | | | This is the last step towards solving the examples shown in: https://bugs.llvm.org/show_bug.cgi?id=14613 With this change, x86 should end up with psubus instructions when those are available. All known codegen issues with expanding the saturating intrinsics were resolved with: D59006 / rL356855 We also have some early evidence in D58872 that using the intrinsics will lead to better perf. If some target regresses from this, custom lowering of the intrinsics (as in the above for x86) may be needed. llvm-svn: 357012
* [InstCombine] add tests for uaddsat using min; NFCSanjay Patel2019-03-261-0/+86
| | | | llvm-svn: 357005
* [InstCombine] update tests to use FileCheck; NFCSanjay Patel2019-03-262-45/+83
| | | | llvm-svn: 357004
* [SLPVectorizer] Merge reorderAltShuffleOperands into ↵Simon Pilgrim2019-03-251-2/+2
| | | | | | | | | | reorderInputsAccordingToOpcode As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings. Differential Revision: https://reviews.llvm.org/D59784 llvm-svn: 356939
* [SLPVectorizer] Update file missed in rL356913Simon Pilgrim2019-03-251-5/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356915
* [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction ↵Simon Pilgrim2019-03-2544-160/+160
| | | | | | | | | | | | | | canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913
OpenPOWER on IntegriCloud