summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/CodeGenPrepare
Commit message (Collapse)AuthorAgeFilesLines
* [CodeGen][Tests] Fix b3cf70427eb1e97d9b89ba6e9298c280c8a32c74Clement Courbet2020-02-191-0/+2
| | | | Add missing lit.local.cfg in test/Transforms/CodeGenPrepare/PowerPC
* [CodeGen] Fix the computation of the alignment of split stores.Hans Wennborg2020-02-122-0/+185
| | | | | | By Clement Courbet! Backported from rG15488ff24b4a
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵Fangrui Song2019-12-243-3/+3
| | | | as cleanups after D56351
* [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.Hiroshi Yamauchi2019-12-131-0/+41
| | | | | | | | | | | | Summary: Split off of D67120. Reviewers: davidxl Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71288
* Reapply r374743 with a fix for the ocaml bindingJoerg Sonnenberger2019-10-144-245/+0
| | | | | | | | | | | | | | | | | | | Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374784
* Revert "Add a pass to lower is.constant and objectsize intrinsics"Dmitri Gribenko2019-10-144-0/+245
| | | | | | | This reverts commit r374743. It broke the build with Ocaml enabled: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218 llvm-svn: 374768
* Add a pass to lower is.constant and objectsize intrinsicsJoerg Sonnenberger2019-10-134-245/+0
| | | | | | | | | | | | | | | | | This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374743
* [NFC} Updated testDavid Bolvansky2019-09-171-2/+5
| | | | llvm-svn: 372093
* [CGP] Ensure sinking multiple instructions does not invalidate dominance checksDavid Green2019-09-122-2/+112
| | | | | | | | | | | | | | | | | | In MVE, as of rL371218, we are attempting to sink chains of instructions such as: %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0 %broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer In certain situations though, we can end up breaking the dominance relations of instructions. This happens when we sink the instruction into a loop, but cannot remove the originals. The Use is updated, which might in fact be a Use from the second instruction to the first. This attempts to fix that by reversing the order of instruction that are sunk, and ensuring that we update the uses on new instructions if they have already been sunk, not the old ones. Differential Revision: https://reviews.llvm.org/D67366 llvm-svn: 371743
* [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selectionSam Tebbs2019-09-061-0/+216
| | | | | | | | | | | | This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers. Loop tests have been added to the vmla test file to make sure vmlas are generated in loops. Differential revision: https://reviews.llvm.org/D66295 llvm-svn: 371218
* [CodeGenPrepare] Fix use-after-freeSanjay Patel2019-08-161-0/+17
| | | | | | | | | | | | | | | | | | | If OptimizeExtractBits() encountered a shift instruction with no operands at all, it would erase the instruction, but still return false. This previously didn’t matter because its caller would always return after processing the instruction, but https://reviews.llvm.org/D63233 changed the function’s caller to fall through if it returned false, which would then cause a use-after-free detectable by ASAN. This change makes OptimizeExtractBits return true if it removes a shift instruction with no users, terminating processing of the instruction. Patch by: @brentdax (Brent Royal-Gordon) Differential Revision: https://reviews.llvm.org/D66330 llvm-svn: 369168
* [CodeGenPrepare] fix RUN line settingsSanjay Patel2019-08-161-1/+1
| | | | | | I'm not sure if this was running as expected with a broken triple. llvm-svn: 369156
* [lit] Delete empty lines at the end of lit.local.cfg NFCFangrui Song2019-06-174-4/+0
| | | | llvm-svn: 363538
* [CodeGenPrepare][x86] shift both sides of a vector select when profitableSanjay Patel2019-06-161-18/+31
| | | | | | | | | | | | | | | | | This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511
* [CodeGenPrepare] propagate debuginfo when copying a shuffleSanjay Patel2019-06-141-14/+18
| | | | llvm-svn: 363409
* [x86] add tests for vector shifts; NFCSanjay Patel2019-06-121-0/+117
| | | | llvm-svn: 363203
* [CodeGenPrepare] limit overflow intrinsic matching to a single basic block ↵Sanjay Patel2019-05-042-15/+34
| | | | | | | | | | | | | | | | | | | | | (2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969
* Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic ↵Evgeniy Stepanov2019-05-032-14/+15
| | | | | | | | block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908
* [CodeGenPrepare] limit overflow intrinsic matching to a single basic blockSanjay Patel2019-05-032-15/+14
| | | | | | | | | | | | | | | | | | | Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-1763-0/+6813
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-1763-6813/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)Sanjay Patel2019-03-211-7/+6
| | | | | | | | | | | | | | | | | | This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665
* [CGP][x86] add tests for usubo regression (PR41129); NFCSanjay Patel2019-03-201-0/+39
| | | | llvm-svn: 356559
* [CGP] add another bailout for degenerate code (PR41064)Sanjay Patel2019-03-141-0/+21
| | | | | | | | | | | | This is almost the same as: rL355345 ...and should prevent any potential crashing from examples like: https://bugs.llvm.org/show_bug.cgi?id=41064 ...although the bug was masked by: rL355823 ...and I'm not sure how to repro the problem after that change. llvm-svn: 356218
* CodeGenPrep: preserve inbounds attribute when sinking GEPs.Tim Northover2019-03-125-23/+23
| | | | | | | | | Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information. llvm-svn: 355926
* [CGP] Limit distance between overflow math and cmpSam Parker2019-03-111-0/+56
| | | | | | | | | | | | | | | Inserting an overflowing arithmetic intrinsic can increase register pressure by producing two values at a point where only one is needed, while the second use maybe several blocks away. This increase in pressure is likely to be more detrimental on performance than rematerialising one of the original instructions. So, check that the arithmetic and compare instructions are no further apart than their immediate successor/predecessor. Differential Revision: https://reviews.llvm.org/D59024 llvm-svn: 355823
* [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInstRong Xu2019-03-081-0/+34
| | | | | | | | | | | | | | | r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751
* [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.Florian Hahn2019-03-061-0/+232
| | | | | | | | | | | | | | | This uses the infrastructure added in rL353152 to sink zext and sexts to sub/add users, to enable vsubl/vaddl generation when NEON is available. See https://bugs.llvm.org/show_bug.cgi?id=40025. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D58063 llvm-svn: 355460
* [CodeGenPrepare] avoid crashing on non-canonical/degenerate codeSanjay Patel2019-03-041-0/+25
| | | | | | | | | | | | The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345
* [CGP] add special-cases to form unsigned add with overflow (PR40486)Sanjay Patel2019-02-241-16/+20
| | | | | | | | | | | | | | | | | | | | There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746
* [CGP] add tests for uaddo increment/decrement; NFCSanjay Patel2019-02-221-0/+60
| | | | llvm-svn: 354699
* [CGP] match a special-case of unsigned subtract overflowSanjay Patel2019-02-201-4/+5
| | | | | | | This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519
* [CGP][x86] add tests for usubo special-case; NFCSanjay Patel2019-02-201-0/+15
| | | | | | | This is another example from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354475
* [CGP] form usub with overflow from sub+icmpSanjay Patel2019-02-181-36/+45
| | | | | | | | | | | | | | | | | The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298
* [CGP] add test for unsigned subtract of 1 with overflow; NFCSanjay Patel2019-02-051-13/+28
| | | | llvm-svn: 353179
* [CGP] Add support for sinking operands to their users, if they are free.Florian Hahn2019-02-051-0/+236
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152
* [CGP] fix bogus test names/comments; NFCSanjay Patel2019-02-041-10/+9
| | | | | | Inverted operand 0 and operand 1. llvm-svn: 353106
* [CGP] add tests for usubo; NFCSanjay Patel2019-02-041-1/+154
| | | | llvm-svn: 353103
* [CGP] use IRBuilder to simplify codeSanjay Patel2019-02-041-36/+36
| | | | | | | | | | | | | | | | This is no-functional-change-intended although there could be intermediate variations caused by a difference in the debug info produced by setting that from the builder's insertion point. I'm updating the IR test file associated with this code just to show that the naming differences from using the builder are visible. The motivation for adding a helper function is that we are likely to extend this code to deal with other overflow ops. llvm-svn: 353056
* [CGP] adjust target constraints for forming uaddoSanjay Patel2019-02-031-5/+4
| | | | | | | | | | | | | | | | | | | There are 2 changes visible here: 1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001
* [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)Sanjay Patel2019-02-031-20/+25
| | | | | | | | | | | | | | | | | | | | | | | This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 ...but that was overcome in x86 codegen with D57637. That patch also corrects the inc vs. add regressions seen with the previous attempt at this. Still, we want to make this matcher complete, so we can potentially canonicalize the pattern even if it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo. There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks. I'm not sure if the Hexagon diff means the test is no longer testing what it intended to test, but that should be solvable in a follow-up. Differential Revision: https://reviews.llvm.org/D57516 llvm-svn: 352998
* [CGP] move test file to prevent bot failuresSanjay Patel2019-02-031-0/+0
| | | | | | | | The test specifiies the triple, so it needs to be in the x86 directory in case a bot has been configured without the x86 target. llvm-svn: 352992
* Lower widenable_conditions in CGPPhilip Reames2019-01-311-0/+93
| | | | | | | | This ensures that if we make it to the backend w/o lowering widenable_conditions first, that we generate correct code. Doing it in CGP - instead of isel - let's us fold control flow before hitting block local instruction selection. Differential Revision: https://reviews.llvm.org/D57473 llvm-svn: 352779
* revert r352766: [PatternMatch] add special-case uaddo matching for ↵Sanjay Patel2019-01-311-25/+20
| | | | | | | | increment-by-one Missed some regression test updates when testing this. llvm-svn: 352769
* [PatternMatch] add special-case uaddo matching for increment-by-oneSanjay Patel2019-01-311-20/+25
| | | | | | | | | | | | | | | | | This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 We were failing to match the canonicalized pattern when it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo. There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks, but that's a different problem. Differential Revision: https://reviews.llvm.org/D57516 llvm-svn: 352766
* [CGP] add more tests for uaddo; NFCSanjay Patel2019-01-311-0/+71
| | | | llvm-svn: 352762
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""David L. Jones2019-01-311-140/+0
| | | | | | | | This change reverts r351626. The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.) llvm-svn: 352722
* Add a 'dynamic' parameter to the objectsize intrinsicErik Pilkington2019-01-302-11/+11
| | | | | | | | | | | | | | This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664
* [CGP] auto-generate complete checks for add overflow tests; NFCSanjay Patel2019-01-281-24/+50
| | | | llvm-svn: 352437
* Reapply "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-191-0/+140
| | | | | | Original commit: r351582 llvm-svn: 351626
OpenPOWER on IntegriCloud