path: root/llvm/test/Transforms
Commit message | Author | Age | Files | Lines
* [LSR] Generate cross iteration indexes | Sam Parker | 2019-02-07 | 1 | -15/+9
  Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve the performance of DSP-like loops.
  Differential Revision: https://reviews.llvm.org/D55373
  llvm-svn: 353403
* [InstCombine] X | C == C --> (X & ~C) == 0 | Sanjay Patel | 2019-02-06 | 1 | -8/+8
  We should canonicalize to one of these forms, and compare-with-zero could be more conducive to follow-on transforms. This also leads to generally better codegen as shown in PR40611: https://bugs.llvm.org/show_bug.cgi?id=40611
  llvm-svn: 353313
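  The equivalence behind this canonicalization can be checked by brute force. The sketch below is only an illustration on 8-bit values, not the InstCombine code; the function names and the 8-bit width are assumptions made for the example.

```python
def or_eq_const(x: int, c: int) -> bool:
    """Original form: (X | C) == C."""
    return (x | c) == c


def and_not_eq_zero(x: int, c: int, bits: int = 8) -> bool:
    """Canonical form: (X & ~C) == 0, with ~C truncated to `bits` bits."""
    mask = (1 << bits) - 1
    return (x & (~c & mask)) == 0


def forms_agree(bits: int = 8) -> bool:
    """Exhaustively compare both forms for every X and C of the given width."""
    limit = 1 << bits
    return all(
        or_eq_const(x, c) == and_not_eq_zero(x, c, bits)
        for x in range(limit)
        for c in range(limit)
    )
```

  Both forms ask the same question: does X have any bit set outside of C?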
* [InstCombine] add tests for PR40611 and regenerate checks; NFC | Sanjay Patel | 2019-02-06 | 1 | -294/+349
  Lots of unrelated diffs here from the newer version of the script.
  llvm-svn: 353312
* [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler | Max Kazantsev | 2019-02-06 | 1 | -2/+0
  llvm-svn: 353276
* [HotColdSplit] Do not split out `resume` instructions | Vedant Kumar | 2019-02-05 | 2 | -6/+18
  Resumes that are not reachable from a cleanup landing pad are considered to be unreachable. It’s not safe to split them out.
  rdar://47808235
  llvm-svn: 353242
* [InstCombine] limit extracting shuffle transform based on uses | Sanjay Patel | 2019-02-05 | 1 | -2/+2
  As discussed in D53037, this can lead to worse codegen, and we don't generally expect the backend to be able to optimize arbitrary shuffles. If there's only one use of the 1st shuffle, that means it's getting removed, so that should always be safe.
  llvm-svn: 353235
* [InstCombine] split shuffle test to show extra use constraint; NFC | Sanjay Patel | 2019-02-05 | 1 | -3/+14
  As discussed in D53037, this transform can cause codegen problems if the 1st shuffle has multiple uses.
  llvm-svn: 353233
* [CGP] add test for unsigned subtract of 1 with overflow; NFC | Sanjay Patel | 2019-02-05 | 1 | -13/+28
  llvm-svn: 353179
* [CGP] Add support for sinking operands to their users, if they are free. | Florian Hahn | 2019-02-05 | 1 | -0/+236
  This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable.
  I decided to add a new target hook because, for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext, for example.
  The sinking is done in CGP because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer.
  Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently.
  Note that we do not require the operands being sunk to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on.
  This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.
  As for performance, we have internal code that uses intrinsics and can be sped up by 10% by this change.
  Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel
  Reviewed By: samparker
  Differential Revision: https://reviews.llvm.org/D57377
  llvm-svn: 353152
* [LSR] Check SCEV on isZero() after extend. PR40514 | Max Kazantsev | 2019-02-05 | 1 | -0/+57
  When LSR first adds SCEVs to BaseRegs, it only does so if `isZero()` has returned false. In the end, in the invocation of `InsertFormula`, it asserts that all values there are still not zero constants. However, between these two points it makes some transformations, in particular extending them to a wider type.
  SCEV does not give us a guarantee that if `S` is not a constant zero, then `sext(S)` is also not a constant zero. It might have missed some optimizing transforms when it was calculating `S` and then made them when it took `sext`. For example, this may happen if the optimizing transforms were previously limited by depth or in some other way.
  This patch adds a bailout when we may end up with a zero SCEV after extension.
  Differential Revision: https://reviews.llvm.org/D57565
  Reviewed By: samparker
  llvm-svn: 353136
* Revert "[PATCH] [TargetLibraryInfo] Update run time support for Windows" | Evandro Menezes | 2019-02-04 | 4 | -49/+73
  This reverts accidental commit ff5527718d5d3b9966f6e8948866c0dc15ffcf3c.
  llvm-svn: 353118
* [PATCH] [TargetLibraryInfo] Update run time support for Windows | Evandro Menezes | 2019-02-04 | 4 | -73/+49
  It seems that the run time for Windows has changed and supports more math functions than before. Since LLVM requires at least VS2015, I assume that this is the run time that would be redistributed with programs built with Clang. Thus, I based this update on the header file `math.h` that accompanies it.
  This patch addresses PR40541. Unfortunately, I have no access to a Windows development environment to validate it.
  llvm-svn: 353114
* [CGP] fix bogus test names/comments; NFC | Sanjay Patel | 2019-02-04 | 1 | -10/+9
  Inverted operand 0 and operand 1.
  llvm-svn: 353106
* [CGP] add tests for usubo; NFC | Sanjay Patel | 2019-02-04 | 1 | -1/+154
  llvm-svn: 353103
* [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded | Nicolai Haehnle | 2019-02-04 | 1 | -0/+15
  Summary: The fix added in r352904 is not quite correct, or rather misleading:
  1. When the texfailctrl (TFC) argument was non-constant, the fix assumed non-TFE/LWE, which is incorrect.
  2. Regardless, this code path cannot even be hit for correct TFE/LWE-enabled calls, because those return a struct. Added a test case for those for completeness.
  Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf
  Reviewers: hliao, dstuttard, arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57681
  llvm-svn: 353097
* [WarnMissedTransforms] Do not warn about already vectorized loops. | Michael Kruse | 2019-02-04 | 1 | -0/+33
  LoopVectorize adds llvm.loop.isvectorized, but leaves llvm.loop.vectorize.enable. Do not consider such a loop for user-forced vectorization since vectorization already happened -- by prioritizing llvm.loop.isvectorized except for TM_SuppressedByUser.
  Fixes http://llvm.org/PR40546
  Differential Revision: https://reviews.llvm.org/D57542
  llvm-svn: 353082
* [CGP] use IRBuilder to simplify code | Sanjay Patel | 2019-02-04 | 1 | -36/+36
  This is no-functional-change-intended although there could be intermediate variations caused by a difference in the debug info produced by setting that from the builder's insertion point.
  I'm updating the IR test file associated with this code just to show that the naming differences from using the builder are visible.
  The motivation for adding a helper function is that we are likely to extend this code to deal with other overflow ops.
  llvm-svn: 353056
* Commit tests for changes in revision D41608 | Dmitry Venikov | 2019-02-04 | 1 | -0/+91
  llvm-svn: 353037
* [LoopIdiomRecognize] @llvm.dbg values shouldn't affect the transformation. | Davide Italiano | 2019-02-03 | 1 | -0/+68
  Summary: PR40564
  Reviewers: aprantl, rnk
  Subscribers: llvm-commits, hiraditya
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57629
  llvm-svn: 353007
* [CGP] adjust target constraints for forming uaddo | Sanjay Patel | 2019-02-03 | 1 | -5/+4
  There are 2 changes visible here:
  1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this.
  2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom.
  llvm-svn: 353001
* [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try) | Sanjay Patel | 2019-02-03 | 1 | -20/+25
  This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754
  ...but that was overcome in x86 codegen with D57637. That patch also corrects the inc vs. add regressions seen with the previous attempt at this.
  Still, we want to make this matcher complete, so we can potentially canonicalize the pattern even if it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo.
  There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks.
  I'm not sure if the Hexagon diff means the test is no longer testing what it intended to test, but that should be solvable in a follow-up.
  Differential Revision: https://reviews.llvm.org/D57516
  llvm-svn: 352998
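  The increment-by-one special case rests on a simple arithmetic fact: an unsigned `x + 1` wraps exactly when the result is zero. The sketch below models that on 8-bit values; it is an illustration only, not the PatternMatch implementation, and the helper names and bit width are assumptions made for the example.

```python
def uaddo(a: int, b: int, bits: int = 8):
    """Model of unsigned add with overflow: returns (wrapped sum, overflow flag)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total > mask


def inc_wraps_to_zero(x: int, bits: int = 8) -> bool:
    """The 'add 1, compare with 0' idiom: (x + 1) == 0 in wrapping arithmetic."""
    mask = (1 << bits) - 1
    return ((x + 1) & mask) == 0


def idiom_matches_uaddo(bits: int = 8) -> bool:
    """Check that the idiom equals the overflow bit of uaddo(x, 1) and uaddo(1, x)."""
    return all(
        uaddo(x, 1, bits)[1] == inc_wraps_to_zero(x, bits)
        and uaddo(1, x, bits)[1] == inc_wraps_to_zero(x, bits)
        for x in range(1 << bits)
    )
```

  Only the all-ones value overflows on increment, which is why the compare can be written against zero instead of against an operand of the add.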
* [CGP] move test file to prevent bot failures | Sanjay Patel | 2019-02-03 | 1 | -0/+0
  The test specifies the triple, so it needs to be in the x86 directory in case a bot has been configured without the x86 target.
  llvm-svn: 352992
* [InstSimplify] Missed optimization in math expression: log10(pow(10.0,x)) == x, log2(pow(2.0,x)) == x | Dmitry Venikov | 2019-02-03 | 2 | -12/+4
  Summary: This patch enables folding the following instructions under the -ffast-math flag: log10(pow(10.0,x)) -> x, log2(pow(2.0,x)) -> x
  Reviewers: hfinkel, spatel, efriedma, craig.topper, zvi, majnemer, lebedev.ri
  Reviewed By: spatel, lebedev.ri
  Subscribers: lebedev.ri, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41940
  llvm-svn: 352981
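  A quick numerical check shows what this fold relies on. The sketch below uses Python's `math` module as a stand-in for the C math library; it only demonstrates that the two sides agree up to rounding on a few sample inputs, which is consistent with the fold being restricted to fast-math. The sample values and helper names are assumptions chosen for illustration.

```python
import math

# Sample inputs chosen for illustration only.
SAMPLES = [-3.5, -1.0, 0.0, 0.1, 2.25, 7.0]


def log10_of_pow10(x: float) -> float:
    """Unfolded form: log10(pow(10.0, x))."""
    return math.log10(math.pow(10.0, x))


def log2_of_pow2(x: float) -> float:
    """Unfolded form: log2(pow(2.0, x))."""
    return math.log2(math.pow(2.0, x))


def max_abs_error(unfolded, samples) -> float:
    """Largest absolute difference between the unfolded form and plain x."""
    return max(abs(unfolded(x) - x) for x in samples)
```

  The comparison uses a tolerance rather than exact equality because the intermediate `pow` result is rounded before the logarithm is taken.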
* [LCSSA] Handle case with single new PHI faster. | Florian Hahn | 2019-02-02 | 1 | -32/+100
  If there is only a single available value, all uses must be dominated by the single value and there is no need to search for a reaching definition. This drastically speeds up LCSSA in some cases. For the test case from PR37202, it speeds up LCSSA construction by 4 times.
  Time-passes without this patch for test case from PR37202:
    Total Execution Time: 29.9285 seconds (29.9276 wall clock)
    ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
    5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)   Unswitch loops
    4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)   Loop-Closed SSA Form Pass
    4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)   Loop-Closed SSA Form Pass #2
    2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)   Loop Invariant Code Motion
    2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)   Unroll loops
    1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)   Value Propagation
    1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)   Induction Variable Simplification
    1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)   Loop-Closed SSA Form Pass #4
    1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)   Loop-Closed SSA Form Pass #6
    1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)   Loop-Closed SSA Form Pass #5
    1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)   Loop-Closed SSA Form Pass #3
  Time passes with this patch:
    Total Execution Time: 19.2802 seconds (19.2813 wall clock)
    ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
    4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)   Unswitch loops
    2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)   Unroll loops
    1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)   Loop Invariant Code Motion
    1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)   Value Propagation
    1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)   Induction Variable Simplification
    0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)   Loop-Closed SSA Form Pass #2
    0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)   Loop-Closed SSA Form Pass
    0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)   Unroll loops #2
    0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)   Loop Invariant Code Motion #3
  Reviewers: efriedma, davide, mzolotukhin
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D57033
  llvm-svn: 352960
* [InstCombine] Refactor test checks (NFC) | Evandro Menezes | 2019-02-01 | 1 | -198/+198
  llvm-svn: 352935
* [Test] Update file w/update_test_checks.py to make a follow on change obvious | Philip Reames | 2019-02-01 | 1 | -29/+29
  llvm-svn: 352932
* [InstCombine] Expand Windows test (NFC) | Evandro Menezes | 2019-02-01 | 1 | -52/+66
  Run checks for Win32 as well.
  llvm-svn: 352917
* [InstCombine] Expand Windows test (NFC) | Evandro Menezes | 2019-02-01 | 1 | -21/+26
  Run checks for Win64 as well.
  llvm-svn: 352908
* [InstCombine] Extra null-checking on TFE/LWE support | Michael Liao | 2019-02-01 | 1 | -0/+7
  - If that operand is not ConstantInt, skip enabling TFE/LWE.
  Differential Revision: https://reviews.llvm.org/D57539
  llvm-svn: 352904
* [InstCombine] Refactor test checks (NFC) | Evandro Menezes | 2019-02-01 | 1 | -16/+13
  llvm-svn: 352895
* [InstCombine] Expand Windows test (NFC) | Evandro Menezes | 2019-02-01 | 1 | -15/+44
  Add checks for Win64 to existing cases.
  llvm-svn: 352892
* [InstCombine] Refactor test checks (NFC) | Evandro Menezes | 2019-02-01 | 1 | -42/+24
  llvm-svn: 352886
* [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic | Sanjay Patel | 2019-02-01 | 1 | -10/+12
  If we can reduce the x86-specific intrinsic to the generic op, it allows existing simplifications and value tracking folds. AFAICT, this always results in identical x86 codegen in the non-reduced case...which should be true because we semi-generically (too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must already combine/lower this intrinsic as expected.
  This isn't quite what was requested in: https://bugs.llvm.org/show_bug.cgi?id=40486
  ...but we want to have these kinds of folds early for efficiency and to enable greater simplifications. For the case in the bug report where we have: _addcarry_u64(0, ahi, 0, &ahi) ...this gets completely simplified away in IR.
  Differential Revision: https://reviews.llvm.org/D57453
  llvm-svn: 352870
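  The reduction works because an add-with-carry whose carry-in is zero computes the same sum and carry-out as a plain unsigned add with overflow. The sketch below models both operations on 64-bit values in Python; it is an illustration of the arithmetic only, and the function names and tuple layouts are assumptions, not the intrinsics' actual signatures.

```python
MASK64 = (1 << 64) - 1


def addcarry_u64(carry_in: int, a: int, b: int):
    """Model of a 64-bit add-with-carry: returns (carry_out, sum)."""
    total = (a & MASK64) + (b & MASK64) + (1 if carry_in else 0)
    return (1 if total > MASK64 else 0), total & MASK64


def uadd_with_overflow(a: int, b: int):
    """Model of a generic unsigned add with overflow: returns (sum, overflowed)."""
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total > MASK64


def matches_uaddo(a: int, b: int) -> bool:
    """With carry-in 0, add-with-carry agrees with the generic overflow add."""
    carry_out, carry_sum = addcarry_u64(0, a, b)
    value, overflowed = uadd_with_overflow(a, b)
    return carry_sum == value and bool(carry_out) == overflowed
```

  In this model, `addcarry_u64(0, ahi, 0)` returns `(0, ahi)` for any in-range `ahi`, which mirrors the bug-report case that folds away entirely.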
* Provide reason messages for unviable inlining | Yevgeny Rouban | 2019-02-01 | 1 | -0/+10
  InlineCost's isInlineViable() is changed to return InlineResult instead of bool. This provides messages for failure reasons and allows more specific messages for cases where callsites are not viable for inlining.
  Reviewed By: xbolva00, anemet
  Differential Revision: https://reviews.llvm.org/D57089
  llvm-svn: 352849
* Lower widenable_conditions in CGP | Philip Reames | 2019-01-31 | 1 | -0/+93
  This ensures that if we make it to the backend w/o lowering widenable_conditions first, we generate correct code. Doing it in CGP - instead of isel - lets us fold control flow before hitting block local instruction selection.
  Differential Revision: https://reviews.llvm.org/D57473
  llvm-svn: 352779
* Recommit "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader" | Teresa Johnson | 2019-01-31 | 2 | -0/+42
  Recommit of r352763 with fix for use after free.
  llvm-svn: 352770
* revert r352766: [PatternMatch] add special-case uaddo matching for increment-by-one | Sanjay Patel | 2019-01-31 | 1 | -25/+20
  Missed some regression test updates when testing this.
  llvm-svn: 352769
* Revert "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader" | Teresa Johnson | 2019-01-31 | 2 | -42/+0
  This reverts commit r352763. Causing a couple bot failures, root cause pointed to by sanitizer bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/28909/steps/annotate/logs/stdio
  Use after free. I understand the issue but will revert and test with fix before recommitting.
  llvm-svn: 352768
* [PatternMatch] add special-case uaddo matching for increment-by-one | Sanjay Patel | 2019-01-31 | 1 | -20/+25
  This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754
  We were failing to match the canonicalized pattern when it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo.
  There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks, but that's a different problem.
  Differential Revision: https://reviews.llvm.org/D57516
  llvm-svn: 352766
* [ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader | Teresa Johnson | 2019-01-31 | 2 | -0/+42
  Summary: COFF requires that the COMDAT name match that of the leader. When we promote and rename an internal leader in ThinLTO due to an import, ensure we subsequently rename the associated COMDAT. Similar to D31963 which did this during ThinLTO module splitting.
  Fixes PR40414.
  Reviewers: pcc, inglorion
  Subscribers: mehdi_amini, dexonsmith, dmajor, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57395
  llvm-svn: 352763
* [CGP] add more tests for uaddo; NFC | Sanjay Patel | 2019-01-31 | 1 | -0/+71
  llvm-svn: 352762
* Default lowering for experimental.widenable.condition | Max Kazantsev | 2019-01-31 | 1 | -0/+44
  Introduces a pass that provides a default lowering strategy for the `experimental.widenable.condition` intrinsic, replacing all its uses with `i1 true`.
  Differential Revision: https://reviews.llvm.org/D56096
  Reviewed By: reames
  llvm-svn: 352739
* Commit tests for changes in revision D41940 | Dmitry Venikov | 2019-01-31 | 2 | -0/+98
  llvm-svn: 352734
* [InstCombine] Missed optimization in math expression: simplify calls exp functions | Dmitry Venikov | 2019-01-31 | 2 | -30/+22
  Summary: This patch enables folding the following expressions under the -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594
  Reviewers: hfinkel, spatel, efriedma, lebedev.ri
  Reviewed By: spatel, lebedev.ri
  Subscribers: lebedev.ri, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41342
  llvm-svn: 352730
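  The two sides of this fold are mathematically equal but can behave differently in floating point, which is consistent with the fold requiring fast-math. The sketch below uses Python's `math.exp` as a stand-in; the helper names are hypothetical, and the use of `OverflowError` to signal an out-of-range intermediate is specific to Python rather than anything LLVM does.

```python
import math


def product_of_exps(x: float, y: float) -> float:
    """Unfolded form: exp(X) * exp(Y)."""
    return math.exp(x) * math.exp(y)


def exp_of_sum(x: float, y: float) -> float:
    """Folded form: exp(X + Y)."""
    return math.exp(x + y)


def product_overflows(x: float, y: float) -> bool:
    """True if an intermediate exp() is out of range for a double in Python."""
    try:
        product_of_exps(x, y)
    except OverflowError:
        return True
    return False
```

  For moderate inputs the two forms agree to within rounding. For inputs such as 800.0 and -800.0 the unfolded form has an out-of-range intermediate while the folded form is simply `exp(0.0)`.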
* [SCEV] Prohibit SCEV transformations for huge SCEVs | Max Kazantsev | 2019-01-31 | 1 | -1/+1
  Currently SCEV attempts to limit transformations so that they do not work with big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics such as recursion depth and number of operands, which do not give us a guarantee that we don't actually have big SCEVs. This situation is still possible, though it is not likely to happen. However, the bug PR33494 showed a bunch of simple corner case tests where we still produce huge SCEVs, even without reaching a big recursion depth etc.
  This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression size (introduced in D35989) exceeds some threshold value. We prohibit optimizing transformations if any of the SCEVs we are dealing with is huge. This gives us a reliable check that we don't spend too much time working with them.
  As the next step, we can possibly get rid of old limiting mechanisms, such as recursion depth thresholds.
  Differential Revision: https://reviews.llvm.org/D35990
  Reviewed By: reames
  llvm-svn: 352728
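  The idea of gating transformations on expression size can be sketched with a toy expression tree. Everything below is hypothetical: the `Expr` class, the threshold value, and the `x + 0 -> x` fold are stand-ins chosen for illustration and do not reflect ScalarEvolution's actual data structures, threshold, or transforms.

```python
from dataclasses import dataclass
from typing import Tuple

HUGE_THRESHOLD = 16  # hypothetical limit, not LLVM's actual value


@dataclass(frozen=True)
class Expr:
    """Toy expression node: an operator name plus its operands."""
    op: str
    operands: Tuple["Expr", ...] = ()

    def size(self) -> int:
        """Expression size: this node plus the sizes of all operands."""
        return 1 + sum(operand.size() for operand in self.operands)


def is_huge(expr: Expr, threshold: int = HUGE_THRESHOLD) -> bool:
    return expr.size() > threshold


def chain(depth: int) -> Expr:
    """Build a unary chain of `depth` 'neg' nodes around a leaf 'x'."""
    expr = Expr("x")
    for _ in range(depth):
        expr = Expr("neg", (expr,))
    return expr


def build_add(lhs: Expr, rhs: Expr) -> Expr:
    """Build lhs + rhs, applying the toy fold only when neither operand is huge."""
    if is_huge(lhs) or is_huge(rhs):
        return Expr("add", (lhs, rhs))  # bail out: no optimizing transform
    if rhs.op == "0":
        return lhs  # toy optimizing transform: x + 0 -> x
    return Expr("add", (lhs, rhs))
```

  A size check of this kind bounds the work directly, whereas a depth limit alone does not, since a shallow expression can still have many operands. This toy version recomputes the size on every query, which a real implementation would presumably avoid.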
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one"" | David L. Jones | 2019-01-31 | 1 | -140/+0
  This change reverts r351626. The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.)
  llvm-svn: 352722
* [InstCombine] Expand testing for Windows (NFC) | Evandro Menezes | 2019-01-31 | 1 | -48/+67
  Added the checks to the existing cases when the target is Win64.
  llvm-svn: 352714
* [InstCombine] Simplify check clauses in test (NFC) | Evandro Menezes | 2019-01-31 | 1 | -71/+57
  llvm-svn: 352707
* Add a 'dynamic' parameter to the objectsize intrinsic | Erik Pilkington | 2019-01-30 | 9 | -60/+143
  This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged.
  rdar://32212419
  Differential revision: https://reviews.llvm.org/D56761
  llvm-svn: 352664
* [Tests] Add tests for propagation of undef elements in vector GEPs | Philip Reames | 2019-01-30 | 1 | -0/+25
  llvm-svn: 352662