path: root/llvm/test/Transforms
Each entry: commit message (Author, Date; files changed, lines -removed/+added)
* [LoopSimplify] Preserve LCSSA when removing edges from unreachable blocks. (Michael Zolotukhin, 2016-11-18; 1 file, -0/+32)
  This fixes PR30454. llvm-svn: 287379
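  For context, LCSSA form means every value defined inside a loop and used outside it is routed through a single-operand phi in the exit block; a minimal sketch of the form the pass must keep valid (names invented for illustration):

    define i32 @f(i32 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      %i.next = add i32 %i, 1
      %c = icmp slt i32 %i.next, %n
      br i1 %c, label %loop, label %exit
    exit:
      %i.lcssa = phi i32 [ %i.next, %loop ]   ; the LCSSA phi
      ret i32 %i.lcssa
    }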
* [simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations. (Florian Hahn, 2016-11-18; 2 files, -0/+95)
  insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now propagates existing llvm.loop metadata to the newly added backedge.
  llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp now propagates existing llvm.loop metadata to the branch instructions in the predecessor blocks of the empty block that is removed.
  Differential Revision: https://reviews.llvm.org/D26495 llvm-svn: 287341
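  The metadata in question hangs off the loop's latch branch; a minimal sketch of the form being preserved (the unroll hint is just an example):

    define void @f(i1 %cond) {
    entry:
      br label %loop
    loop:
      br i1 %cond, label %loop, label %exit, !llvm.loop !0
    exit:
      ret void
    }
    !0 = distinct !{!0, !1}
    !1 = !{!"llvm.loop.unroll.disable"}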
* [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements. (Craig Topper, 2016-11-18; 1 file, -0/+384)
  This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches. llvm-svn: 287316
* [AVX-512] Support FCOPYSIGN for v16f32 and v8f64 (Craig Topper, 2016-11-18; 1 file, -20/+34)
  Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.
  Reviewers: delena, zvi, RKSimon, spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298
* Use profile info to adjust loop unroll threshold. (Dehao Chen, 2016-11-17; 1 file, -0/+59)
  Summary: For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold. For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation, enabling more aggressive unrolling.
  Reviewers: davidxl, mzolotukhin
  Subscribers: sanjoy, mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26527 llvm-svn: 287186
* Introduce GlobalSplit pass. (Peter Collingbourne, 2016-11-16; 3 files, -0/+109)
  This pass splits globals into elements using inrange annotations on getelementptr indices.
  Differential Revision: https://reviews.llvm.org/D22295 llvm-svn: 287178
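  For context, an inrange annotation promises that all derived pointers stay within the marked aggregate element, which is what makes splitting safe; a hedged sketch with invented globals and layout:

    declare void @a()
    declare void @b()
    declare void @c()
    declare void @d()
    @vt = constant { [2 x void ()*], [2 x void ()*] }
        { [2 x void ()*] [void ()* @a, void ()* @b],
          [2 x void ()*] [void ()* @c, void ()* @d] }
    ; pointers derived from this GEP never escape the first array
    ; element, so the two halves of @vt can be split into separate globals:
    @p = global void ()** getelementptr ({ [2 x void ()*], [2 x void ()*] },
        { [2 x void ()*], [2 x void ()*] }* @vt, i32 0, inrange i32 0, i32 1)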
* [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul (Craig Topper, 2016-11-16; 2 files, -88/+0)
  Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.
  Reviewers: zvi, delena, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083
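  A sketch of what the auto-upgrade produces, assuming llvm.x86.sse.add.ss is one of the removed intrinsics:

    define <4 x float> @add_ss(<4 x float> %a, <4 x float> %b) {
      ; was: call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> %b)
      %a0 = extractelement <4 x float> %a, i32 0
      %b0 = extractelement <4 x float> %b, i32 0
      %s = fadd float %a0, %b0
      %r = insertelement <4 x float> %a, float %s, i32 0
      ret <4 x float> %r
    }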
* Fixed the lost FastMathFlags for CALL operations in SLPVectorizer. (Vyacheslav Klochkov, 2016-11-16; 2 files, -2/+39)
  Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064
* [BypassSlowDivision] Handle division by constant numerators better. (Justin Lebar, 2016-11-16; 1 file, -0/+35)
  Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant:
  * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code).
  * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator.
  Reviewers: tra
  Subscribers: llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062
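  A hedged sketch of the bypass shape, with constants and names invented for illustration: a 64-bit udiv is routed through a 32-bit divide when the denominator fits; the constant numerator 1000 already fits the 32-bit bypass width, so only %d needs the range check:

    define i64 @div1000(i64 %d) {
    entry:
      %fits = icmp ult i64 %d, 4294967296
      br i1 %fits, label %fast, label %slow
    fast:
      %d32 = trunc i64 %d to i32
      %q32 = udiv i32 1000, %d32
      %qf  = zext i32 %q32 to i64
      br label %join
    slow:
      %qs = udiv i64 1000, %d
      br label %join
    join:
      %q = phi i64 [ %qf, %fast ], [ %qs, %slow ]
      ret i64 %q
    }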
* Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific. (Wei Mi, 2016-11-15; 1 file, -65/+0)
  llvm-svn: 287014
* [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. (Wei Mi, 2016-11-15; 1 file, -0/+65)
  In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outer loop, the formula will be marked as Loser and dropped.
  Suppose we have an IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR only handles inner loops now, so only %for.body2 will be handled. Using the logic above, a formula like reg(%array) + reg({1,+,%size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what, because reg({1,+,%size}<%for.body>) is a SCEVAddRecExpr-type reg related with the outer loop. Only a formula like reg(%array) + 1*reg({{1,+,%size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept, because the SCEVAddRecExpr related with the outer loop is folded into the initial value of the SCEVAddRecExpr related with the current loop.
  But in some cases we do need to share the basic induction variable reg({0,+,1}<%for.body2>) among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula reg(%array) + reg({1,+,%size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
  From the existing comment, it tries to avoid considering multiple-level loops at the same time. However, existing LSR only handles the innermost loop, so for any SCEVAddRecExpr with a loop other than the current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped.
  Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999
* [IndVars] Change the order to compute WidenAddRec in widenIVUse. (Wei Mi, 2016-11-15; 2 files, -2/+63)
  When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return non-null but different WideAddRecs, if getWideRecurrence is called before getExtendedOperandRecurrence, we won't bother to call getExtendedOperandRecurrence again. But as we know, it is possible that after SCEV folding we cannot prove the legality using the SCEVAddRecExpr returned by getWideRecurrence, while if getExtendedOperandRecurrence returns a non-null WideAddRec, we know for sure that it is legal to do widening for the current instruction. So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which will increase the chance of successful widening.
  Differential Revision: https://reviews.llvm.org/D26059 llvm-svn: 286987
* [InstCombine] add tests for bitcasted selects; NFC (Sanjay Patel, 2016-11-15; 1 file, -0/+84)
  llvm-svn: 286978
* Revert "[JumpThreading] Unfold selects that depend on the same condition"Pablo Barrio2016-11-151-45/+0
| | | | | | This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc. llvm-svn: 286976
* [LoopVectorizer] When estimating reg usage, unused insts may "end" another use (Robert Lougher, 2016-11-15; 2 files, -1/+135)
  The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value).
  The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index.
  The problem is that instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage.
  This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index.
  Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969
* [InstCombine] add tests to show missing bitcast folds (Sanjay Patel, 2016-11-14; 1 file, -0/+43)
  llvm-svn: 286900
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets (Simon Pilgrim, 2016-11-14; 1 file, -2/+4)
  Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
  This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit). llvm-svn: 286832
* [InlineCost] Remove skew when calculating call costs (James Molloy, 2016-11-14; 9 files, -11/+38)
  When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself. However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

    int g() { h(); }
    int f() { g(); }

  llvm-svn: 286814
* [InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics. (Craig Topper, 2016-11-13; 1 file, -0/+326)
  llvm-svn: 286755
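  For context, these folds turn an x86 variable-shift intrinsic whose shift amounts are in-range constants into a plain IR shift; a hedged sketch using the AVX2 form of the intrinsic (the AVX-512 variants follow the same pattern):

    declare <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32>, <4 x i32>)

    define <4 x i32> @shift(<4 x i32> %v) {
      %r = call <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32> %v,
                          <4 x i32> <i32 3, i32 3, i32 3, i32 3>)
      ; with in-range constant amounts this can become:
      ;   %r = shl <4 x i32> %v, <i32 3, i32 3, i32 3, i32 3>
      ret <4 x i32> %r
    }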
* [InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value. (Craig Topper, 2016-11-13; 1 file, -0/+808)
  This does not include support for the AVX-512 variable shifts. That will be coming in a future patch. llvm-svn: 286739
* [InstCombine] update test to use FileCheck; NFC (Sanjay Patel, 2016-11-11; 1 file, -12/+16)
  llvm-svn: 286668
* [LV] Stop saying "use -Rpass-analysis=loop-vectorize" (Adam Nemet, 2016-11-11; 6 files, -11/+11)
  This is PR28376. Unfortunately, given the current structure of optimization diagnostics, we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize, since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler).
  So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity.
  Differential Revision: https://reviews.llvm.org/D26563 llvm-svn: 286662
* [cfi] Fix weak functions handling. (Evgeniy Stepanov, 2016-11-11; 1 file, -0/+67)
  When a function pointer is replaced with a jumptable pointer, a special case is needed to preserve the semantics of extern_weak functions. Since a jumptable entry can not be extern_weak, we emulate that behaviour by replacing all references to F (the extern_weak function) with the following expression: F != nullptr ? JumpTablePtr : nullptr.
  Extra special care is needed for global initializers, since most (or probably all) backends can not lower an initializer that includes this kind of constant expression. Initializers like that are replaced with a global constructor (i.e. a runtime initializer).
  llvm-svn: 286636
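  In IR terms, the replacement is a constant select over a null check; a hedged sketch, with @f.cfi_jt standing in for whatever the jump-table entry is actually called:

    declare extern_weak void @f()
    declare void @f.cfi_jt()   ; hypothetical name for the jump-table entry

    define void ()* @addr_of_f() {
      ; references to @f become: f != null ? jump-table entry : null
      ret void ()* select (i1 icmp ne (void ()* @f, void ()* null),
                           void ()* @f.cfi_jt, void ()* null)
    }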
* Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer. (Vyacheslav Klochkov, 2016-11-11; 1 file, -0/+52)
  Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26543 llvm-svn: 286626
* [InstCombine] add tests to show size-increasing select transforms (Sanjay Patel, 2016-11-11; 1 file, -0/+46)
  llvm-svn: 286619
* [cfi] Implement cfi-icall using inline assembly. (Evgeniy Stepanov, 2016-11-11; 4 files, -24/+93)
  The current implementation is emitting a global constant that happens to evaluate to the same bytes + relocation as a jump instruction on X86. This does not work for PIE executables and shared libraries though, because we end up with a wrong relocation type. And it has no chance of working on ARM/AArch64, which use different relocation types for jump instructions (R_ARM_JUMP24) that are never generated for data.
  This change replaces the constant with module-level inline assembly followed by a hidden declaration of the jump table. It works fine for ARM/AArch64, but has some drawbacks.
  * Extra symbols are added to the static symbol table, which inflate the size of the unstripped binary a little. Stripped binaries are not affected. This happens because jump table declarations must be external (because their body is in the inline asm).
  * Original functions that were anonymous are now named <original name>.cfi, and it affects symbolization sometimes. This is necessary because the only user of these functions is the (inline asm) jump table, so they had to be added to @llvm.used, which does not allow unnamed functions.
  llvm-svn: 286611
* [OptDiag] Remove non-printable chars from function name (Adam Nemet, 2016-11-10; 1 file, -1/+1)
  r283656 did this in the remark arguments. We also need to do this in the main function attribute, as that is written to YAML as well. llvm-svn: 286482
* [InstCombine] auto-generate better checks; NFC (Sanjay Patel, 2016-11-10; 1 file, -30/+67)
  Note that the existing metadata checking was re-added by hand because the script doesn't currently know how to generate checks for lines outside of functions. llvm-svn: 286460
* [InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923) (Sanjay Patel, 2016-11-10; 1 file, -0/+27)
  Removing the limitation in visitInsertElementInst() causes several regressions because we're not prepared to fold sequences of shuffles or inserts and extracts separated by shuffles. Fixing that appears to be a difficult mission because we are purposely trying to avoid creating shuffles with arbitrary shuffle masks, because some targets may choke on those.
  https://llvm.org/bugs/show_bug.cgi?id=30923 llvm-svn: 286423
* Update vectorization debug info unittest. (Dehao Chen, 2016-11-09; 1 file, -11/+11)
  Summary: This change tests the change in r286159. The idea behind the change: make the dbg location different between the loop header and the preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location inside the loop header from !21 to !22 so that it will reflect the correct location.
  Reviewers: probinson
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26428 llvm-svn: 286403
* [InstCombine] regenerate checks; NFC (Sanjay Patel, 2016-11-09; 1 file, -159/+185)
  llvm-svn: 286402
* [InstCombine] regenerate checks; NFC (Sanjay Patel, 2016-11-09; 1 file, -51/+83)
  llvm-svn: 286399
* [ARM] Loop Strength Reduction crashes when targeting ARM or Thumb. (Alexandros Lamprineas, 2016-11-09; 1 file, -0/+35)
  Scalar Evolution asserts when not all the operands of an Add Recurrence Expression are loop invariant. Loop Strength Reduction should only create affine Add Recurrences, so that both the start and the step of the expression are loop invariant.
  Differential Revision: https://reviews.llvm.org/D26185 llvm-svn: 286347
* Enable Loop Sink pass for functions that have profile data. (Dehao Chen, 2016-11-09; 2 files, -6/+8)
  Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
  Reviewers: davidxl, hfinkel
  Subscribers: mehdi_amini, mzolotukhin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26155 llvm-svn: 286325
* [InstCombine] fix profitability equation for max-of-nots transform (Sanjay Patel, 2016-11-09; 1 file, -5/+7)
  As the test change shows, we can increase the critical path by adding a 'not' instruction, so make sure that we're actually removing an instruction if we do this transform.
  This transform could also cause us to miss folds of min/max pairs. llvm-svn: 286315
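  For reference, a sketch of the transform in question: smax(~x, ~y) rewritten as ~smin(x, y), which trades two 'not' instructions for one but lengthens the dependence chain:

    define i32 @max_of_nots(i32 %x, i32 %y) {
      %nx = xor i32 %x, -1
      %ny = xor i32 %y, -1
      %c = icmp sgt i32 %nx, %ny
      %m = select i1 %c, i32 %nx, i32 %ny
      ret i32 %m
    }
    ; becomes ~smin(x, y): one 'not' instead of two, but it sits
    ; after the select on the critical path:
    define i32 @not_of_min(i32 %x, i32 %y) {
      %c = icmp slt i32 %x, %y
      %m = select i1 %c, i32 %x, i32 %y
      %r = xor i32 %m, -1
      ret i32 %r
    }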
* [LibcallsShrinkWrap] This pass doesn't preserve the CFG. (Davide Italiano, 2016-11-08; 1 file, -0/+11)
  For example, it invalidates the domtree, causing assertions in later passes which need dominator info. Make it preserve GlobalsAA, as suggested by Eli.
  Differential Revision: https://reviews.llvm.org/D26381 llvm-svn: 286271
* [InstCombine] move min/max tests to min/max test file; NFC (Sanjay Patel, 2016-11-08; 2 files, -142/+142)
  llvm-svn: 286256
* [InstCombine] update checks; NFC (Sanjay Patel, 2016-11-08; 1 file, -8/+9)
  llvm-svn: 286255
* [JumpThreading] Unfold selects that depend on the same condition (Pablo Barrio, 2016-11-08; 1 file, -0/+45)
  Summary: These are good candidates for jump threading. This enables later opts (such as InstCombine) to combine instructions from the selects with instructions out of the selects. SimplifyCFG will fold the select again if unfolding wasn't worth it.
  Patch by James Molloy and Pablo Barrio.
  Reviewers: rengolin, haicheng, sebpop
  Subscribers: jojo, jmolloy, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26391 llvm-svn: 286236
* [VectorLegalizer] Expansion of CTLZ using CTPOP when possible (Simon Pilgrim, 2016-11-08; 1 file, -550/+52)
  This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
  Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
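  The classic Hacker's Delight expansion: smear the leading one bit rightward, then count the zeros above it with a popcount of the complement. A scalar i32 sketch of the idea (the legalizer applies it element-wise to vectors):

    declare i32 @llvm.ctpop.i32(i32)

    define i32 @ctlz32(i32 %x) {
      ; after smearing, every bit from the MSB of %x downward is set
      %s1 = lshr i32 %x, 1
      %o1 = or i32 %x, %s1
      %s2 = lshr i32 %o1, 2
      %o2 = or i32 %o1, %s2
      %s4 = lshr i32 %o2, 4
      %o4 = or i32 %o2, %s4
      %s8 = lshr i32 %o4, 8
      %o8 = or i32 %o4, %s8
      %s16 = lshr i32 %o8, 16
      %o16 = or i32 %o8, %s16
      ; the complement now holds exactly the leading-zero bits
      %inv = xor i32 %o16, -1
      %ctlz = call i32 @llvm.ctpop.i32(i32 %inv)
      ret i32 %ctlz
    }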
* [OptDiag, opt-viewer] Save callee's location and display as link (Adam Nemet, 2016-11-07; 2 files, -0/+6)
  With this we get a new field in the YAML record if the value being streamed out has a debug location. For examples, please see the changes to the tests.
  This is then used in opt-viewer to display a link for the callee function in the inlining remarks.
  Differential Revision: https://reviews.llvm.org/D26366 llvm-svn: 286169
* Avoid tail recursion elimination across calls with operand bundles (Sanjoy Das, 2016-11-07; 1 file, -0/+57)
  Summary: In some specific scenarios with well understood operand bundle types (like `"deopt"`) it may be possible to go ahead and convert recursion to iteration, but TailRecursionElimination does not have that logic today, so avoid doing the right thing for now.
  I need some input on whether `"funclet"` operand bundles should also block tail recursion elimination. If not, I'll allow TRE across calls with `"funclet"` operand bundles and add a test case.
  Reviewers: rnk, majnemer, nlewycky, ahatanak
  Subscribers: mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26270 llvm-svn: 286147
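  A sketch of the kind of self-call that TRE now leaves alone because of the attached bundle (function and bundle operands invented for illustration):

    define void @countdown(i32 %n) {
    entry:
      %done = icmp sle i32 %n, 0
      br i1 %done, label %exit, label %rec
    rec:
      %n1 = add i32 %n, -1
      ; the "deopt" bundle blocks converting this tail call into a loop
      call void @countdown(i32 %n1) [ "deopt"(i32 %n) ]
      ret void
    exit:
      ret void
    }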
* [MemCpyOpt] Don't emit IR in an unspecified order (Benjamin Kramer, 2016-11-07; 1 file, -28/+28)
  Argument evaluation order is one of the edge cases where Clang differs from GCC, yielding different IR depending on which compiler LLVM was built with. Make the order deterministic and tune the test to actually verify the order instead of trying to hide it.
  llvm-svn: 286126
* [InstCombine] allow splat vector folds in adjustMinMax() (retry r285732) (Sanjay Patel, 2016-11-07; 1 file, -48/+45)
  This was reverted at r285866 because there was a crash handling a scalar select of vectors. I added a check for that pattern and a test case based on the example provided in the post-commit thread for r285732.
  llvm-svn: 286113
* llvm/test/Transforms/DCE/calls-errno.ll: Suppress checking @pow(+0,-1). (NAKAMURA Takumi, 2016-11-04; 1 file, -2/+2)
  It depends on the host's pow(3); mingw's pow doesn't raise any errors, it just returns +INF.
  llvm-svn: 286005
* Revert "[InstCombine] allow splat vector folds in adjustMinMax()"Greg Bedwell2016-11-021-32/+48
| | | | | | | | | | | | | | | | | This reverts commit r285732. This change introduced a new assertion failure in the following testcase at -O2: typedef short __v8hi __attribute__((__vector_size__(16))); __v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) { __v8hi Result = V1; if (mask & 0x80) Result[0] = V2[0]; return Result; } llvm-svn: 285866
* DCE math library calls with a constant operand. (Eli Friedman, 2016-11-02; 1 file, -0/+91)
  On platforms which use -fmath-errno, math libcalls without any uses require some extra checks to figure out if they are actually dead.
  Fixes https://llvm.org/bugs/show_bug.cgi?id=30464.
  Differential Revision: https://reviews.llvm.org/D25970 llvm-svn: 285857
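  A hedged sketch of the distinction: with -fmath-errno the call may set errno, so an unused call is only removable when the constant operand provably cannot produce an error:

    declare double @sqrt(double)

    define void @f() {
      ; unused and provably error-free: sqrt(2.0) never sets errno,
      ; so DCE may delete this call
      %a = call double @sqrt(double 2.0)
      ; unused, but sqrt(-1.0) would set errno (EDOM), so it must stay
      %b = call double @sqrt(double -1.0)
      ret void
    }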
* [Reassociate] Skip analysis of dead code to avoid infinite loop. (Bjorn Pettersson, 2016-11-02; 1 file, -0/+37)
  Summary: It was detected that the reassociate pass could enter an infinite loop when analysing dead code. Simply skipping analysis of basic blocks that are dead avoids such problems (and as a side effect we avoid spending time on optimising dead code).
  The solution uses the same Reverse Post Order ordering of the basic blocks when doing the optimisations as when building the precalculated rank map. A nice side-effect of this solution is that we now know that we only try to do optimisations for blocks with ranked instructions.
  Fixes https://llvm.org/bugs/show_bug.cgi?id=30818
  Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini
  Subscribers: dberlin
  Differential Revision: https://reviews.llvm.org/D26154 llvm-svn: 285793
* [InstCombine] allow splat vector folds in adjustMinMax() (Sanjay Patel, 2016-11-01; 1 file, -48/+32)
  llvm-svn: 285732
* [InstCombine] Fold nuw left-shifts in `ugt`/`ule` comparisons. (Sanjay Patel, 2016-11-01; 1 file, -14/+27)
  This transforms

    %a = shl nuw %x, c1
    %b = icmp {ugt|ule} %a, c0

  into

    %b = icmp {ugt|ule} %x, (c0 >> c1)

  z3:

    (declare-const x (_ BitVec 64))
    (declare-const c0 (_ BitVec 64))
    (declare-const c1 (_ BitVec 64))
    (push)
    (assert (= x (bvlshr (bvshl x c1) c1))) ; nuw
    (assert (not (= (bvugt (bvshl x c1) c0) (bvugt x (bvlshr c0 c1)))))
    (check-sat)
    (get-model)
    (pop)
    (push)
    (assert (= x (bvlshr (bvshl x c1) c1))) ; nuw
    (assert (not (= (bvule (bvshl x c1) c0) (bvule x (bvlshr c0 c1)))))
    (check-sat)
    (get-model)
    (pop)

  Patch by bryant!
  Differential Revision: https://reviews.llvm.org/D25913 llvm-svn: 285729