path: root/llvm/test/Transforms
Commit message | Author | Age | Files | Lines
* Update MemorySSA in SimpleLoopUnswitch.Alina Sbirlea2018-12-0431-0/+31
| | | | | | | | | | | Summary: Teach SimpleLoopUnswitch to preserve MemorySSA. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D47022 llvm-svn: 348263
* [CodeExtractor] Split PHI nodes with incoming values from outlined region ↵Vedant Kumar2018-12-031-2/+2
| | | | | | | | | | | | | | | | | (PR39433) If a PHI node outside the extracted region has multiple incoming values from the region, split the PHI into two parts. The first PHI takes only the incoming values from the region and is extracted with it (it is placed in a separate basic block that is added to the list of outlined blocks), and those incoming values in the original PHI are replaced by the first PHI. A similar solution is already used in CodeExtractor for PHIs in the entry block (the severSplitPHINodes method). This fixes PR39433. Patch by Sergei Kachkov! Differential Revision: https://reviews.llvm.org/D55018 llvm-svn: 348205
* [InstCombine] fix undef propagation bug with shuffle+binopSanjay Patel2018-12-031-6/+6
| | | | | | | | | | | | | | When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test. llvm-svn: 348191
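The undef argument above can be illustrated with a small model. This is only a sketch under a stated assumption: undef is modelled as "any 8-bit value", and a lane counts as still undef only if the binop can produce every 8-bit value. The helper names are hypothetical and are not taken from InstCombine.

```python
BITS = 8
MASK = (1 << BITS) - 1


def possible_results(op, const):
    """All values op(undef, const) can take when undef ranges over every 8-bit value."""
    return {op(u, const) & MASK for u in range(1 << BITS)}


def stays_undef(op, const):
    """True only if the lane can still take every 8-bit value after the binop."""
    return len(possible_results(op, const)) == 1 << BITS


def or_op(a, b):
    return a | b


def shl_op(a, b):
    return a << b


if __name__ == "__main__":
    # 'or' with a non-zero constant pins some bits to 1, so the lane is no longer undef.
    print(stays_undef(or_op, 0x0F))
    # 'shl' by a non-zero amount pins the low bits to 0.
    print(stays_undef(shl_op, 1))
```

In this model the extra lanes stay undef only for constants such as 0, which is why reordering the shuffle and the binop has to pick the constant for those lanes carefully.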
* [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.Roman Lebedev2018-12-032-8/+12
| | | | | | | | | | | These two folds are invalid for this non-constant pattern when the mask ends up being all-ones: https://rise4fun.com/Alive/9au https://rise4fun.com/Alive/UcQM Fixes https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348181
* [InstCombine] add tests for shuffle+binop fold; NFCSanjay Patel2018-12-031-2/+58
| | | | llvm-svn: 348173
* [SimplifyCFG] add tests for cross block compare folding; NFCSanjay Patel2018-12-031-0/+259
| | | | | | | | | These are the baseline tests for D54827. Patch based on code originally written by: @yinyuefengyi (luo xionghu) Differential Revision: https://reviews.llvm.org/D54994 llvm-svn: 348151
* [test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSDMichal Gorny2018-12-023-9/+9
| | | | | | | | | | | | Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required for sort implementation on NetBSD. The '-b' modifier is ineffective if specified without any key. Per the manpage: Note that the -b option has no effect unless key fields are specified. Differential Revision: https://reviews.llvm.org/D55168 llvm-svn: 348097
* [TTI] Reduction costs only need to include a single extract element cost ↵Simon Pilgrim2018-12-014-301/+188
| | | | | | | | | | | | | | | | (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076
* [InstCombine] Support ssub.sat canonicalization for non-splatsNikita Popov2018-12-011-5/+4
| | | | | | | | | | | | Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision: https://reviews.llvm.org/D55011 llvm-svn: 348072
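A minimal sketch of the generalized check described above, assuming a vector constant is modelled as a plain Python list of signed integers. The function names are illustrative stand-ins; this is not the LLVM implementation of isNotMinSignedValue().

```python
def signed_min(bits):
    """Smallest value representable as a signed integer of the given width."""
    return -(1 << (bits - 1))


def is_not_min_signed_value(elements, bits=8):
    """True when no element (splat or non-splat) equals the signed minimum."""
    return all(e != signed_min(bits) for e in elements)


def negate_for_sadd_sat(elements, bits=8):
    """Return the negated constant for sadd.sat, or None if negation is unsafe."""
    if not is_not_min_signed_value(elements, bits):
        return None
    return [-e for e in elements]


if __name__ == "__main__":
    print(negate_for_sadd_sat([1, -2, 3]))
    print(negate_for_sadd_sat([1, -128]))
```

The second call is rejected because -128 has no 8-bit negation, so a vector containing it cannot be rewritten element-wise.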
* [ThinLTO] Allow importing of functions with var argsTeresa Johnson2018-12-011-6/+6
| | | | | | | | | | | | | | | | | | Summary: Follow up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54607 llvm-svn: 348068
* [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in ↵Craig Topper2018-12-013-3/+3
| | | | | | some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. llvm-svn: 348063
* [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover ↵Craig Topper2018-11-3014-368/+493
| | | | | | -mprefer-vector-width=256 and -mprefer-vector-width=512. This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors. llvm-svn: 348046
* [InstSimplify] add tests for undef + partial undef constant folding; NFCSanjay Patel2018-11-301-0/+80
| | | | | | | | These tests should probably go under a separate test file because they should fold with just -constprop, but they're similar to the scalar tests already in here. llvm-svn: 348045
* [Mem2Reg] Fix nondeterministic corner caseJoseph Tremoulet2018-11-301-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism. Also adds a testcase with a bunch of unreachable preds, which (nondeterministically) fails without the fix. Reviewers: majnemer Reviewed By: majnemer Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D55077 llvm-svn: 348024
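The difference between address-based and number-based ordering can be sketched as follows. This is an illustration only: `Block`, `order_by_address` and `order_by_number` are hypothetical names, and Python's `id()` stands in for a BasicBlock pointer value.

```python
class Block:
    """Stand-in for a basic block; only the name matters here."""

    def __init__(self, name):
        self.name = name


def order_by_address(preds):
    """Order depends on where objects were allocated, so it can vary between runs."""
    return sorted(preds, key=id)


def order_by_number(preds, bb_numbers):
    """Order depends only on a stable per-function numbering."""
    return sorted(preds, key=lambda block: bb_numbers[block])


if __name__ == "__main__":
    blocks = [Block(name) for name in "abcd"]
    bb_numbers = {block: index for index, block in enumerate(blocks)}
    shuffled = list(reversed(blocks))
    print([b.name for b in order_by_number(shuffled, bb_numbers)])
```

Sorting by the assigned number gives the same operand order regardless of allocation, which is the property the fix relies on.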
* [SLP]PR39774: Update references of the replaced external instructions.Alexey Bataev2018-11-304-6/+95
| | | | | | | | | | | | | | | Summary: An additional fix for PR39774. Need to update the references for the ReductionRoot instruction when it is replaced during the vectorization phase to avoid a compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997
* Add a new reduction pattern matchRenato Golin2018-11-301-0/+821
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmul, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. The difference from the previous patch (https://reviews.llvm.org/D49168) is as follows: - Added check of fast-math property of fp-instruction to the previous patch - Fix/add some pattern for if-reduction.ll Differential Revision: https://reviews.llvm.org/D54464 Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org> and Masakazu Ueno <masakazu.ueno@linaro.org> llvm-svn: 347989
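The s3111-style loop quoted above is a conditional reduction. The sketch below shows the branchy form next to a select-style form in which every iteration contributes a value (either the element or the additive identity). This is one common way to reason about such loops; the commit message does not say that the patch rewrites the loop this way, so treat it as an assumption for illustration.

```python
def conditional_sum_branchy(a, b):
    """Direct transcription of: if (a[i] > b) sum += a[i];"""
    total = 0.0
    for x in a:
        if x > b:
            total += x
    return total


def conditional_sum_select(a, b):
    """Same reduction with the branch replaced by a per-element select."""
    total = 0.0
    for x in a:
        total += x if x > b else 0.0
    return total


if __name__ == "__main__":
    data = [1.0, 5.0, 3.0, 7.0]
    print(conditional_sum_branchy(data, 2.5), conditional_sum_select(data, 2.5))
```

A vectorized version would additionally reorder the floating-point additions, which is presumably why the patch checks the fast-math property of the instructions involved.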
* [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783Max Kazantsev2018-11-301-1/+0
| | | | | | | | | | | | | | | | Terminator folding transform lacks MemorySSA update for memory Phis, while they exist within MemorySSA analysis. They need exactly the same type of updates as regular Phis. Failing to update them properly ends up with inconsistent MemorySSA and manifests in various assertion failures. This patch adds Memory Phi updates to this transform. Thanks to @jonpa for finding this! Differential Revision: https://reviews.llvm.org/D55050 Reviewed By: asbirlea llvm-svn: 347979
* [NFC] Simplify and reduce tests for PR39783Max Kazantsev2018-11-303-279/+111
| | | | llvm-svn: 347976
* [SCEV] Guard movement of insertion point for loop-invariantsWarren Ristow2018-11-301-0/+65
| | | | | | | | | | | | | | | r320789 suppressed moving the insertion point of SCEV expressions with div/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision: https://reviews.llvm.org/D54713 llvm-svn: 347934
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"David Stuttard2018-11-291-23/+0
| | | | | | | | | Also revert fix r347876. One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
* [InstSimplify] fold select with implied conditionSanjay Patel2018-11-292-18/+8
| | | | | | | | | | | | | | | | | | | | | | | This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here. llvm-svn: 347896
* [LICM] Reapply r347776 "Make LICM able to hoist phis" with fixJohn Brawn2018-11-291-0/+1351
| | | | | | | | | | This commit caused a large compile-time slowdown in some cases when NDEBUG is off due to the dominator tree verification it added. Fix this by only doing dominator tree and loop info verification when something has been hoisted. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347889
* [SimplifyCFG] auto-generate complete checks; NFCSanjay Patel2018-11-291-42/+109
| | | | llvm-svn: 347882
* [InstCombine] auto-generate complete checks; NFCSanjay Patel2018-11-291-42/+116
| | | | llvm-svn: 347881
* [CallSiteSplitting] Report edge deletion to DomTreeUpdaterJoseph Tremoulet2018-11-291-0/+29
| | | | | | | | | | | | | | | | | | | Summary: When splitting musttail calls, the split blocks' original terminators get removed; inform the DTU when this happens. Also add a testcase that fails an assertion in the DTU without this fix. Reviewers: fhahn, junbuml Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55027 llvm-svn: 347872
* Add support for TFE/LWE in image intrinsicsDavid Stuttard2018-11-291-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0. For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply ↵Martin Storsjo2018-11-292-1362/+9
| | | | | | | | | | | r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867
* [CVP] auto-generate complete test checks; NFCSanjay Patel2018-11-295-289/+1009
| | | | llvm-svn: 347866
* [NFC] Add two XFAIL tests from PR39783Max Kazantsev2018-11-292-0/+279
| | | | llvm-svn: 347845
* [LoopStrengthReduce] ComplexityLimit as an optionSam Parker2018-11-292-0/+120
| | | | | | | | Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843
* [Inliner] Modify the merging of min-legal-vector-width attribute to better ↵Craig Topper2018-11-291-1/+16
| | | | | | | | | | handle when the caller or callee don't have the attribute. Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute. If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute. llvm-svn: 347841
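The merging rule described above can be written as a small function. This is a sketch, not the Inliner code: a missing attribute is modelled as `None`, and the case where both functions carry the attribute is assumed to take the larger width, which the commit message itself does not state.

```python
def merge_min_legal_vector_width(caller, callee):
    """Return the caller's attribute value after inlining, or None if it must be absent.

    caller / callee: the attribute value as an int, or None when the
    function does not carry the attribute (i.e. it was never checked).
    """
    if caller is None:
        # Caller already lacks the attribute: just avoid adding it.
        return None
    if callee is None:
        # Callee was never checked: drop the caller's attribute.
        return None
    # Assumption: when both are known, the combined function needs the larger width.
    return max(caller, callee)


if __name__ == "__main__":
    print(merge_min_legal_vector_width(256, 512))
    print(merge_min_legal_vector_width(256, None))
```

Treating "absent" as "unknown" rather than as zero is the design point: an unchecked function may need any width, so the merged result cannot claim a bound.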
* [Inliner] Add test for merging of min-legal-vector-width function attribute.Craig Topper2018-11-291-0/+29
| | | | | | This should have been added in r337844, but apparently I failed to 'git add' the file. llvm-svn: 347840
* [DebugInfo] IR/Bitcode changes for DISubprogram flags.Paul Robinson2018-11-281-1/+1
| | | | | | | | | Packing the flags into one bitcode word will save effort in adding new flags in the future. Differential Revision: https://reviews.llvm.org/D54755 llvm-svn: 347806
* [DebugInfo] Give inlinable calls DILocs (PR39807)Jeremy Morse2018-11-281-0/+43
| | | | | | | | | | | | | | | | | | | | In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782
* [LICM] Enable control flow hoisting by defaultJohn Brawn2018-11-282-11/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778
* [LICM] Reapply r347190 "Make LICM able to hoist phis" with fixJohn Brawn2018-11-281-0/+1351
| | | | | | | | | | | | | | | This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776
* [InstCombine] Combine saturating add/sub with constant operandsNikita Popov2018-11-281-52/+30
| | | | | | | | | | | | | | | | | Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773
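The saturating-add combine can be checked exhaustively on a narrow type. The sketch below assumes 8-bit integers and models the intrinsics with ordinary Python arithmetic; the helper names are hypothetical. For the signed case the combined constant is wrapped to 8 bits to mimic what happens when C1+C2 overflows.

```python
BITS = 8
UMAX = (1 << BITS) - 1
SMIN = -(1 << (BITS - 1))
SMAX = (1 << (BITS - 1)) - 1


def wrap_signed(v):
    """Wrap an integer into the signed 8-bit range, as two's complement would."""
    return (v - SMIN) % (1 << BITS) + SMIN


def uadd_sat(a, b):
    return min(a + b, UMAX)


def sadd_sat(a, b):
    return max(SMIN, min(SMAX, a + b))


def unsigned_fold_holds(c1, c2):
    """Check sat(sat(X + C1) + C2) == sat(X + sat(C1 + C2)) for every unsigned X."""
    c = uadd_sat(c1, c2)
    return all(uadd_sat(uadd_sat(x, c1), c2) == uadd_sat(x, c) for x in range(UMAX + 1))


def signed_fold_holds(c1, c2):
    """Check the signed combine with C1 + C2 computed in 8 bits, for every signed X."""
    c = wrap_signed(c1 + c2)
    return all(sadd_sat(sadd_sat(x, c1), c2) == sadd_sat(x, c) for x in range(SMIN, SMAX + 1))


if __name__ == "__main__":
    print(unsigned_fold_holds(200, 100))
    print(signed_fold_holds(100, 20), signed_fold_holds(100, -100), signed_fold_holds(100, 100))
```

The last two signed cases fail in this model: one has constants of opposite sign, the other has a sum that overflows, matching the two restrictions stated in the commit message.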
* [InstCombine] Canonicalize ssub.sat to sadd.satNikita Popov2018-11-281-28/+28
| | | | | | | | | Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772
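Why the signed minimum is excluded can be seen by brute force on 8-bit values. This is a sketch with hypothetical helper names; the intrinsics are modelled with clamped Python integers and negation is performed in 8 bits.

```python
BITS = 8
SMIN = -(1 << (BITS - 1))
SMAX = (1 << (BITS - 1)) - 1


def clamp(v):
    return max(SMIN, min(SMAX, v))


def wrap_signed(v):
    """Wrap an integer into the signed 8-bit range."""
    return (v - SMIN) % (1 << BITS) + SMIN


def ssub_sat(a, b):
    return clamp(a - b)


def sadd_sat(a, b):
    return clamp(a + b)


def canonicalization_is_safe(c):
    """True if ssub.sat(X, c) == sadd.sat(X, -c) for every 8-bit X."""
    neg_c = wrap_signed(-c)  # 8-bit negation; for c == SMIN this wraps back to SMIN
    return all(ssub_sat(x, c) == sadd_sat(x, neg_c) for x in range(SMIN, SMAX + 1))


if __name__ == "__main__":
    print(all(canonicalization_is_safe(c) for c in range(SMIN + 1, SMAX + 1)))
    print(canonicalization_is_safe(SMIN))
```

Every constant except the signed minimum passes, because that value is its own 8-bit negation and so the rewritten add would move in the wrong direction.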
* [ValueTracking] Determine always-overflow condition for unsigned subNikita Popov2018-11-281-6/+2
| | | | | | | | | | | | Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771
* [InstCombine] Use known overflow information for saturating add/subNikita Popov2018-11-281-18/+14
| | | | | | | | | | | | | If ValueTracking can determine that the add/sub can never overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770
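A rough sketch of the unsigned case, assuming the known information about each operand is available as an inclusive (low, high) range of 8-bit values. That range representation and the function names are assumptions for illustration; the commit message does not describe how ValueTracking derives the overflow result. The returned strings are just labels for the replacement.

```python
UMAX = 255


def fold_uadd_sat(x_range, y_range):
    """Pick a replacement for uadd.sat(x, y) given inclusive operand ranges."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    if x_hi + y_hi <= UMAX:
        return "add nuw"  # can never overflow
    if x_lo + y_lo > UMAX:
        return "constant 255"  # always overflows: saturates to the maximum
    return "uadd.sat"  # may or may not overflow: keep the intrinsic


def fold_usub_sat(x_range, y_range):
    """Pick a replacement for usub.sat(x, y) given inclusive operand ranges."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    if x_lo >= y_hi:
        return "sub nuw"  # can never underflow
    if x_hi < y_lo:
        return "constant 0"  # always underflows: saturates to zero
    return "usub.sat"


if __name__ == "__main__":
    print(fold_uadd_sat((0, 100), (0, 100)))
    print(fold_uadd_sat((200, 255), (100, 255)))
    print(fold_usub_sat((0, 10), (20, 30)))
```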
* [InstCombine] Canonicalize const arg for saturating addsNikita Popov2018-11-281-6/+6
| | | | | | | | | If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769
* [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.Alexey Bataev2018-11-281-0/+108
| | | | | | | | | | | | | | | | | | | Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
* [InstCombine] Add tests for saturating add/sub; NFCNikita Popov2018-11-271-0/+669
| | | | | | These are baseline tests for D54534. llvm-svn: 347700
* [PartialInliner] Make PHIs free in cost computation.Florian Hahn2018-11-271-0/+40
| | | | | | | | | | | | | | | | InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D54748 llvm-svn: 347683
* Add missing REQUIRES: assertsMax Kazantsev2018-11-271-0/+1
| | | | llvm-svn: 347644
* [LoopSimplifyCFG] Fix corner case with duplicating successorsMax Kazantsev2018-11-271-3/+14
| | | | | | | | | | | | Fixes a bug where Phi inputs were not updated for the only live successor when that successor appears in the block's successor list more than once. Thanks @uabelho for finding this. Differential Revision: https://reviews.llvm.org/D54849 Reviewed By: anna llvm-svn: 347640
* [InstCombine] add tests for rotate/bswap equality; NFCSanjay Patel2018-11-271-0/+23
| | | | llvm-svn: 347618
* [ICP] Remove incompatible attributes at indirect-call promoted callsites.Xin Tong2018-11-261-0/+32
| | | | | | | | | | | | | | Summary: Remove incompatible attributes at indirect-call promoted callsites; not removing them results in at least an IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605
* Revert "[TTI] Reduction costs only need to include a single extract element ↵Fedor Sergeev2018-11-263-133/+282
| | | | | | | | | | cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541
* [IPSCCP] Use input operand instead of OriginalOp for ssa_copy.Florian Hahn2018-11-251-0/+50
| | | | | | | | | | | OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524