summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Add support for TFE/LWE in image intrinsicsDavid Stuttard2018-11-292-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* [CVP] tidy processCmp(); NFCSanjay Patel2018-11-291-14/+14
| | | | | | | | 1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp. 2. Formatting violations. 3. Simplify code to return true/false constant. llvm-svn: 347868
* Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply ↵Martin Storsjo2018-11-291-312/+15
| | | | | | | | | | | r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867
* Disable TermFolding in LoopSimplifyCFG until PR39783 is fixedMax Kazantsev2018-11-291-1/+1
| | | | llvm-svn: 347844
* [LoopStrengthReduce] ComplexityLimit as an optionSam Parker2018-11-291-3/+5
| | | | | | | | Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843
* [DebugInfo] Give inlinable calls DILocs (PR39807)Jeremy Morse2018-11-281-8/+9
| | | | | | | | | | | | | | | | | | | | In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782
* [LICM] Enable control flow hoisting by defaultJohn Brawn2018-11-281-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778
* [LICM] Reapply r347190 "Make LICM able to hoist phis" with fixJohn Brawn2018-11-281-15/+312
| | | | | | | | | | | | | | | This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776
* [InstCombine] Combine saturating add/sub with constant operandsNikita Popov2018-11-281-0/+34
| | | | | | | | | | | | | | | | | Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773
* [InstCombine] Canonicalize ssub.sat to sadd.satNikita Popov2018-11-281-0/+11
| | | | | | | | | Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772
* [InstCombine] Use known overflow information for saturating add/subNikita Popov2018-11-281-0/+38
| | | | | | | | | | | | | If ValueTracking can determine that the add/sub can newer overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770
* [InstCombine] Canonicalize const arg for saturating addsNikita Popov2018-11-281-0/+6
| | | | | | | | | If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769
* [ThinLTO] Correct linkonce_any function import linkage. NFC.Xin Tong2018-11-281-5/+6
| | | | | | | | | | | | | | Summary: This is a NFC as we do not import non-odr vague linkage when computing for import list for a module. Reviewers: tejohnson, pcc Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54928 llvm-svn: 347763
* [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.Alexey Bataev2018-11-281-5/+9
| | | | | | | | | | | | | | | | | | | Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
* [PartialInliner] Make PHIs free in cost computation.Florian Hahn2018-11-271-10/+9
| | | | | | | | | | | | | | | | InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D54748 llvm-svn: 347683
* InstCombine: add comment explaining malloc deletion. NFC.Tim Northover2018-11-271-3/+10
| | | | | | | | I tried to change this, not quite realising the logic behind what we were doing. Hopefully this comment will help the next person to come along. llvm-svn: 347653
* [LoopSimplifyCFG] Turn on term folding after underlying bug fixedMax Kazantsev2018-11-271-1/+1
| | | | llvm-svn: 347641
* [LoopSimplifyCFG] Fix corner case with duplicating successorsMax Kazantsev2018-11-271-1/+11
| | | | | | | | | | | | It fixes a bug that doesn't update Phi inputs of the only live successor that is in the list of block's successors more than once. Thanks @uabelho for finding this. Differential Revision: https://reviews.llvm.org/D54849 Reviewed By: anna llvm-svn: 347640
* [ICP] Remove incompatible attributes at indirect-call promoted callsites.Xin Tong2018-11-261-2/+27
| | | | | | | | | | | | | | Summary: Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in at least a IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605
* [InstCombine] add helper function to reduce code duplication; NFCSanjay Patel2018-11-261-24/+19
| | | | llvm-svn: 347604
* [IPSCCP] Use input operand instead of OriginalOp for ssa_copy.Florian Hahn2018-11-251-5/+5
| | | | | | | | | | | OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524
* [InstCombine] Determine demanded and known bits for funnel shiftsNikita Popov2018-11-241-0/+24
| | | | | | | | | | | | | | Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 llvm-svn: 347515
* [InstCombine] Simplify funnel shift with zero/undef operand to shiftNikita Popov2018-11-231-0/+23
| | | | | | | | | | | | | | | | | | | | The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, (BW-C%BW)` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505
* Disable LoopSimplifyCFG terminator folding by defaultMax Kazantsev2018-11-231-0/+6
| | | | llvm-svn: 347486
* [LoopSimplifyCFG] Don't delete LCSSA PhisMax Kazantsev2018-11-231-1/+4
| | | | | | | | | | | | When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484
* [NFC] Assert that all blocks staying in loop are liveMax Kazantsev2018-11-221-0/+2
| | | | llvm-svn: 347458
* [NFC] Ensure deterministic order of dead exit blocksMax Kazantsev2018-11-221-6/+11
| | | | llvm-svn: 347457
* [NFC] Simplify code by using standard exit blocks collectionMax Kazantsev2018-11-221-10/+8
| | | | llvm-svn: 347454
* [PM] correcting return value for new-pass-manager version of ScalarizerFedor Sergeev2018-11-211-2/+2
| | | | | | Obvious mistake missed during D54695 review. llvm-svn: 347432
* [MergeFuncs] Generate alias instead of thunk if possibleNikita Popov2018-11-211-14/+73
| | | | | | | | | | | | | | | | | | | The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision: https://reviews.llvm.org/D53285 llvm-svn: 347407
* [PM] Port Scalarizer to the new pass manager.Mikael Holmen2018-11-212-55/+72
| | | | | | | | | | | | | | Patch by: markus (Markus Lavin) Reviewers: chandlerc, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: llvm-commits, Ka-Ka, bjope Differential Revision: https://reviews.llvm.org/D54695 llvm-svn: 347392
* [LoopSink] Add preheader to alias setGuozhi Wei2018-11-201-0/+1
| | | | | | | | | | This patch fixes PR39695. The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large. Differential Revision: https://reviews.llvm.org/D54659 llvm-svn: 347325
* Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches ↵Max Kazantsev2018-11-201-0/+315
| | | | | | | | | | | and switches" The initial version of patch lacked Phi nodes updates in destinations of removed edges. This version contains this update and tests on this situation. Differential Revision: https://reviews.llvm.org/D54021 llvm-svn: 347289
* [Transforms] Prefer static and avoid namespaces, NFCReid Kleckner2018-11-191-10/+6
| | | | | | | | | | | | | Put 'static' on three functions in an anonymous namespace as per our coding style. Remove the 'namespace llvm {}' around the .cpp file and explicitly declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'. I prefer this style for free functions because the compiler will error out if the .h and .cpp files don't agree on the function name or prototype. llvm-svn: 347269
* Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches ↵Benjamin Kramer2018-11-191-313/+0
| | | | | | | | and switches" This reverts commits r347183 & r347184. Crashes while building libxml. llvm-svn: 347260
* [InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phiVedant Kumar2018-11-191-2/+6
| | | | | | | | | Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi improves backtrace quality. Fixes llvm.org/PR38083. llvm-svn: 347257
* [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlockVedant Kumar2018-11-195-10/+8
| | | | | | | | | | | | | | | | | | | | | | | Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256
* Revert "[LICM] Make LICM able to hoist phis"Benjamin Kramer2018-11-191-302/+15
| | | | | | This reverts commit r347190. llvm-svn: 347225
* [LV] Avoid vectorizing unsafe dependencies in uniform addressAnna Thomas2018-11-191-4/+3
| | | | | | | | | | | | | | | | | | | Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
* [LICM] Make LICM able to hoist phisJohn Brawn2018-11-191-15/+302
| | | | | | | | | | | | | | | The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347190
* [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switchesMax Kazantsev2018-11-191-0/+313
| | | | | | | | | | | | | | | | This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within loop into unconditional branches. It is useful as a cleanup for such passes as loop unswitching that sometimes produce such branches. Only the simplest case supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: https://reviews.llvm.org/D54021 Reviewed By: anna llvm-svn: 347183
* [ProfileSummary] Standardize methods and fix commentVedant Kumar2018-11-196-8/+8
| | | | | | | | | | | | | | | | | | | | | Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182
* [CorrelatedValuePropagation] Preserve debug locations (PR38178)Vedant Kumar2018-11-181-15/+16
| | | | | | | | | Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in llvm.org/PR38178. llvm-svn: 347147
* Use llvm::copy. NFCFangrui Song2018-11-173-6/+5
| | | | llvm-svn: 347126
* [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch withFedor Sergeev2018-11-161-2/+116
| | | | | | | | | | | | | | | | | | | | | We need to control exponential behavior of loop-unswitch so we do not get run-away compilation. Suggested solution is to introduce a multiplier for an unswitch cost that makes cost prohibitive as soon as there are too many candidates and too many sibling loops (meaning we have already started duplicating loops by unswitching). It does solve the currently known problem with compile-time degradation (PR 39544). Tests are built on top of a recently implemented CHECK-COUNT-<num> FileCheck directives. Reviewed By: chandlerc, mkazantsev Differential Revision: https://reviews.llvm.org/D54223 llvm-svn: 347097
* GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics.Adrian Prantl2018-11-161-5/+10
| | | | | | | This fixes PR39669. https://bugs.llvm.org/show_bug.cgi?id=39669 llvm-svn: 347065
* [ThinLTO] Internalize readonly globalsEugene Leviant2018-11-162-6/+54
| | | | | | | | An attempt to recommit r346584 after failure on OSX build bot. Fixed cache key computation in ThinLTOCodeGenerator and added test case llvm-svn: 347033
* [LTO] Load sample profile in LTO link step.Xin Tong2018-11-151-0/+6
| | | | | | | | | | | | | | Summary: Load sample profile in LTO link step. ThinLTO calls populateModulePassManager to load the profile Reviewers: tejohnson, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54564 llvm-svn: 346971
* [InstCombine] fix rotate narrowing bug for non-pow-2 typesSanjay Patel2018-11-151-2/+7
| | | | llvm-svn: 346968
* [InstCombine] Remove a couple of asserts based on incorrect assumptionsMandeep Singh Grang2018-11-141-11/+4
| | | | | | | | | | | | | | | | Summary: These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same. This fixes PR39595. Reviewers: craig.topper, spatel, dmgreen Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54359 llvm-svn: 346874
OpenPOWER on IntegriCloud