path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* NewGVN: Fix memory congruence verification. The return true should be a ↵Daniel Berlin2017-04-181-8/+8
| | | | | | return false. Merge the appropriate if statements so it doesn't happen again. llvm-svn: 300584
* [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.Easwaran Raman2017-04-181-3/+9
| | | | | | | | | | | | | | | | | In tryToVectorizeList, under a very limited circumstance (when entered from tryToVectorizePair), the values may be reordered (swapped) and the SLP tree is built with the new order. This extends that to the case when starting from phis in vectorizeChainsInBlock when there are exactly two phis. The textual order of phi nodes shouldn't really matter. Without this change, the loop body in the accompanying test case is fully vectorized when we swap the order of the phis but not with this order. While this doesn't solve the phi-ordering problem in a general way (for more than 2 phis), this is a simple fix that piggybacks on an existing mechanism and is useful in cases like multiplying two complex numbers. Differential revision: https://reviews.llvm.org/D32065 llvm-svn: 300574
* [APInt] Use lshrInPlace to replace lshr where possibleCraig Topper2017-04-183-8/+9
| | | | | | | | | | This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566
* NewGVN: Don't waste time value numbering unreachable blocksDaniel Berlin2017-04-181-17/+6
| | | | llvm-svn: 300565
* LoopRerollPass: Prefer Value::hasOneUse() over Value::getNumUses(). NFC.Zvi Rackover2017-04-181-1/+1
| | | | | | getNumUses() can be more expensive as it iterates over all of the list's elements. llvm-svn: 300558
* [LV] Cache block mask valuesGil Rapaport2017-04-181-7/+17
| | | | | | | | | | | | This patch is part of D28975's breakdown. Add caching for block masks similar to the cache already used for edge masks, replacing generation per user with reusing the first generated value which dominates all uses. Differential Revision: https://reviews.llvm.org/D32054 llvm-svn: 300557
* [GVNHoist] Mark GlobalsAA as preserved by GVNHoist.Nikolai Bozhenov2017-04-181-0/+3
| | | | | | | | | | | | | Reviewers: sebpop, hiraditya Reviewed By: sebpop Subscribers: n.bozhenov, llvm-commits Differential Revision: https://reviews.llvm.org/D32158 Patch by Andrei Elovikov <andrei.elovikov@intel.com> llvm-svn: 300552
* [SampleProfile] Don't assert when printing the DebugLoc of a branch. NFC.Andrea Di Biagio2017-04-181-2/+4
| | | | llvm-svn: 300544
* [SampleProfile] Skip intrinsic calls when visiting callsites in ↵Andrea Di Biagio2017-04-181-1/+1
| | | | | | | | | | | | | | InlineHotFunctions. Before this patch, we always called method 'findCalleeFunctionSamples()' on intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable candidates for obvious reasons. No functional change intended. Differential Revision: https://reviews.llvm.org/D32008 llvm-svn: 300541
* PR32382: Fix emitting complex DWARF expressions.Adrian Prantl2017-04-182-7/+3
The DWARF specification knows 3 kinds of non-empty simple location descriptions:
1. Register location descriptions
   - describe a variable in a register
   - consist of only a DW_OP_reg
2. Memory location descriptions
   - describe the address of a variable
3. Implicit location descriptions
   - describe the value of a variable
   - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these restrictions. This used to not matter because we only emitted very short expressions that we happened to get right by accident. This patch makes DwarfExpression aware of the rules defined by the DWARF standard and now chooses the right kind of location description for each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not for the way that clang describes captured block variables. Based on how the previous code in LLVM emitted locations, DW_OP_deref operations that should have come at the end of the expression are put at its beginning. Fixing this means changing the semantics of DIExpression, so this patch bumps the version number of DIExpression and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function arguments. After this patch a dbg.declare always takes the *address* of a variable as the first argument, even if the argument is not an alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register location description or a memory location description depends on the MachineLocation; register machine locations may get promoted to memory locations based on their DIExpression. (Future) optimization passes that want to salvage implicit debug location for variables may do so by appending a DW_OP_stack_value.

For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also added source-based testcases for each of these to the debuginfo-tests repository over the last week to make sure that no synchronized bugs slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382 <rdar://problem/31205000> Differential Revision: https://reviews.llvm.org/D31439 llvm-svn: 300522
* Build SymbolMap in SampleProfileLoader to help matching function names with ↵Dehao Chen2017-04-171-1/+31
| | | | | | | | | | | | | | suffix. Summary: If a suffix is added to the function name (e.g. a module hash added by ThinLTO), we will not be able to find a match in the profile, as the suffix does not exist in the profile. This patch builds a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline. Reviewers: davidxl, dnovillo, tejohnson Reviewed By: davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D31952 llvm-svn: 300507
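A minimal sketch of the name-to-function map described above. It does not use LLVM: `Function` here is a hypothetical stand-in struct, `stripSuffix` and `buildSymbolMap` are illustrative names, and the rule that a suffix starts at the first '.' is an assumption made for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::Function; only the name matters here.
struct Function {
  std::string Name;
};

// Assumption for this sketch: a suffix starts at the first '.' in the name.
// Names without a '.' are returned unchanged.
std::string stripSuffix(const std::string &Name) {
  return Name.substr(0, Name.find('.'));
}

// Maps both the full name and the stripped name to the same function.
std::unordered_map<std::string, const Function *>
buildSymbolMap(const std::vector<Function> &Funcs) {
  std::unordered_map<std::string, const Function *> Map;
  for (const Function &F : Funcs) {
    // An exact name always wins, even over an earlier stripped entry.
    Map[F.Name] = &F;
    // emplace leaves an existing entry untouched.
    Map.emplace(stripSuffix(F.Name), &F);
  }
  return Map;
}
```

With this map, a profile entry for `foo` can be resolved to a function that was renamed to `foo.llvm.1234`.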
* [SimplifyCFG] Use hasNUses instead of comparing getNumUses to a constant.Craig Topper2017-04-171-1/+1
| | | | | | The use list is a linked list so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end. llvm-svn: 300505
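The difference between the two queries can be sketched on a plain singly linked list. The `Use` struct below is a hypothetical stand-in, not LLVM's class; the function names mirror the ones mentioned in the commit message only for readability.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a use list: a singly linked list of uses.
struct Use {
  Use *Next = nullptr;
};

// Counts every use, so it always walks the whole list.
std::size_t getNumUses(const Use *Head) {
  std::size_t Count = 0;
  for (const Use *U = Head; U; U = U->Next)
    ++Count;
  return Count;
}

// Returns true if the list has exactly N uses. It advances at most N
// times and then checks whether it has reached the end of the list.
bool hasNUses(const Use *Head, std::size_t N) {
  const Use *U = Head;
  for (; N > 0 && U; --N)
    U = U->Next;
  return N == 0 && U == nullptr;
}
```

For a value with thousands of uses, `hasNUses(Head, 2)` stops after two steps, while `getNumUses(Head) == 2` scans the entire list before comparing.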
* [InstCombine] Matchers work with both ConstExpr and Instructions.Davide Italiano2017-04-171-2/+2
| | | | | | | | | | So, `cast<Instruction>` is not guaranteed to succeed. Change the code so that we create a new constant and use it in the newly created instruction, as it's done in other places in InstCombine. OK'ed by Sanjay/Craig. Fixes PR32686. llvm-svn: 300495
* Bitcode: Add a string table to the bitcode format.Peter Collingbourne2017-04-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Add a top-level STRTAB block containing a string table blob, and start storing strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in the string table. This change allows us to share names between globals and comdats as well as between modules, and improves the efficiency of loading bitcode files by no longer using a bit encoding for symbol names. Once we start writing the irsymtab to the bitcode file we will also be able to share strings between it and the module. On my machine, link time for Chromium for Linux with ThinLTO decreases by about 7% for no-op incremental builds or about 1% for full builds. Total bitcode file size decreases by about 3%. As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html Differential Revision: https://reviews.llvm.org/D31838 llvm-svn: 300464
* Introduce APInt::isSignBitSet/isSignBitClear. Use isSignBitSet in ↵Craig Topper2017-04-171-4/+4
| | | | | | | | place of isNegative in known bits tracking. This makes statements like KnownZero.isNegative() (which means the value we're tracking is positive) less confusing. llvm-svn: 300457
* AMDGPU: SimplifyDemandedElts for image intrinsicsMatt Arsenault2017-04-171-3/+80
| | | | | | | | Causes some VGPR usage improvements in shaderdb, but introduces some SGPR spilling regressions due to random scheduling changes later. llvm-svn: 300453
* [LCSSA] Don't insert tokens into the worklist at all.Davide Italiano2017-04-171-7/+8
| | | | | | | We're gonna skip them anyway, so there's no point in inserting them in the first place. llvm-svn: 300452
* [LoopPeeling] Get rid of Phis that become invariant after N stepsMax Kazantsev2017-04-171-20/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a generalization of the improvement introduced in rL296898. Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes an invariant on the 2nd iteration. In the more general case, if a Phi becomes invariant after N iterations, we can peel N times and turn it into an invariant. In order to do this, for every Phi in the loop's header we define the Invariant Depth value, which is calculated as follows: Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge]. If %y is a loop invariant, then Depth(%x) = 1. If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1. Otherwise, Depth(%x) is infinite. Notice that if we peel a loop, all Phis with Depth = 1 become invariants, and all other Phis with finite depth decrease the depth by 1. Thus, peeling N first iterations allows us to turn all Phis with Depth <= N into invariants. Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31613 llvm-svn: 300446
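The Invariant Depth rules above can be sketched as a small memoized recursion. This is a standalone model, not the LLVM implementation: `Phi`, `BackedgeKind`, `depth` and `requiredPeelCount` are hypothetical names, Phis are identified by vector index, and treating Phis that feed each other in a cycle as having infinite depth is an assumption of this sketch.

```cpp
#include <cassert>
#include <limits>
#include <vector>

constexpr unsigned Infinite = std::numeric_limits<unsigned>::max();

// What a header Phi receives along the loop back edge (hypothetical model).
enum class BackedgeKind { LoopInvariant, HeaderPhi, Other };

struct Phi {
  BackedgeKind Kind;
  unsigned Source = 0; // Index of the feeding Phi when Kind == HeaderPhi.
};

// Computes Depth(Phis[I]) following the three rules in the commit message.
// State: 0 = not visited, 1 = in progress, 2 = done.
unsigned depth(const std::vector<Phi> &Phis, unsigned I,
               std::vector<unsigned> &Memo, std::vector<char> &State) {
  if (State[I] == 2)
    return Memo[I];
  if (State[I] == 1)
    return Infinite; // Assumption: a cycle of Phis never becomes invariant.
  State[I] = 1;
  unsigned Result = Infinite;
  const Phi &P = Phis[I];
  if (P.Kind == BackedgeKind::LoopInvariant) {
    Result = 1;
  } else if (P.Kind == BackedgeKind::HeaderPhi) {
    unsigned D = depth(Phis, P.Source, Memo, State);
    if (D != Infinite)
      Result = D + 1;
  }
  Memo[I] = Result;
  State[I] = 2;
  return Result;
}

// Smallest peel count that turns every finite-depth Phi into an invariant.
unsigned requiredPeelCount(const std::vector<Phi> &Phis) {
  std::vector<unsigned> Memo(Phis.size(), 0);
  std::vector<char> State(Phis.size(), 0);
  unsigned Max = 0;
  for (unsigned I = 0; I < Phis.size(); ++I) {
    unsigned D = depth(Phis, I, Memo, State);
    if (D != Infinite && D > Max)
      Max = D;
  }
  return Max;
}
```

A chain of three Phis, where the first is fed by a loop invariant and each of the others is fed by the previous one, has depths 1, 2 and 3, so peeling three iterations makes all of them invariant.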
* [LoopPeeling] Fix condition for phi-eliminating peelingMax Kazantsev2017-04-171-1/+2
| | | | | | | | | | | | | | | When peeling loops based on phis becoming invariants, we make a wrong loop size check. UP.Threshold should be compared against the total number of instructions after the transformation, which is equal to 2 * LoopSize in case of peeling one iteration. We should also check that the maximum allowed number of peeled iterations is not zero. Reviewers: sanjoy, anna, reames, mkuper Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31753 llvm-svn: 300441
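A minimal sketch of the corrected condition as the message describes it. The function and parameter names are hypothetical, and using `<=` for the threshold comparison is an assumption; the commit message only says which quantities are compared.

```cpp
#include <cassert>

// Returns true if peeling a single iteration is allowed in this model:
// the peeled copy plus the remaining loop (2 * LoopSize instructions)
// must fit in the threshold, and peeling must not be disabled outright.
bool canPeelOneIteration(unsigned LoopSize, unsigned Threshold,
                         unsigned MaxPeelCount) {
  if (MaxPeelCount == 0)
    return false;
  // Widen before multiplying so that large sizes cannot overflow.
  return 2ull * LoopSize <= Threshold;
}
```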
* [InstCombine] Simplify 1/X for vectors.Craig Topper2017-04-171-16/+15
| | | | llvm-svn: 300439
* [InstCombine] Add support for vector srem->urem.Craig Topper2017-04-171-7/+5
| | | | llvm-svn: 300437
* [InstCombine] Add support for turning vector sdiv into udiv.Craig Topper2017-04-171-18/+16
| | | | llvm-svn: 300435
* [LCSSA] Simplify a loop. NFCI.Davide Italiano2017-04-171-7/+3
| | | | llvm-svn: 300433
* [InstCombine][ValueTracking] When computing known bits for Srem make sure we ↵Craig Topper2017-04-161-2/+2
| | | | | | | | don't compute known bits for the LHS twice. If we already called computeKnownBits for the RHS being a constant power of 2, we've already computed everything we can and should just stop. I think previously we would still recurse if we had determined the result was negative or had not determined the sign bit at all. llvm-svn: 300432
* [LCSSA] Fix non-determinism due to iterating over a SmallPtrSet.Davide Italiano2017-04-161-3/+3
| | | | | | Use a SmallSetVector instead. llvm-svn: 300431
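The idea behind swapping a pointer set for a set-vector can be sketched with the standard library: keep a vector for iteration order and a hash set for membership. `OrderedSet` is a hypothetical name for this illustration and is not LLVM's SmallSetVector.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set: a vector for order plus a set for membership.
template <typename T> class OrderedSet {
  std::vector<T> Order;
  std::unordered_set<T> Seen;

public:
  // Returns true if the value was newly inserted.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Iteration follows insertion order, independent of hash or address values.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```

Iterating a set keyed on pointers visits elements in an order that depends on their addresses, which can change from run to run; iterating the vector half of this structure always replays insertion order.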
* [InstCombine] In SimplifyDemandedUseBits, don't bother to mask known bits of ↵Craig Topper2017-04-161-3/+3
| | | | | | | | constants with DemandedMask. Just because we didn't demand them doesn't mean they aren't known. llvm-svn: 300430
* [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimizationMichael Zuckerman2017-04-161-0/+31
| | | | | | | This patch adds a new optimization (folding cmp(sub(a,b),0) into cmp(a,b)) to the instCombineCall pass and was written specifically for X86 CMP intrinsics. Differential Revision: https://reviews.llvm.org/D31398 llvm-svn: 300422
* [InstCombine] allow (X != C1 && X != C2) and similar patterns to match splat ↵Sanjay Patel2017-04-151-19/+19
| | | | | | vector constants llvm-svn: 300402
* [ProfileData] Unify getInstrProf*SectionName helpersVedant Kumar2017-04-152-31/+13
| | | | | | | | | | | | | | | | | | | | | | This is a version of D32090 that unifies all of the `getInstrProf*SectionName` helper functions. (Note: the build failures which D32090 would have addressed were fixed with r300352.) We should unify these helper functions because they are hard to use in their current form. E.g. we recently introduced more helpers to fix section naming for COFF files. This scheme doesn't totally succeed at hiding low-level details about section naming, so we should switch to an API that is easier to maintain. This is not an NFC commit because it fixes llvm-cov's testing support for COFF files (this falls out of the API change naturally). This is an area where we lack tests -- I will see about adding one as a follow up. Testing: check-clang, check-profile, check-llvm. Differential Revision: https://reviews.llvm.org/D32097 llvm-svn: 300381
* [InstCombine] Make And/Or/Xor handling reuse previous APInt computationsCraig Topper2017-04-141-36/+46
| | | | | | | | | | | | When checking if we should return a constant, we create some temporary APInts to see if we know all bits. But the exact computations we do are needed in several other locations in the same code. This patch moves them to named temporaries so we can reuse them. Ideally we'd write directly to KnownZero/One, but we currently seem to only write those variables after all the simplifications checks and I didn't want to change that with this patch. Differential Revision: https://reviews.llvm.org/D32094 llvm-svn: 300376
* [IR] Make paramHasAttr use arg indices instead of attr indicesReid Kleckner2017-04-144-6/+6
| | | | | | | | | This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367
* [InstCombine] (X != C1 && X != C2) --> (X | (C1 ^ C2)) != C2Sanjay Patel2017-04-141-36/+65
| | | | | | | | | | ...when C1 differs from C2 by one bit and C1 <u C2: http://rise4fun.com/Alive/Vuo And move related folds to a helper function. This reduces code duplication and will make it easier to remove the scalar-only restriction as a follow-up step. llvm-svn: 300364
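The fold can be checked by brute force on 8-bit values. The sketch below is an independent verification written for this page, not code from the patch; `differByOneBit` and `foldHoldsForAllBytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// True if A and B differ in exactly one bit.
bool differByOneBit(uint8_t A, uint8_t B) {
  uint8_t Diff = A ^ B;
  return Diff != 0 && (Diff & (Diff - 1)) == 0;
}

// Exhaustively compares both forms for every 8-bit X, C1 and C2 that
// satisfy the precondition (C1 <u C2 and a one-bit difference).
bool foldHoldsForAllBytes() {
  for (unsigned C1 = 0; C1 < 256; ++C1)
    for (unsigned C2 = 0; C2 < 256; ++C2) {
      if (C1 >= C2 || !differByOneBit(C1, C2))
        continue;
      for (unsigned X = 0; X < 256; ++X) {
        bool Original = (X != C1) && (X != C2);
        bool Folded = (X | (C1 ^ C2)) != C2;
        if (Original != Folded)
          return false;
      }
    }
  return true;
}
```

The reasoning is short: with C1 <u C2 and a one-bit difference, C2 is C1 with that bit set, so OR-ing the bit into X maps both C1 and C2 to C2 and no other value to C2.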
* [InstCombine] Support folding a subtract with a constant LHS into a phi nodeCraig Topper2017-04-148-28/+44
| | | | | | | | | | We currently only support folding a subtract into a select but not a PHI. This fixes that. I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This code is similar to what we do for selects. Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards. Differential Revision: https://reviews.llvm.org/D31686 llvm-svn: 300363
* [InstCombine] Refactor SimplifyUsingDistributiveLaws to more explicitly skip ↵Craig Topper2017-04-141-30/+33
| | | | | | | | | | | | code when LHS/RHS aren't BinaryOperators Currently this code always makes 2 or 3 calls to tryFactorization regardless of whether the LHS/RHS are BinaryOperators. We make 3 calls when both operands are BinaryOperators with the same opcode. Or surprisingly, when neither are BinaryOperators. This is because getBinOpsForFactorization returns Instruction::BinaryOpsEnd when the operand is not a BinaryOperator. If both LHS and RHS are not BinaryOperators then they both have an Opcode of Instruction::BinaryOpsEnd. When this happens we rely on tryFactorization to early out due to A/B/C/D being null. Similar behavior occurs for the other calls, we rely on getBinOpsForFactorization having made A/B or C/D null to get tryFactorization to early out. We also rely on these null checks to check the result of getIdentityValue and early out for it. This patch refactors this to pull these checks up to SimplifyUsingDistributiveLaws so we don't rely on BinaryOpsEnd as a sentinel or this A/B/C/D null behavior. I think this makes this code easier to reason about. Should also give a tiny performance improvement for cases where the LHS or RHS isn't a BinaryOperator. Differential Revision: https://reviews.llvm.org/D31913 llvm-svn: 300353
* [FunctionImport] assert(false) -> llvm_unreachable(). NFCI.Davide Italiano2017-04-141-1/+1
| | | | llvm-svn: 300344
* Tighten the API for ScalarEvolutionNormalizationSanjoy Das2017-04-141-4/+3
| | | | llvm-svn: 300331
* Remove NormalizeAutodetect; NFCSanjoy Das2017-04-141-9/+3
| | | | | | | | | It is cleaner to have a callback based system where the logic of whether an add recurrence is normalized or not lives on IVUsers. This is one step in a multi-step cleanup. llvm-svn: 300330
* [LV] Remove implicit single basic block assumptionGil Rapaport2017-04-141-6/+5
| | | | | | | | | | | | | This patch is part of D28975's breakdown - no change in output intended. LV's code currently assumes the vectorized loop is a single basic block up until predicateInstructions() is called. This patch removes two manifestations of this assumption (loop phi incoming values, dominator tree update) by replacing the use of vectorLoopBody with the vectorized loop's latch/header. Differential Revision: https://reviews.llvm.org/D32040 llvm-svn: 300310
* [InstCombine] Use APInt::setSignBit and APInt::isNegative(). NFCCraig Topper2017-04-141-3/+3
| | | | llvm-svn: 300305
* Fix test failure on windows: pass module to getInstrProfXXName callsXinliang David Li2017-04-141-4/+4
| | | | llvm-svn: 300302
* NewGVN: Don't propagate over phi backedges where undef causes us toDaniel Berlin2017-04-141-8/+149
| | | | | | | | have >1 value, unless we can prove the phi node is cycle free. Fixes PR 32607. llvm-svn: 300299
* [Profile] PE binary coverage bug fixXinliang David Li2017-04-132-12/+11
| | | | | | | | PR/32584 Differential Revision: https://reviews.llvm.org/D32023 llvm-svn: 300277
* [IR] Make getParamAttributes take argument numbers, not ArgNo+1Reid Kleckner2017-04-135-35/+34
| | | | | | | | | | | | Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere. The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail. NFC llvm-svn: 300272
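Hiding the ArgNo+1 offset behind accessors can be sketched as follows. `AttributeListSketch` is a hypothetical class that stores attribute kinds as strings; only the accessor names `hasParamAttribute` and `hasReturnAttr` come from the commit messages, and their real signatures are not reproduced here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

class AttributeListSketch {
  // Slot 0 holds return attributes; slot ArgNo + 1 holds those of argument ArgNo.
  std::vector<std::set<std::string>> Slots;

public:
  explicit AttributeListSketch(unsigned NumArgs) : Slots(NumArgs + 1) {}

  void addReturnAttr(const std::string &Kind) { Slots[0].insert(Kind); }
  void addParamAttr(unsigned ArgNo, const std::string &Kind) {
    Slots.at(ArgNo + 1).insert(Kind);
  }

  // Callers pass plain argument numbers; the +1 offset stays private.
  bool hasReturnAttr(const std::string &Kind) const {
    return Slots[0].count(Kind) != 0;
  }
  bool hasParamAttribute(unsigned ArgNo, const std::string &Kind) const {
    return Slots.at(ArgNo + 1).count(Kind) != 0;
  }
};
```

Call sites then read `hasParamAttribute(ArgNo, Kind)` rather than spelling out `ArgNo + 1`, which removes a common source of off-by-one mistakes.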
* [InstCombine] Use APInt::getBitsSetFrom instead of inverting the result of ↵Craig Topper2017-04-131-4/+2
| | | | | | getLowBitsSet. NFC llvm-svn: 300265
* [LCSSA] Efficiently compute blocks dominating at least one exit.Davide Italiano2017-04-131-19/+54
| | | | | | | | | | | | | | | | | | | | | | For LCSSA purposes, loop BBs not dominating any of the exits aren't interesting, as none of the values defined in these blocks can be used outside the loop. The way the code computed this information was by comparing each BB of the loop with each of the exit blocks and asking the dominator tree about their dominance relation. This is slow. A more efficient way, implemented here, is to start from the exit blocks and walk the dominator tree upwards until we hit a header. By transitivity, all the blocks we encounter in our path dominate an exit. For the testcase provided in PR31851, this reduces compile time on `opt -O2` by ~25%, going from 1m47s to 1m22s. Thanks to Dan/MichaelZ for discussions/suggesting the approach/review. Differential Revision: https://reviews.llvm.org/D31843 llvm-svn: 300255
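The upward walk described above can be sketched on a dominator tree stored as an array of immediate dominators. This is a simplified model rather than the LLVM code: blocks are hypothetical integer ids, `IDom[B]` is the immediate dominator of block B, and `blocksDominatingExits` is an illustrative name.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Collects the loop blocks that dominate at least one exit block by
// following immediate-dominator links upward from each exit.
std::set<int> blocksDominatingExits(const std::vector<int> &IDom, int Header,
                                    const std::set<int> &LoopBlocks,
                                    const std::vector<int> &ExitBlocks) {
  std::set<int> Result;
  for (int Exit : ExitBlocks) {
    int BB = IDom[Exit];
    // Stop when leaving the loop or when reaching a block that an earlier
    // walk already collected (its ancestors are collected too).
    while (LoopBlocks.count(BB) && Result.insert(BB).second) {
      if (BB == Header)
        break;
      BB = IDom[BB];
    }
  }
  return Result;
}
```

For a diamond-shaped loop body (header 1 branching to blocks 2 and 3, both joining at latch 4, which exits to block 5), the walk from the exit visits only the latch and the header. Each block is visited at most once across all exits, instead of once per block and exit pair.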
* Revert accidentally-committed files in r300252.Richard Smith2017-04-131-403/+0
| | | | llvm-svn: 300253
* Remove all allocation and divisions from GreatestCommonDivisorRichard Smith2017-04-131-0/+403
| | | | | | | | | | | Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This avoids the (expensive) APInt division operation in favour of bit operations. Remove all memory allocation from within the GCD loop by tweaking our `lshr` implementation so it can operate in-place. Differential Revision: https://reviews.llvm.org/D31968 llvm-svn: 300252
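A minimal sketch of Stein's (binary) GCD on 64-bit integers, to show how the algorithm uses only shifts, comparisons and subtraction. It is not the APInt implementation from the patch, and `binaryGCD` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Stein's algorithm: no division or modulo, only shifts and subtraction.
uint64_t binaryGCD(uint64_t A, uint64_t B) {
  if (A == 0)
    return B;
  if (B == 0)
    return A;

  // Factor out the powers of two shared by both operands.
  unsigned Shift = 0;
  while (((A | B) & 1) == 0) {
    A >>= 1;
    B >>= 1;
    ++Shift;
  }

  // Make A odd; any remaining factors of two in A are not shared.
  while ((A & 1) == 0)
    A >>= 1;

  while (B != 0) {
    // B is nonzero here, so this loop terminates.
    while ((B & 1) == 0)
      B >>= 1;
    // Both are odd now; keep A as the smaller and subtract.
    if (A > B)
      std::swap(A, B);
    B -= A; // The difference of two odd numbers is even (or zero).
  }

  // Restore the shared powers of two.
  return A << Shift;
}
```

Every step modifies `A` or `B` in place, which is the property the commit relies on to keep allocation out of the loop for arbitrary-precision values.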
* [InstCombine] Fix !prof metadata preservation for invokesReid Kleckner2017-04-131-18/+16
| | | | | | | | | | | | | | | | | | | | Summary: Bug noticed by inspection. Extend the test to handle invokes as well as calls, and rewrite it to not depend on the inliner and other passes. Also simplify the call site replacement code with CallSite, similar to what I did to dead arg elimination and arg promotion (rL300235 and rL300229). Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32041 llvm-svn: 300251
* [LCSSA] Assert that we always have a valid loop.Davide Italiano2017-04-131-0/+1
| | | | | | | We could otherwise add BBs not belonging to a loop in `formLCSSA` and later crash when trying to iterate the loop blocks. llvm-svn: 300244
* [LCSSA] Remove spurious whitespaces. NFCI.Davide Italiano2017-04-131-1/+1
| | | | llvm-svn: 300243