summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [SLPVectorizer] Schedule bundle with different opcodes.Dinar Temirbulatov2017-08-141-52/+140
| | | | | | | | | | | | This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ] Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36518 llvm-svn: 310847
* [BDCE] reduce scope of an assert (PR34179)Sanjay Patel2017-08-141-10/+7
| | | | | | | | | | | | | | | | | The assert was added with r310779 and is usually correct, but as the test shows, not always. The 'volatile' on the load is needed to expose the faulty path because without it, DemandedBits would return that the load is just dead rather than not demanded, and so we wouldn't hit the bogus assert. Also, since the lambda is just a single-line now, get rid of it and inline the DB.isAllOnesValue() calls. This should fix (prevent execution of a faulty assert): https://bugs.llvm.org/show_bug.cgi?id=34179 llvm-svn: 310842
* [LoopUnroll] Enable option to peel remainder loopSam Parker2017-08-143-10/+37
| | | | | | | | | | | | | | | | | | | | On some targets, the penalty of executing runtime unrolling checks and then not the unrolled loop can be significantly detrimental to performance. This results in the need to be more conservative with the unroll count, keeping a trip count of 2 reduces the overhead as well as increasing the chance of the unrolled body being executed. But being conservative leaves performance gains on the table. This patch enables the unrolling of the remainder loop introduced by runtime unrolling. This can help reduce the overhead of misunrolled loops because the cost of non-taken branches is much less than the cost of the backedge that would normally be executed in the remainder loop. This allows larger unroll factors to be used without suffering performance loses with smaller iteration counts. Differential Revision: https://reviews.llvm.org/D36309 llvm-svn: 310824
* [InstCombine] Simplify and inline FoldOrWithConstants/FoldXorWithConstantsCraig Topper2017-08-141-85/+19
| | | | | | | | | | | | | | | | | Summary: These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code. Next step is to use m_APInt instead of ConstantInt. Reviewers: spatel, efriedma, davide, majnemer Reviewed By: spatel Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D36439 llvm-svn: 310806
* [BDCE] clear poison generators after turning a value into zero (PR33695, ↵Sanjay Patel2017-08-121-0/+47
| | | | | | | | | | | | | | | | PR34037) nsw, nuw, and exact carry implicit assumptions about their operands, so we need to clear those after trivializing a value. We decided there was no danger for llvm.assume or metadata, so there's just a comment about that. This fixes miscompiles as shown in: https://bugs.llvm.org/show_bug.cgi?id=33695 https://bugs.llvm.org/show_bug.cgi?id=34037 Differential Revision: https://reviews.llvm.org/D36592 llvm-svn: 310779
* [OptDiag] Updating Remarks in SampleProfileEli Friedman2017-08-111-30/+45
| | | | | | | | | | | | | | | | Updating remark API to newer OptimizationDiagnosticInfo API. This allows remarks to show up in diagnostic yaml file, and enables use of opt-viewer tool. Hotness information for remarks (L505 and L751) do not display hotness information, most likely due to profile information not being propagated yet. Unsure if this is the desired outcome. Patch by Tarun Rajendran. Differential Revision: https://reviews.llvm.org/D36127 llvm-svn: 310763
* Fix typo /NFCXinliang David Li2017-08-111-1/+1
| | | | llvm-svn: 310737
* [InstCombine] Make (X|C1)^C2 -> X^(C1^C2) iff X&~C1 == 0 work for splat vectorsCraig Topper2017-08-101-23/+18
| | | | | | | | This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2). Differential Revision: https://reviews.llvm.org/D36505 llvm-svn: 310658
* [InstCombine] Fix a crash in getSelectCondition if we happen to have two ↵Craig Topper2017-08-101-2/+3
| | | | | | | | inverse vectors of i1 constants. We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check. llvm-svn: 310639
* [InstCombine] Add a DEBUG_COUNTER to InstCombine to limit how many ↵Craig Topper2017-08-101-0/+6
| | | | | | | | | | | | instructions are visited for debug Sometimes it would be nice to stop InstCombine mid way through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process. This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped. Differential Revision: https://reviews.llvm.org/D36553 llvm-svn: 310638
* [DebugCounter] Move the semicolon out of the DEBUG_COUNTER macro and require ↵Craig Topper2017-08-102-4/+5
| | | | | | | | | | it to be placed at the end of each use. This make it consistent with STATISTIC which it will often appears near. While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary. llvm-svn: 310637
* [sanitizer-coverage] Change cmp instrumentation to distinguish const operandsAlexander Potapenko2017-08-101-4/+40
| | | | | | | | | | | | | | | | | | | | | This implementation of SanitizerCoverage instrumentation inserts different callbacks depending on constantness of operands: 1. If both operands are non-const, then a usual __sanitizer_cov_trace_cmp[1248] call is inserted. 2. If exactly one operand is const, then a __sanitizer_cov_trace_const_cmp[1248] call is inserted. The first argument of the call is always the constant one. 3. If both operands are const, then no callback is inserted. This separation comes useful in fuzzing when tasks like "find one operand of the comparison in input arguments and replace it with the other one" have to be done. The new instrumentation allows us to not waste time on searching the constant operands in the input. Patch by Victor Chibotaru. llvm-svn: 310600
* [NewGVN] Add CL option to control the generation of phi-of-ops (disable by ↵Chad Rosier2017-08-101-0/+6
| | | | | | | | default). Differential Revision: https://reviews.llvm.org/D36478539 llvm-svn: 310594
* [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc ↵Sanjay Patel2017-08-092-1/+73
| | | | | | | | | | | | | | | | | | | | | | | (PR34046) I couldn't find any smaller folds to help the cases in: https://bugs.llvm.org/show_bug.cgi?id=34046 after: rL310141 The truncated rotate-by-variable patterns elude all of the existing transforms because of multiple uses and knowledge about demanded bits and knownbits that doesn't exist without the whole pattern. So we need an unfortunately large pattern match. But by simplifying this pattern in IR, the backend is already able to generate rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although there is a likely extraneous 'and' of the rotate amount). Note that rotate-by-constant doesn't have this problem - smaller folds should already produce the narrow IR ops. Differential Revision: https://reviews.llvm.org/D36395 llvm-svn: 310509
* [asan] Fix instruction emission ordering with dynamic shadow.Matt Morehouse2017-08-091-3/+8
| | | | | | | | | | | | | | | | Summary: Instrumentation to copy byval arguments is now correctly inserted after the dynamic shadow base is loaded. Reviewers: vitalybuka, eugenis Reviewed By: vitalybuka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D36533 llvm-svn: 310503
* [LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess()Jonas Paulsson2017-08-091-2/+9
| | | | | | | | | | | | | | | | | isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously only isFoldableMemAccess() could do. The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess(). The isFoldableMemAccess() hook has been removed everywhere. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D35933 llvm-svn: 310463
* [LoopStrengthReduce] Don't neglect the Fixup.Offset in isAMCompletelyFolded().Jonas Paulsson2017-08-091-2/+2
| | | | | | | | In the recursive call to isAMCompletelyFolded(), the passed offset should be the sum of F.BaseOffset and Fixup.Offset. Review: Quentin Colombet. llvm-svn: 310462
* [GlobalOpt] Switch an explicit loop to llvm::all_of(). NFCI.Davide Italiano2017-08-091-5/+2
| | | | llvm-svn: 310453
* [InstCombine] Use regular dyn_cast instead of a matcher for a simple case. NFCCraig Topper2017-08-091-2/+2
| | | | llvm-svn: 310446
* [GVN] Remove stale entries in phitranslate cache when new phi is generated ↵Wei Mi2017-08-081-0/+14
| | | | | | | | | | | | | | | | | | for PRE When a new phi is generated for scalarpre of an expression, the phiTranslate cache will become stale: Before PRE, the candidate expression must not be available in a predecessor block, and phitranslate will cache the information. After PRE, the expression will become available in all predecessor blocks, so the related entries in phiTranslate cache becomes stale. The patch will simply remove the stale entries so phiTranslate can be recomputed next time. The stale entries in phitranslate cache will not affect correctness but will cause missing PRE opportunity for later instructions. Differential Revision: https://reviews.llvm.org/D36124 llvm-svn: 310421
* Make ICP uses PSI to check for hotness.Dehao Chen2017-08-081-8/+20
| | | | | | | | | | | | | | Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted. Reviewers: davidxl, tejohnson, eraman Reviewed By: davidxl Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36341 llvm-svn: 310416
* [InstCombine] Support pulling left shifts through a subtract with constant LHSCraig Topper2017-08-081-0/+14
| | | | | | | | We already support pulling through an add with constant RHS. We can do the same for subtract. Differential Revision: https://reviews.llvm.org/D36443 llvm-svn: 310407
* [NewGVN] Use a cast instead of a dyn_cast.Chad Rosier2017-08-081-1/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D36478 llvm-svn: 310397
* [LoopVectorize] Fix assertion failure in Fcmp vectorizationAnna Thomas2017-08-081-1/+3
| | | | | | | | | | | | | | | | | | | | | Summary: When vectorizing fcmps we can trip on incorrect cast assertion when setting the FastMathFlags after generating the vectorized FCmp. This can happen if the FCmp can be folded to true or false directly. The fix here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating the FCmp Instruction. This is what's done by other optimizations such as InstCombine. Added a test case which trips on cast assertion without this patch. Reviewers: Ayal, mssimpso, mkuper, gilr Reviewed by: Ayal, mssimpso Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D36244 llvm-svn: 310389
* [InstCombine] Cast to BinaryOperator earlier in foldSelectIntoOp to simplify ↵Craig Topper2017-08-081-14/+10
| | | | | | | | the code. We no longer need the explicit operand count check or the later dynamic cast. llvm-svn: 310339
* [PM] Fix new LoopUnroll function pass by invalidating loop analysisChandler Carruth2017-08-081-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | results when a loop is completely removed. This is very hard to manifest as a visible bug. You need to arrange for there to be a subsequent allocation of a 'Loop' object which gets the exact same address as the one which the unroll deleted, and you need the LoopAccessAnalysis results to be significant in the way that they're stale. And you need a million other things to align. But when it does, you get a deeply mysterious crash due to actually finding a stale analysis result. This fixes the issue and tests for it by directly checking we successfully invalidate things. I have not been able to get *any* test case to reliably trigger this. Changes to LLVM itself caused the only test case I ever had to cease to crash. I've looked pretty extensively at less brittle ways of fixing this and they are actually very, very hard to do. This is a somewhat strange and unusual case as we have a pass which is deleting an IR unit, but is not running within that IR unit's pass framework (which is what handles this cleanly for the normal loop unroll). And where there isn't a definitive way to clear *all* of the stale cache entries. And where the pass *is* updating the core analysis that provides the IR units! For example, we don't have any of these problems with Function analyses because it is easy to clear out function analyses when the functions themselves may have been deleted -- we clear an entire module's worth! But that is too heavy of a hammer down here in the LoopAnalysisManager layer. A better long-term solution IMO is to require that AnalysisManager's make their keys durable to this kind of thing. Specifically, when caching an analysis for one IR unit that is conceptually "owned" by a higher level IR unit, the AnalysisManager should incorporate this into its data structures so that we can reliably clear these results without having to teach each and every pass to do so manually as we do here. But that is a change for another day as it will be a fairly invasive change to the AnalysisManager infrastructure. Until then, this fortunately seems to be quite rare. llvm-svn: 310333
* Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).Evgeny Stupachenko2017-08-071-1/+1
| | | | | | | | | | | | | | | | The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289
* Removing an unused variable that was missed with the refactoring in r310272; ↵Aaron Ballman2017-08-071-3/+0
| | | | | | NFC. llvm-svn: 310285
* [InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for ↵Craig Topper2017-08-071-15/+16
| | | | | | | | | | | | vector splats Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code. My new implementation avoids relying on that behavior so that it can be naively verified with alive. Differential Revision: https://reviews.llvm.org/D36384 llvm-svn: 310272
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-071-106/+107
| | | | | | | | | | | | | | | | | | Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260
* Revert "[SLP] General improvements of SLP vectorization process."Alexey Bataev2017-08-071-109/+106
| | | | | | This reverts commit r310255. llvm-svn: 310257
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-071-106/+109
| | | | | | | | | | | | | | | Summary: Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255
* [asan] Fix asan dynamic shadow check before copyArgsPassedByValToAllocasVitaly Buka2017-08-071-1/+1
| | | | llvm-svn: 310242
* [asan] Disable checking of arguments passed by value for ↵Vitaly Buka2017-08-071-1/+2
| | | | | | | | --asan-force-dynamic-shadow Fails with "Instruction does not dominate all uses!" llvm-svn: 310241
* [Reassociate] Use a range loop for clarity. NFCI.Davide Italiano2017-08-071-5/+6
| | | | | | | | While here, rename `i` to `Rank` as the latter is more self-explanatory (and this code also uses `I` two lines below to identify an Instruction). llvm-svn: 310238
* [Reassociate] Try to bail out early when canonicalizing.Davide Italiano2017-08-071-6/+2
| | | | | | | This commit rearranges the checks to avoid calls to getRank() when not needed (e.g. when RHS == LHS). llvm-svn: 310237
* [InstCombine] Remove shift handling from OptAndOp.Craig Topper2017-08-061-58/+0
| | | | | | | | | | | | | | Summary: This is all handled by SimplifyDemandedBits. Reviewers: spatel, davide Reviewed By: davide Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D36382 llvm-svn: 310234
* [InstCombine] Support (X ^ C1) & C2 --> (X & C2) ^ (C1&C2) for vector splats.Craig Topper2017-08-061-8/+10
| | | | llvm-svn: 310233
* [InstCombine] Support '(C - X) ^ signmask -> (C + signmask - X)' and '(X + ↵Craig Topper2017-08-061-16/+11
| | | | | | C) ^ signmask -> (X + C + signmask)' for vector splats. llvm-svn: 310232
* [InstCombine] Support ~(c-X) --> X+(-c-1) and ~(X-c) --> (-c-1)-X for splat ↵Craig Topper2017-08-061-14/+25
| | | | | | vectors. llvm-svn: 310195
* [InstCombine] Fold (C - X) ^ signmask -> (C + signmask - X).Craig Topper2017-08-051-6/+11
| | | | llvm-svn: 310186
* [InstCombine] Teach the code that pulls logical operators through constant ↵Craig Topper2017-08-051-3/+5
| | | | | | shifts to handle vector splats too. llvm-svn: 310185
* [InstCombine] Support vector splats in foldSelectICmpAnd.Craig Topper2017-08-051-15/+23
| | | | | | Unfortunately, it looks like there's some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next. llvm-svn: 310184
* [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle ↵Dinar Temirbulatov2017-08-051-23/+54
| | | | | | | | different opcodes, NFCI. Differential Revision: https://reviews.llvm.org/D35769 llvm-svn: 310183
* [InstCombine] refactor trunc(binop) transforms; NFCISanjay Patel2017-08-052-40/+38
| | | | | | | In addition to moving the shift transforms over, we may want to detect too-wide rotate patterns here (PR34046). llvm-svn: 310181
* [InstCombine] In foldSelectICmpAnd, if we need to to truncate from the 'and' ↵Craig Topper2017-08-051-8/+8
| | | | | | | | | | | | type to the 'select' type, do it after shifting right instead of just bailing. Previously we were always trying to emit the zext or truncate before any shift. This meant if the 'and' mask was larger than the size of the truncate we would skip the transformation. Now we shift the result of the and right first leaving the bit within the range of the truncate. This matches what we are doing in foldSelectICmpAndOr for the same problem. llvm-svn: 310159
* [InstCombine] narrow truncated add/sub/mul with constantSanjay Patel2017-08-041-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Name: narrow_sub %sub = sub i32 C1, %x %r = trunc i32 %sub to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = sub i8 %narrowC, %xn Name: narrow_add %add = add i32 %x, C1 %r = trunc i32 %add to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = add i8 %xn, %narrowC Name: narrow_mul %mul = mul i32 %x, C1 %r = trunc i32 %mul to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = mul i8 %xn, %narrowC http://rise4fun.com/Alive/QpS This doesn't solve PR34046 (failure to recognize rotate): https://bugs.llvm.org/show_bug.cgi?id=34046 ...but it reduces an extra complication in the description examples to a form that we can more easily match. llvm-svn: 310141
* Revert r310055, it caused PR34074.Nico Weber2017-08-041-100/+3
| | | | llvm-svn: 310123
* Fix PR33514Evgeny Stupachenko2017-08-041-0/+6
| | | | | | | | | | | | | | Summary: The bug was uncovered after fix of PR23384 (part 3 of 3). The patch restricts pointer multiplication in SCEV computaion for ICmpZero. Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D36170 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310092
* [ArgPromotion] Preserve alignment of byval argument in new allocaReid Kleckner2017-08-041-1/+1
| | | | | | | | | | | The frontend may have requested a higher alignment for any reason, and downstream optimizations may already have taken advantage of it. We should keep the same alignment when moving the allocation from the parameter area to the local variable area. Fixes PR34038 llvm-svn: 310071
OpenPOWER on IntegriCloud