summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] Cast to BinaryOperator earlier in foldSelectIntoOp to simplify ↵Craig Topper2017-08-081-14/+10
| | | | | | | | the code. We no longer need the explicit operand count check or the later dynamic cast. llvm-svn: 310339
* [PM] Fix new LoopUnroll function pass by invalidating loop analysisChandler Carruth2017-08-081-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | results when a loop is completely removed. This is very hard to manifest as a visible bug. You need to arrange for there to be a subsequent allocation of a 'Loop' object which gets the exact same address as the one which the unroll deleted, and you need the LoopAccessAnalysis results to be significant in the way that they're stale. And you need a million other things to align. But when it does, you get a deeply mysterious crash due to actually finding a stale analysis result. This fixes the issue and tests for it by directly checking we successfully invalidate things. I have not been able to get *any* test case to reliably trigger this. Changes to LLVM itself caused the only test case I ever had to cease to crash. I've looked pretty extensively at less brittle ways of fixing this and they are actually very, very hard to do. This is a somewhat strange and unusual case as we have a pass which is deleting an IR unit, but is not running within that IR unit's pass framework (which is what handles this cleanly for the normal loop unroll). And where there isn't a definitive way to clear *all* of the stale cache entries. And where the pass *is* updating the core analysis that provides the IR units! For example, we don't have any of these problems with Function analyses because it is easy to clear out function analyses when the functions themselves may have been deleted -- we clear an entire module's worth! But that is too heavy of a hammer down here in the LoopAnalysisManager layer. A better long-term solution IMO is to require that AnalysisManager's make their keys durable to this kind of thing. Specifically, when caching an analysis for one IR unit that is conceptually "owned" by a higher level IR unit, the AnalysisManager should incorporate this into its data structures so that we can reliably clear these results without having to teach each and every pass to do so manually as we do here. But that is a change for another day as it will be a fairly invasive change to the AnalysisManager infrastructure. Until then, this fortunately seems to be quite rare. llvm-svn: 310333
* Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).Evgeny Stupachenko2017-08-071-1/+1
| | | | | | | | | | | | | | | | The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289
* Removing an unused variable that was missed with the refactoring in r310272; ↵Aaron Ballman2017-08-071-3/+0
| | | | | | NFC. llvm-svn: 310285
* [InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for ↵Craig Topper2017-08-071-15/+16
| | | | | | | | | | | | vector splats Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code. My new implementation avoids relying on that behavior so that it can be naively verified with alive. Differential Revision: https://reviews.llvm.org/D36384 llvm-svn: 310272
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-071-106/+107
| | | | | | | | | | | | | | | | | | Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260
* Revert "[SLP] General improvements of SLP vectorization process."Alexey Bataev2017-08-071-109/+106
| | | | | | This reverts commit r310255. llvm-svn: 310257
* [SLP] General improvements of SLP vectorization process.Alexey Bataev2017-08-071-106/+109
| | | | | | | | | | | | | | | Summary: Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255
* [asan] Fix asan dynamic shadow check before copyArgsPassedByValToAllocasVitaly Buka2017-08-071-1/+1
| | | | llvm-svn: 310242
* [asan] Disable checking of arguments passed by value for ↵Vitaly Buka2017-08-071-1/+2
| | | | | | | | --asan-force-dynamic-shadow Fails with "Instruction does not dominate all uses!" llvm-svn: 310241
* [Reassociate] Use a range loop for clarity. NFCI.Davide Italiano2017-08-071-5/+6
| | | | | | | | While here, rename `i` to `Rank` as the latter is more self-explanatory (and this code also uses `I` two lines below to identify an Instruction). llvm-svn: 310238
* [Reassociate] Try to bail out early when canonicalizing.Davide Italiano2017-08-071-6/+2
| | | | | | | This commit rearranges the checks to avoid calls to getRank() when not needed (e.g. when RHS == LHS). llvm-svn: 310237
* [InstCombine] Remove shift handling from OptAndOp.Craig Topper2017-08-061-58/+0
| | | | | | | | | | | | | | Summary: This is all handled by SimplifyDemandedBits. Reviewers: spatel, davide Reviewed By: davide Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D36382 llvm-svn: 310234
* [InstCombine] Support (X ^ C1) & C2 --> (X & C2) ^ (C1&C2) for vector splats.Craig Topper2017-08-061-8/+10
| | | | llvm-svn: 310233
* [InstCombine] Support '(C - X) ^ signmask -> (C + signmask - X)' and '(X + ↵Craig Topper2017-08-061-16/+11
| | | | | | C) ^ signmask -> (X + C + signmask)' for vector splats. llvm-svn: 310232
* [InstCombine] Support ~(c-X) --> X+(-c-1) and ~(X-c) --> (-c-1)-X for splat ↵Craig Topper2017-08-061-14/+25
| | | | | | vectors. llvm-svn: 310195
* [InstCombine] Fold (C - X) ^ signmask -> (C + signmask - X).Craig Topper2017-08-051-6/+11
| | | | llvm-svn: 310186
* [InstCombine] Teach the code that pulls logical operators through constant ↵Craig Topper2017-08-051-3/+5
| | | | | | shifts to handle vector splats too. llvm-svn: 310185
* [InstCombine] Support vector splats in foldSelectICmpAnd.Craig Topper2017-08-051-15/+23
| | | | | | Unfortunately, it looks like there's some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next. llvm-svn: 310184
* [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle ↵Dinar Temirbulatov2017-08-051-23/+54
| | | | | | | | different opcodes, NFCI. Differential Revision: https://reviews.llvm.org/D35769 llvm-svn: 310183
* [InstCombine] refactor trunc(binop) transforms; NFCISanjay Patel2017-08-052-40/+38
| | | | | | | In addition to moving the shift transforms over, we may want to detect too-wide rotate patterns here (PR34046). llvm-svn: 310181
* [InstCombine] In foldSelectICmpAnd, if we need to to truncate from the 'and' ↵Craig Topper2017-08-051-8/+8
| | | | | | | | | | | | type to the 'select' type, do it after shifting right instead of just bailing. Previously we were always trying to emit the zext or truncate before any shift. This meant if the 'and' mask was larger than the size of the truncate we would skip the transformation. Now we shift the result of the and right first leaving the bit within the range of the truncate. This matches what we are doing in foldSelectICmpAndOr for the same problem. llvm-svn: 310159
* [InstCombine] narrow truncated add/sub/mul with constantSanjay Patel2017-08-041-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Name: narrow_sub %sub = sub i32 C1, %x %r = trunc i32 %sub to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = sub i8 %narrowC, %xn Name: narrow_add %add = add i32 %x, C1 %r = trunc i32 %add to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = add i8 %xn, %narrowC Name: narrow_mul %mul = mul i32 %x, C1 %r = trunc i32 %mul to i8 => %xn = trunc i32 %x to i8 %narrowC = trunc i32 C1 to i8 %r = mul i8 %xn, %narrowC http://rise4fun.com/Alive/QpS This doesn't solve PR34046 (failure to recognize rotate): https://bugs.llvm.org/show_bug.cgi?id=34046 ...but it reduces an extra complication in the description examples to a form that we can more easily match. llvm-svn: 310141
* Revert r310055, it caused PR34074.Nico Weber2017-08-041-100/+3
| | | | llvm-svn: 310123
* Fix PR33514Evgeny Stupachenko2017-08-041-0/+6
| | | | | | | | | | | | | | Summary: The bug was uncovered after fix of PR23384 (part 3 of 3). The patch restricts pointer multiplication in SCEV computaion for ICmpZero. Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D36170 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310092
* [ArgPromotion] Preserve alignment of byval argument in new allocaReid Kleckner2017-08-041-1/+1
| | | | | | | | | | | The frontend may have requested a higher alignment for any reason, and downstream optimizations may already have taken advantage of it. We should keep the same alignment when moving the allocation from the parameter area to the local variable area. Fixes PR34038 llvm-svn: 310071
* [InstCombine] Fold single-use variable into assert.Benjamin Kramer2017-08-041-2/+2
| | | | | | Avoids unused variable warnings in Release builds. No functional change. llvm-svn: 310064
* [InstCombine] Remove the (not (sext)) case from foldBoolSextMaskToSelect and ↵Craig Topper2017-08-041-27/+8
| | | | | | | | | | | | | | | | | | | inline the remaining code to match visitOr Summary: The (not (sext)) case is really (xor (sext), -1) which should have been simplified to (sext (xor, 1)) before we got here. So we shouldn't need to handle it. With that taken care of we only need to two cases so don't need the swap anymore. This makes us in sync with the equivalent code in visitOr so inline this to match. Reviewers: spatel, eli.friedman, majnemer Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36240 llvm-svn: 310063
* [InstCombine] Use ConstantInt::getFalse to reduce some code. NFCCraig Topper2017-08-041-2/+1
| | | | llvm-svn: 310062
* [InstCombine] narrow lshr with constantSanjay Patel2017-08-041-0/+9
| | | | | | | | | | | | | | | | | | | Name: narrow_shift Pre: C1 < 8 %zx = zext i8 %x to i32 %l = lshr i32 %zx, C1 => %narrowC = trunc i32 C1 to i8 %ns = lshr i8 %x, %narrowC %l = zext i8 %ns to i32 http://rise4fun.com/Alive/jIV This isn't directly applicable to PR34046 as written, but we need to have more narrowing folds like this to be sure that rotate patterns are recognized. llvm-svn: 310060
* [DSE] Merge stores when the later store only writes to memory locations the ↵Filipe Cabecinhas2017-08-041-3/+100
| | | | | | | | | | | | | | | | | | | | | | early store also wrote to. Summary: This fixes PR31777. If both stores' values are ConstantInt, we merge the two stores (shifting the smaller store appropriately) and replace the earlier (and larger) store with an updated constant. In the future we should also support vectors of integers. And maybe float/double if we can. Reviewers: hfinkel, junbuml, jfb, RKSimon, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30703 llvm-svn: 310055
* [InstCombine] Canonicalize clamp of float types to minmax in fast mode.Nikolai Bozhenov2017-08-041-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM nodes are illegal thus selection is not happening. So I decided to do such kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 310054
* Do not declare a variable which is used only in assert. NFCMax Kazantsev2017-08-041-2/+1
| | | | llvm-svn: 310034
* [IRCE] Handle loops with step different from 1/-1Max Kazantsev2017-08-041-65/+102
| | | | | | | | This patch generalizes IRCE to handle IV steps that are not equal to 1 or -1. Differential Revision: https://reviews.llvm.org/D35539 llvm-svn: 310032
* [IRCE] Recognize loops with unsigned latch conditionsMax Kazantsev2017-08-041-50/+109
| | | | | | | | This patch enables recognition of loops with ult/ugt latch conditions. Differential Revision: https://reviews.llvm.org/D35302 llvm-svn: 310027
* [InstCombine] Move the call to foldSelectICmpAnd into ↵Craig Topper2017-08-041-80/+80
| | | | | | foldSelectInstWithICmp. NFCI llvm-svn: 310025
* [InstCombine] Remove unnecessary casts. NFCCraig Topper2017-08-041-2/+2
| | | | | | We're calling an overload of getOpcode that already returns Instruction::CastOps. llvm-svn: 310024
* Un-revert r310014: false revert, it wasn't the cause of build breakVictor Leschuk2017-08-041-3/+32
| | | | llvm-svn: 310021
* Revert r310014 as it breaks build lld-x86_64-darwin13Victor Leschuk2017-08-041-32/+3
| | | | llvm-svn: 310020
* Teach GlobalSRA to update the debug info for split-up globals.Adrian Prantl2017-08-041-3/+32
| | | | | | | | | This is similar to what we are doing in "regular" SROA and creates DW_OP_LLVM_fragment operations to describe the resulting variables. rdar://problem/33654891 llvm-svn: 310014
* Use profile summary to disable peeling for huge working setsTeresa Johnson2017-08-031-6/+18
| | | | | | | | | | | | | | | | | | | | | Summary: Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold*. When the working set size is determined to be huge, disable peeling to avoid bloating the working set further. *Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K). Reviewers: davidxl Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36288 llvm-svn: 310005
* [NewGVN] Fix the case where we have a phi-of-ops which goes away.Davide Italiano2017-08-031-6/+27
| | | | | | Patch by Daniel Berlin, fixes PR33196 (and probably something else). llvm-svn: 309988
* Disable loop peeling during full unrolling pass.Teresa Johnson2017-08-031-20/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Peeling should not occur during the full unrolling invocation early in the pipeline, but rather later with partial and runtime loop unrolling. The later loop unrolling invocation will also eventually utilize profile summary and branch frequency information, which we would like to use to control peeling. And for ThinLTO we want to delay peeling until the backend (post thin link) phase, just as we do for most types of unrolling. Ensure peeling doesn't occur during the full unrolling invocation by adding a parameter to the shared implementation function, similar to the way partial and runtime loop unrolling are disabled. Performance results for ThinLTO suggest this has a neutral to positive effect on some internal benchmarks. Reviewers: chandlerc, davidxl Subscribers: mzolotukhin, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36258 llvm-svn: 309966
* [NewGVN] fix typos; NFCSanjay Patel2017-08-031-8/+8
| | | | llvm-svn: 309946
* [Cloning] Move distinct GlobalVariable debug info metadata in CloneModuleEwan Crawford2017-08-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Duplicating the distinct Subprogram and CU metadata nodes seems like the incorrect thing to do in CloneModule for GlobalVariable debug info. As it results in the scope of the GlobalVariable DI no longer being consistent with the rest of the module, and the new CU is absent from llvm.dbg.cu. Fixed by adding RF_MoveDistinctMDs to MapMetadata flags for GlobalVariables. Current unit test IR after clone: ``` @gv = global i32 1, comdat($comdat), !dbg !0, !type !5 define private void @f() comdat($comdat) personality void ()* @persfn !dbg !14 { !llvm.dbg.cu = !{!10} !0 = !DIGlobalVariableExpression(var: !1) !1 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !2, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true) !2 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !6, variables: !5) !3 = !DIFile(filename: "filename.c", directory: "/file/dir/") !4 = !DISubroutineType(types: !5) !5 = !{} !6 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !8) !7 = !DIFile(filename: "filename.c", directory: "/file/dir") !8 = !{!0} !9 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)") !10 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !11) !11 = !{!12} !12 = !DIGlobalVariableExpression(var: !13) !13 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !14, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true) !14 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !10, variables: !5) ``` Patched IR after clone: ``` @gv = global i32 1, comdat($comdat), !dbg !0, !type !5 define private void @f() comdat($comdat) personality void ()* @persfn !dbg !2 { !llvm.dbg.cu = !{!6} !0 = !DIGlobalVariableExpression(var: !1) !1 = distinct !DIGlobalVariable(name: "gv", linkageName: "gv", scope: !2, file: !3, line: 1, type: !9, isLocal: false, isDefinition: true) !2 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !3, line: 4, type: !4, isLocal: true, isDefinition: true, scopeLine: 3, isOptimized: false, unit: !6, variables: !5) !3 = !DIFile(filename: "filename.c", directory: "/file/dir/") !4 = !DISubroutineType(types: !5) !5 = !{} !6 = distinct !DICompileUnit(language: DW_LANG_C99, file: !7, producer: "CloneModule", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, globals: !8) !7 = !DIFile(filename: "filename.c", directory: "/file/dir") !8 = !{!0} !9 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)") ``` Reviewers: aprantl, probinson, dblaikie, echristo, loladiro Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36082 llvm-svn: 309928
* LV: Don't insert runtime ptr checks on divergent targetsMatt Arsenault2017-08-021-0/+12
| | | | llvm-svn: 309890
* [InstCombine] Remove unnecessary temporary APInt. NFCICraig Topper2017-08-021-6/+1
| | | | llvm-svn: 309887
* [PM] Split LoopUnrollPass and make partial unroller a function passTeresa Johnson2017-08-021-26/+96
| | | | | | | | | | | | | | | | | | | | | | Summary: This is largely NFC*, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses. In this patch I am only doing the splitting for the New PM, but I can do the same for the legacy PM as a follow-on if this looks good. *Not NFC since for partial unrolling we lose the updates done to the loop traversal (adding new sibling and child loops) - according to Chandler this is not very useful for partial unrolling, but it also means that the debugging flag -unroll-revisit-child-loops no longer works for partial unrolling. Reviewers: chandlerc Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36157 llvm-svn: 309886
* [InstCombine] Remove explicit code for folding (xor(zext(cmp)), 1) and ↵Craig Topper2017-08-021-15/+0
| | | | | | | | | | (xor(sext(cmp)), -1) to ext(!cmp). As far as I can tell this should be handled by foldCastedBitwiseLogic which is called later in visitXor. Differential Revision: https://reviews.llvm.org/D36214 llvm-svn: 309882
* [InstCombine] Support sext in foldLogicCastConstantCraig Topper2017-08-021-4/+14
| | | | | | | | This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214. Differential Revision: https://reviews.llvm.org/D36234 llvm-svn: 309880
OpenPOWER on IntegriCloud