summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-162-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920
* [ThinLTO] Thin link efficiency: More efficient export list computationTeresa Johnson2016-12-161-32/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Instead of checking whether a global referenced by a function being imported is defined in the same module, speculatively always add the referenced globals to the module's export list. After all imports are computed, for each module prune any not in its defined set from its export list. For a huge C++ app with aggressive importing thresholds, even with D27687 we spent a lot of time invoking modulePath() from exportGlobalInModule (modulePath() was still the 2nd hottest routine in profile). The reason is that with comdat/linkonce the summary lists for each GUID can be long. For the app in question, for example, we were invoking exportGlobalInModule almost 2 million times, and we traversed an average of 63 entries in the summary list each time. This patch reduced the thin link time for the app by about 10% (on top of D27687) when using aggressive importing thresholds, and about 3.5% on average with default importing thresholds. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27755 llvm-svn: 289918
* [SimplifyLibCalls] Use a lambda. NFCI.Davide Italiano2016-12-161-6/+6
| | | | llvm-svn: 289911
* Revert "[IR] Remove the DIExpression field from DIGlobalVariable."Adrian Prantl2016-12-162-11/+8
| | | | | | This reverts commit 289902 while investigating bot berakage. llvm-svn: 289906
* Add missing library dep.Peter Collingbourne2016-12-161-1/+1
| | | | llvm-svn: 289903
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-162-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902
* IPO: Introduce ThinLTOBitcodeWriter pass.Peter Collingbourne2016-12-162-0/+345
| | | | | | | | | | | | | | This pass prepares a module containing type metadata for ThinLTO by splitting it into regular and thin LTO parts if possible, and writing both parts to a multi-module bitcode file. Modules that do not contain type metadata are written unmodified as a single module. All globals with type metadata are added to the regular LTO module, and the rest are added to the thin LTO module. Differential Revision: https://reviews.llvm.org/D27324 llvm-svn: 289899
* [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)Teresa Johnson2016-12-151-9/+13
| | | | | | | | | | | | | | | | | Summary: We were reinvoking exportGlobalInModule numerous times redundantly. No need to re-export globals referenced by a global that was already imported from its module. This resulted in a large speedup in the thin link for a big application, particularly when importing aggressiveness was cranked up. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27687 llvm-svn: 289896
* [SimplifyLibCalls] Lower fls() to llvm.ctlz().Davide Italiano2016-12-151-0/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D14590 llvm-svn: 289894
* [SimplifyLibCalls] Remove redundant folding logic for ffs().Davide Italiano2016-12-151-13/+3
| | | | | | | | Lowering to llvm.cttz() will result in constant folding anyway if the argument to ffs is a constant. Pointed out by Eli for fls() in D14590. llvm-svn: 289888
* [ThinLTO] Revert part of r289843 that belonged to another patch.Teresa Johnson2016-12-151-13/+9
| | | | | | | | The code change for D27687 accidentally got committed along with the main change in r289843. Revert it temporarily, so that I can recommit it along with its test as intended. llvm-svn: 289875
* [ThinLTO] Remove stale comment (NFC)Teresa Johnson2016-12-151-2/+1
| | | | | | This should have been removed with r288446. llvm-svn: 289871
* [ThinLTO] Thin link efficiency: skip candidate added later with higher ↵Teresa Johnson2016-12-151-4/+13
| | | | | | | | | | | | | | | | | | | | | | | threshold (NFC) Summary: Thin link efficiency improvement. After adding an importing candidate to the worklist we might have later added it again with a higher threshold. Skip it when popped from the worklist if we recorded a higher threshold than the current worklist entry, it will get processed again at the higher threshold when that entry is popped. This required adding the summary's GUID to the worklist, so that it can be used to query the recorded highest threshold for it when we pop from the worklist. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27696 llvm-svn: 289867
* [LV] Enable vectorization of loops with conditional stores by defaultMatthew Simpson2016-12-151-1/+1
| | | | | | | | | This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863
* [SimplifyCFG] Merge debug locations when hoisting an instruction from a ↵Andrea Di Biagio2016-12-151-4/+4
| | | | | | | | | | | | | | | | | then/else branch. NFC. Now that a new API to merge debug locations has been committed at r289661 (see review D26256 for more details), we can use it to "improve" the code added by revision r280995. Instead of nulling the debugloc of a commoned instruction, we use the 'merged' debug location. At the moment, this is just a no functional change since function `DILocation::getMergedLocation()` is just a stub and would always return a null location. Differential Revision: https://reviews.llvm.org/D27804 llvm-svn: 289862
* [InstCombine] add folds for icmp (smin X, Y), XSanjay Patel2016-12-151-0/+37
| | | | | | | | | | | | | | Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. This patch won't solve the example that was attached to that thread, so something else still needs fixing. The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that we want to fold to already exists, but sometimes it's the swapped form of what we want. Corresponding changes for smax/umin/umax to follow. Differential Revision: https://reviews.llvm.org/D27531 llvm-svn: 289855
* [ThinLTO] Ensure callees get hot threshold when first seen on cold pathTeresa Johnson2016-12-151-24/+28
| | | | | | | | | | | | | | | | | This is split out from D27696, since it turned out to be a bug fix and not part of the NFC efficiency change. Keep the same adjusted (possibly decayed) threshold in both the worklist and the ImportList. Otherwise if we encountered it first along a cold path, the callee would be added to the worklist with a lower decayed threshold than when it is later encountered along a hot path. But the logic uses the threshold recorded in the ImportList entry to check if we should re-add it, and without this patch the threshold recorded there is the same along both paths so we don't re-add it. Using the same possibly decayed threshold in the ImportList ensures we re-add it later with the higher non-decayed hot path threshold. llvm-svn: 289843
* Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of ↵Robert Lougher2016-12-151-8/+1
| | | | | | | | common inst" Reverting as it is causing buildbot failures (address sanitizer). llvm-svn: 289833
* [SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst Robert Lougher2016-12-151-1/+9
| | | | | | | | | | | Simplify CFG will try to sink the last instruction in a series of basic blocks, creating a "common" instruction in the successor block (sinkLastInstruction). When it does this, the debug location of the single instruction should be the merged debug locations of the commoned instructions. Differential Revision: https://reviews.llvm.org/D27590 llvm-svn: 289828
* [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmpEhsan Amiri2016-12-152-2/+98
| | | | | | | | | | | | | | | | | A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813
* [AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵Craig Topper2016-12-152-0/+45
| | | | | | SimplifyDemandedVectorElts. llvm-svn: 289759
* Remove the AssumptionCacheHal Finkel2016-12-1548-502/+249
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* Make processing @llvm.assume more efficient by using operand bundlesHal Finkel2016-12-152-3/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755
* Only sets profile summary when it was not preset.Dehao Chen2016-12-141-1/+2
| | | | | | | | | | | | Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725
* Fix the bug in r289714 (NFC).Dehao Chen2016-12-141-1/+1
| | | | llvm-svn: 289724
* [asan] Don't skip instrumentation of masked load/store unless we've seen a ↵Filipe Cabecinhas2016-12-141-3/+12
| | | | | | | | | | | | full load/store on that pointer. Reviewers: kcc, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27625 llvm-svn: 289718
* [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation ↵Filipe Cabecinhas2016-12-141-0/+4
| | | | | | | | | | | | instrumentation. Reviewers: kcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27548 llvm-svn: 289717
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289714
* [InstCombine] Folding of a compare with RHS const should merge debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are compares that have a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289704
* [InstCombine] Folding of a binop with RHS const should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a binop with a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 7 of 8 for D26256. Folding of a binop with RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289699
* [GVNHoist] Move GVNHoist to function simplification part of pipeline.Geoff Berry2016-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696
* [InstCombine] When folding casts through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a cast, instcombine will try to pull them through the phi node, combining them into a single cast. When it does this, the debug location of the new cast should be the merged debug locations of the phi node arguments. Patch 6 of 8 for D26256. Folding of a cast operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289693
* [InstCombine] Folding loads through a phi node should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a load, instcombine will try to pull them through the phi node, combining them into a single load. When it does this, the debug location of the new load should be the merged debug locations of the phi node arguments. Patch 5 of 8 for D26256. Folding of a load operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289688
* [InstCombine] When folding GEP through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are getelementptr, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new getelementptr should be the merged debug locations of the phi node arguments. Patch 4 of 8 for D26256. Folding of a getelementptr operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289684
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 3 of 8 for D26256. Folding of a compare operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289681
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-142-1/+21
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 2 of 8 for D26256. Folding of a binary operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289679
* revert r289669 which breaks botsDehao Chen2016-12-141-5/+0
| | | | llvm-svn: 289676
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289669
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-143-11/+11
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar ↵Craig Topper2016-12-141-1/+17
| | | | | | floating point to integer conversion intrinsics. llvm-svn: 289639
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar ↵Craig Topper2016-12-142-20/+14
| | | | | | | | add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636
* [X86][InstCombine] Handle scalar fmadd intrinsics correctly in ↵Craig Topper2016-12-142-15/+22
| | | | | | | | SimplifyDemandedVectorElts. Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly. llvm-svn: 289632
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round ↵Craig Topper2016-12-142-38/+21
| | | | | | | | | | | | intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289629
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar ↵Craig Topper2016-12-142-20/+34
| | | | | | | | | | | | min/max/cmp intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289628
* Change CoverageTracker from a global variable to member variable to avoid ↵Dehao Chen2016-12-131-52/+52
| | | | | | breaking thread-safety. (NFC) llvm-svn: 289603
* [IRCE] Avoid loop optimizations on pre and post loopsAnna Thomas2016-12-131-0/+34
| | | | | | | | | | | | | | | | | | | | | Summary: This patch will add loop metadata on the pre and post loops generated by IRCE. Currently, we have metadata for disabling optimizations such as vectorization, unrolling, loop distribution and LICM versioning (and confirmed that these optimizations check for the metadata before proceeding with the transformation). The pre and post loops generated by IRCE need not go through loop opts (since these are slow paths). Added two test cases as well. Reviewers: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26806 llvm-svn: 289588
* [LV] Don't vectorize when we have a small static bound on trip countMichael Kuperstein2016-12-131-2/+2
| | | | | | | | | | We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583
* [ADCE] Add code to remove dead branchesDavid Callahan2016-12-131-54/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is last in of a series of patches to evolve ADCE.cpp to support removing of unnecessary control flow. This patch adds the code to update the control and data flow graphs to remove the dead control flow. Also update unit tests to test the capability to remove dead, may-be-infinite loop which is enabled by the switch -adce-remove-loops. Previous patches: D23824 [ADCE] Add handling of PHI nodes when removing control flow D23559 [ADCE] Add control dependence computation D23225 [ADCE] Modify data structures to support removing control flow D23065 [ADCE] Refactor anticipating new functionality (NFC) D23102 [ADCE] Refactoring for new functionality (NFC) Reviewers: dberlin, majnemer, nadav, mehdi_amini Subscribers: llvm-commits, david2050, freik, twoh Differential Revision: https://reviews.llvm.org/D24918 llvm-svn: 289548
* [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar ↵Craig Topper2016-12-132-0/+18
| | | | | | | | | | intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523
* [PGO] Fix insane counts due to nonreturn callsRong Xu2016-12-131-2/+11
| | | | | | | | | | | | | | | | Summary: Since we don't break BBs for function calls. We might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521
OpenPOWER on IntegriCloud