summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmpEhsan Amiri2016-12-152-2/+98
| | | | | | | | | | | | | | | | | A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813
* [AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵Craig Topper2016-12-152-0/+45
| | | | | | SimplifyDemandedVectorElts. llvm-svn: 289759
* Remove the AssumptionCacheHal Finkel2016-12-1548-502/+249
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* Make processing @llvm.assume more efficient by using operand bundlesHal Finkel2016-12-152-3/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755
* Only sets profile summary when it was not preset.Dehao Chen2016-12-141-1/+2
| | | | | | | | | | | | Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725
* Fix the bug in r289714 (NFC).Dehao Chen2016-12-141-1/+1
| | | | llvm-svn: 289724
* [asan] Don't skip instrumentation of masked load/store unless we've seen a ↵Filipe Cabecinhas2016-12-141-3/+12
| | | | | | | | | | | | full load/store on that pointer. Reviewers: kcc, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27625 llvm-svn: 289718
* [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation ↵Filipe Cabecinhas2016-12-141-0/+4
| | | | | | | | | | | | instrumentation. Reviewers: kcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27548 llvm-svn: 289717
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289714
* [InstCombine] Folding of a compare with RHS const should merge debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are compares that have a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289704
* [InstCombine] Folding of a binop with RHS const should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a binop with a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 7 of 8 for D26256. Folding of a binop with RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289699
* [GVNHoist] Move GVNHoist to function simplification part of pipeline.Geoff Berry2016-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696
* [InstCombine] When folding casts through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a cast, instcombine will try to pull them through the phi node, combining them into a single cast. When it does this, the debug location of the new cast should be the merged debug locations of the phi node arguments. Patch 6 of 8 for D26256. Folding of a cast operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289693
* [InstCombine] Folding loads through a phi node should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a load, instcombine will try to pull them through the phi node, combining them into a single load. When it does this, the debug location of the new load should be the merged debug locations of the phi node arguments. Patch 5 of 8 for D26256. Folding of a load operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289688
* [InstCombine] When folding GEP through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are getelementptr, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new getelementptr should be the merged debug locations of the phi node arguments. Patch 4 of 8 for D26256. Folding of a getelementptr operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289684
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 3 of 8 for D26256. Folding of a compare operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289681
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-142-1/+21
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 2 of 8 for D26256. Folding of a binary operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289679
* revert r289669 which breaks botsDehao Chen2016-12-141-5/+0
| | | | llvm-svn: 289676
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289669
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-143-11/+11
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar ↵Craig Topper2016-12-141-1/+17
| | | | | | floating point to integer conversion intrinsics. llvm-svn: 289639
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar ↵Craig Topper2016-12-142-20/+14
| | | | | | | | add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636
* [X86][InstCombine] Handle scalar fmadd intrinsics correctly in ↵Craig Topper2016-12-142-15/+22
| | | | | | | | SimplifyDemandedVectorElts. Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly. llvm-svn: 289632
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round ↵Craig Topper2016-12-142-38/+21
| | | | | | | | | | | | intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289629
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar ↵Craig Topper2016-12-142-20/+34
| | | | | | | | | | | | min/max/cmp intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289628
* Change CoverageTracker from a global variable to member variable to avoid ↵Dehao Chen2016-12-131-52/+52
| | | | | | breaking thread-safety. (NFC) llvm-svn: 289603
* [IRCE] Avoid loop optimizations on pre and post loopsAnna Thomas2016-12-131-0/+34
| | | | | | | | | | | | | | | | | | | | | Summary: This patch will add loop metadata on the pre and post loops generated by IRCE. Currently, we have metadata for disabling optimizations such as vectorization, unrolling, loop distribution and LICM versioning (and confirmed that these optimizations check for the metadata before proceeding with the transformation). The pre and post loops generated by IRCE need not go through loop opts (since these are slow paths). Added two test cases as well. Reviewers: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26806 llvm-svn: 289588
* [LV] Don't vectorize when we have a small static bound on trip countMichael Kuperstein2016-12-131-2/+2
| | | | | | | | | | We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583
* [ADCE] Add code to remove dead branchesDavid Callahan2016-12-131-54/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is last in of a series of patches to evolve ADCE.cpp to support removing of unnecessary control flow. This patch adds the code to update the control and data flow graphs to remove the dead control flow. Also update unit tests to test the capability to remove dead, may-be-infinite loop which is enabled by the switch -adce-remove-loops. Previous patches: D23824 [ADCE] Add handling of PHI nodes when removing control flow D23559 [ADCE] Add control dependence computation D23225 [ADCE] Modify data structures to support removing control flow D23065 [ADCE] Refactor anticipating new functionality (NFC) D23102 [ADCE] Refactoring for new functionality (NFC) Reviewers: dberlin, majnemer, nadav, mehdi_amini Subscribers: llvm-commits, david2050, freik, twoh Differential Revision: https://reviews.llvm.org/D24918 llvm-svn: 289548
* [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar ↵Craig Topper2016-12-132-0/+18
| | | | | | | | | | intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523
* [PGO] Fix insane counts due to nonreturn callsRong Xu2016-12-131-2/+11
| | | | | | | | | | | | | | | | Summary: Since we don't break BBs for function calls. We might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521
* [SCCP] Debug diagnostic goes under DEBUG(). NFCI.Davide Italiano2016-12-131-1/+1
| | | | llvm-svn: 289519
* [SLP] Fix sign-extends for type-shrinkingMatthew Simpson2016-12-121-15/+62
| | | | | | | | | | | | | | This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470
* [ThinLTO] Remove useless code (NFC)Teresa Johnson2016-12-121-4/+0
| | | | | | Should have been removed in r288446. llvm-svn: 289466
* [InstCombine] fix bug when offsetting case values of a switch (PR31260)Sanjay Patel2016-12-121-25/+15
| | | | | | | | | | | | | We could truncate the condition and then try to fold the add into the original condition value causing wrong case constants to be used. Move the offset transform ahead of the truncate transform and return after each transform, so there's no chance of getting confused values. Fix for: https://llvm.org/bugs/show_bug.cgi?id=31260 llvm-svn: 289442
* [InstCombine] clean up range-for-loops in visitSwitchInst(); NFCISanjay Patel2016-12-121-7/+7
| | | | llvm-svn: 289439
* [InstCombine][XOP] The instructions for the scalar frcz intrinsics are ↵Craig Topper2016-12-111-2/+14
| | | | | | defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead. llvm-svn: 289411
* [SCCP] Use the appropriate helper function. NFCI.Davide Italiano2016-12-111-2/+2
| | | | llvm-svn: 289406
* [X86][InstCombine] Add support for scalar FMA intrinsics to ↵Craig Topper2016-12-111-0/+29
| | | | | | | | SimplifyDemandedVectorElts. This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them. llvm-svn: 289377
* [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for ↵Craig Topper2016-12-111-0/+8
| | | | | | | | scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for ↵Craig Topper2016-12-111-1/+3
| | | | | | | | scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded ↵Craig Topper2016-12-111-0/+31
| | | | | | | | elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370
* [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls ↵Craig Topper2016-12-111-10/+10
| | | | | | to match 128 and 256-bit. llvm-svn: 289354
* [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a ↵Craig Topper2016-12-111-2/+3
| | | | | | shufflevector if the indices are constant. llvm-svn: 289348
* [InstCombine] add helper for shift-by-shift folds; NFCISanjay Patel2016-12-101-150/+162
| | | | | | | These are currently limited to integer types, but we should be able to extend to splat vectors and possibly general vectors. llvm-svn: 289343
* [SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.Davide Italiano2016-12-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | The motivating example is: extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; } Currently SCCP will not realize that this function returns always zero, therefore will try to unroll and vectorize the loop at -O3 producing an awful lot of (useless) code. With this change, it will just produce: 0000000000000000 <g>: xor %eax,%eax retq llvm-svn: 289175
* WholeProgramDevirt: Teach the pass to handle structs of arrays.Peter Collingbourne2016-12-091-23/+22
| | | | | | This will become necessary in some cases once D22296 lands. llvm-svn: 289165
* Make WholeProgramDevirt understand ConstStruct vtables.Peter Collingbourne2016-12-091-13/+37
| | | | | | | | Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162
* [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.Davide Italiano2016-12-081-2/+2
| | | | | | | Currently SCCP folds the value to -1, while ConstantProp folds to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147
* [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.Alexey Bataev2016-12-081-66/+80
| | | | | | | | | | | | | | | When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043
OpenPOWER on IntegriCloud