summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Only sets profile summary when it was not preset.Dehao Chen2016-12-141-1/+2
| | | | | | | | | | | | Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725
* Fix the bug in r289714 (NFC).Dehao Chen2016-12-141-1/+1
| | | | llvm-svn: 289724
* [asan] Don't skip instrumentation of masked load/store unless we've seen a ↵Filipe Cabecinhas2016-12-141-3/+12
| | | | | | | | | | | | full load/store on that pointer. Reviewers: kcc, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27625 llvm-svn: 289718
* [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation ↵Filipe Cabecinhas2016-12-141-0/+4
| | | | | | | | | | | | instrumentation. Reviewers: kcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27548 llvm-svn: 289717
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289714
* [InstCombine] Folding of a compare with RHS const should merge debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are compares that have a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289704
* [InstCombine] Folding of a binop with RHS const should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a binop with a RHS constant, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new op should be the merged debug locations of the phi node arguments. Patch 7 of 8 for D26256. Folding of a binop with RHS constant. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289699
* [GVNHoist] Move GVNHoist to function simplification part of pipeline.Geoff Berry2016-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696
* [InstCombine] When folding casts through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a cast, instcombine will try to pull them through the phi node, combining them into a single cast. When it does this, the debug location of the new cast should be the merged debug locations of the phi node arguments. Patch 6 of 8 for D26256. Folding of a cast operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289693
* [InstCombine] Folding loads through a phi node should merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are a load, instcombine will try to pull them through the phi node, combining them into a single load. When it does this, the debug location of the new load should be the merged debug locations of the phi node arguments. Patch 5 of 8 for D26256. Folding of a load operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289688
* [InstCombine] When folding GEP through a phi node merge the debug locationsRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are getelementptr, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the new getelementptr should be the merged debug locations of the phi node arguments. Patch 4 of 8 for D26256. Folding of a getelementptr operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289684
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-141-1/+1
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 3 of 8 for D26256. Folding of a compare operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289681
* [InstCombine] Merge debug locations when folding through a phi nodeRobert Lougher2016-12-142-1/+21
| | | | | | | | | | | | | If all the operands to a phi node are of the same operation, instcombine will try to pull them through the phi node, combining them into a single operation. When it does this, the debug location of the operation should be the merged debug locations of the phi node arguments. Patch 2 of 8 for D26256. Folding of a binary operation. Differential Revision: https://reviews.llvm.org/D26256 llvm-svn: 289679
* revert r289669 which breaks botsDehao Chen2016-12-141-5/+0
| | | | llvm-svn: 289676
* Create SampleProfileLoader pass in llvm instead of clangDehao Chen2016-12-141-0/+5
| | | | | | | | | | | | Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder. Reviewers: tejohnson, davidxl, dnovillo Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27743 llvm-svn: 289669
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-143-11/+11
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar ↵Craig Topper2016-12-141-1/+17
| | | | | | floating point to integer conversion intrinsics. llvm-svn: 289639
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar ↵Craig Topper2016-12-142-20/+14
| | | | | | | | add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636
* [X86][InstCombine] Handle scalar fmadd intrinsics correctly in ↵Craig Topper2016-12-142-15/+22
| | | | | | | | SimplifyDemandedVectorElts. Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly. llvm-svn: 289632
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round ↵Craig Topper2016-12-142-38/+21
| | | | | | | | | | | | intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289629
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar ↵Craig Topper2016-12-142-20/+34
| | | | | | | | | | | | min/max/cmp intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289628
* Change CoverageTracker from a global variable to member variable to avoid ↵Dehao Chen2016-12-131-52/+52
| | | | | | breaking thread-safety. (NFC) llvm-svn: 289603
* [IRCE] Avoid loop optimizations on pre and post loopsAnna Thomas2016-12-131-0/+34
| | | | | | | | | | | | | | | | | | | | | Summary: This patch will add loop metadata on the pre and post loops generated by IRCE. Currently, we have metadata for disabling optimizations such as vectorization, unrolling, loop distribution and LICM versioning (and confirmed that these optimizations check for the metadata before proceeding with the transformation). The pre and post loops generated by IRCE need not go through loop opts (since these are slow paths). Added two test cases as well. Reviewers: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26806 llvm-svn: 289588
* [LV] Don't vectorize when we have a small static bound on trip countMichael Kuperstein2016-12-131-2/+2
| | | | | | | | | | We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583
* [ADCE] Add code to remove dead branchesDavid Callahan2016-12-131-54/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is last in of a series of patches to evolve ADCE.cpp to support removing of unnecessary control flow. This patch adds the code to update the control and data flow graphs to remove the dead control flow. Also update unit tests to test the capability to remove dead, may-be-infinite loop which is enabled by the switch -adce-remove-loops. Previous patches: D23824 [ADCE] Add handling of PHI nodes when removing control flow D23559 [ADCE] Add control dependence computation D23225 [ADCE] Modify data structures to support removing control flow D23065 [ADCE] Refactor anticipating new functionality (NFC) D23102 [ADCE] Refactoring for new functionality (NFC) Reviewers: dberlin, majnemer, nadav, mehdi_amini Subscribers: llvm-commits, david2050, freik, twoh Differential Revision: https://reviews.llvm.org/D24918 llvm-svn: 289548
* [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar ↵Craig Topper2016-12-132-0/+18
| | | | | | | | | | intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523
* [PGO] Fix insane counts due to nonreturn callsRong Xu2016-12-131-2/+11
| | | | | | | | | | | | | | | | Summary: Since we don't break BBs for function calls. We might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521
* [SCCP] Debug diagnostic goes under DEBUG(). NFCI.Davide Italiano2016-12-131-1/+1
| | | | llvm-svn: 289519
* [SLP] Fix sign-extends for type-shrinkingMatthew Simpson2016-12-121-15/+62
| | | | | | | | | | | | | | This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470
* [ThinLTO] Remove useless code (NFC)Teresa Johnson2016-12-121-4/+0
| | | | | | Should have been removed in r288446. llvm-svn: 289466
* [InstCombine] fix bug when offsetting case values of a switch (PR31260)Sanjay Patel2016-12-121-25/+15
| | | | | | | | | | | | | We could truncate the condition and then try to fold the add into the original condition value causing wrong case constants to be used. Move the offset transform ahead of the truncate transform and return after each transform, so there's no chance of getting confused values. Fix for: https://llvm.org/bugs/show_bug.cgi?id=31260 llvm-svn: 289442
* [InstCombine] clean up range-for-loops in visitSwitchInst(); NFCISanjay Patel2016-12-121-7/+7
| | | | llvm-svn: 289439
* [InstCombine][XOP] The instructions for the scalar frcz intrinsics are ↵Craig Topper2016-12-111-2/+14
| | | | | | defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead. llvm-svn: 289411
* [SCCP] Use the appropriate helper function. NFCI.Davide Italiano2016-12-111-2/+2
| | | | llvm-svn: 289406
* [X86][InstCombine] Add support for scalar FMA intrinsics to ↵Craig Topper2016-12-111-0/+29
| | | | | | | | SimplifyDemandedVectorElts. This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them. llvm-svn: 289377
* [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for ↵Craig Topper2016-12-111-0/+8
| | | | | | | | scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for ↵Craig Topper2016-12-111-1/+3
| | | | | | | | scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded ↵Craig Topper2016-12-111-0/+31
| | | | | | | | elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370
* [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls ↵Craig Topper2016-12-111-10/+10
| | | | | | to match 128 and 256-bit. llvm-svn: 289354
* [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a ↵Craig Topper2016-12-111-2/+3
| | | | | | shufflevector if the indices are constant. llvm-svn: 289348
* [InstCombine] add helper for shift-by-shift folds; NFCISanjay Patel2016-12-101-150/+162
| | | | | | | These are currently limited to integer types, but we should be able to extend to splat vectors and possibly general vectors. llvm-svn: 289343
* [SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.Davide Italiano2016-12-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | The motivating example is: extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; } Currently SCCP will not realize that this function returns always zero, therefore will try to unroll and vectorize the loop at -O3 producing an awful lot of (useless) code. With this change, it will just produce: 0000000000000000 <g>: xor %eax,%eax retq llvm-svn: 289175
* WholeProgramDevirt: Teach the pass to handle structs of arrays.Peter Collingbourne2016-12-091-23/+22
| | | | | | This will become necessary in some cases once D22296 lands. llvm-svn: 289165
* Make WholeProgramDevirt understand ConstStruct vtables.Peter Collingbourne2016-12-091-13/+37
| | | | | | | | Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162
* [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.Davide Italiano2016-12-081-2/+2
| | | | | | | Currently SCCP folds the value to -1, while ConstantProp folds to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147
* [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.Alexey Bataev2016-12-081-66/+80
| | | | | | | | | | | | | | | When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043
* CFI-icall on ThumbEvgeniy Stepanov2016-12-081-4/+14
| | | | | | | | | | | | Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb. Use b.w branch instruction. Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set. This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc. Differential Revision: https://reviews.llvm.org/D27499 llvm-svn: 289008
* [BDCE] Skip metadata while replacing uses.Davide Italiano2016-12-071-2/+3
| | | | | | | | | | | | The fix committed in r288851 doesn't cover all the cases. In particular, if we have an instruction with side effects which has a no non-dbg use not depending on the bits, we still perform RAUW destroying the dbg.value's first argument. Prevent metadata from being replaced here to avoid the issue. Differential Revision: https://reviews.llvm.org/D27534 llvm-svn: 288987
* [GVNHoist] Invalidate MemDep when an instruction is moved.Eli Friedman2016-12-071-0/+1
| | | | | | | | | | See also r279907. Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 . Differential Revision: https://reviews.llvm.org/D27493 llvm-svn: 288968
* [LV] Scalarize operands of predicated instructionsMatthew Simpson2016-12-071-7/+210
| | | | | | | | | | | | | | | | | | | | | | This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909
OpenPOWER on IntegriCloud