summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Add hasProfileData() to check if a function has profile data. NFC.Easwaran Raman2017-12-225-10/+9
| | | | | | | | | | | | | | | | | | | Summary: This replaces calls to getEntryCount().hasValue() with hasProfileData that does the same thing. This refactoring is useful to do before adding synthetic function entry counts but also a useful cleanup IMO even otherwise. I have used hasProfileData instead of hasRealProfileData as David had earlier suggested since I think profile implies "real" and I use the phrase "synthetic entry count" and not "synthetic profile count" but I am fine calling it hasRealProfileData if you prefer. Reviewers: davidxl, silvas Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41461 llvm-svn: 321331
* [SimplifyCFG] Avoid quadratic on a predecessors number behavior in ↵Michael Zolotukhin2017-12-211-14/+10
| | | | | | | | | | | | | | | | | | | | | | instruction sinking. If a block has N predecessors, then the current algorithm will try to sink common code to this block N times (whenever we visit a predecessor). Every attempt to sink the common code includes going through all predecessors, so the complexity of the algorithm becomes O(N^2). With this patch we try to sink common code only when we visit the block itself. With this, the complexity goes down to O(N). As a side effect, the moment the code is sunk is slightly different than before (the order of simplifications has been changed), that's why I had to adjust two tests (note that neither of the tests is supposed to test SimplifyCFG): * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic the changes that previous implementation of SimplifyCFG would do. * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common code sinking by a command line flag. llvm-svn: 321236
* [ICP] Expose unconditional call promotion interfaceMatthew Simpson2017-12-201-77/+178
| | | | | | | | | | | | | | | | | | | | | | | This patch modifies the indirect call promotion utilities by exposing and using an unconditional call promotion interface. The unconditional promotion interface (i.e., call promotion without creating an if-then-else) can be used if it's known that an indirect call has only one possible callee. The existing conditional promotion interface uses this unconditional interface to promote an indirect call after it has been versioned and placed within the "then" block. A consequence of unconditional promotion is that the fix-up operations for phi nodes in the normal destination of invoke instructions are changed. This is necessary because the existing implementation assumed that an invoke had been versioned, creating a "merge" block where a return value bitcast could be placed. In the new implementation, the edge between a promoted invoke's parent block and its normal destination is split if needed to add a bitcast for the return value. If the invoke is also versioned, the phi node merging the return value of the promoted and original invoke instructions is placed in the "merge" block. Differential Revision: https://reviews.llvm.org/D40751 llvm-svn: 321210
* [hwasan] Implement -fsanitize-recover=hwaddress.Evgeniy Stepanov2017-12-201-7/+18
| | | | | | | | | | | | Summary: Very similar to AddressSanitizer, with the exception of the error type encoding. Reviewers: kcc, alekseyshl Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41417 llvm-svn: 321203
* [InstCombine] Add debug location to new caller.Florian Hahn2017-12-201-0/+1
| | | | | | | | | | Reviewers: rnk, aprantl, majnemer Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D414 llvm-svn: 321191
* Revert r320548:[SLP] Vectorize jumbled memory loadsMohammad Shahid2017-12-201-195/+83
| | | | llvm-svn: 321181
* [LV] Remove unnecessary DoExtraAnalysis guard (silent bug)Florian Hahn2017-12-201-2/+2
| | | | | | | | | | | | | | canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true. This doesn't make sense to me because reporting analysis information shouldn't alter legality checks. This is probably the result of a last minute minor change before committing (?). Patch by Diego Caballero. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D40973 llvm-svn: 321172
* [memcpyopt] Teach memcpyopt to optimize across basic blocksDan Gohman2017-12-201-10/+46
| | | | | | | | | | | | | | | | | This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. This is r319482 and r319483, along with fixes for PR35519: fix the optimization that merges stores into memsets to preserve cached memdep info, and fix memdep's non-local caching strategy to not assume that larger queries are always more conservative than smaller ones. Fixes PR28958 and PR35519. Differential Revision: https://reviews.llvm.org/D40802 llvm-svn: 321138
* Silence a bunch of implicit fallthrough warningsAdrian Prantl2017-12-193-0/+3
| | | | llvm-svn: 321114
* [SeparateConstOffsetFromGEP] Fix a typo. NFC.Haicheng Wu2017-12-191-1/+1
| | | | | | do CSE for to => do CSE to llvm-svn: 321098
* [JumpThreading] Restrict PRE across instructions that don't pass control to ↵Max Kazantsev2017-12-191-0/+14
| | | | | | | | | | | | | | successors PRE in JumpThreading should not be able to hoist copy of non-speculable loads across instructions that don't always transfer execution to their successors, otherwise they may introduce an unsafe load which otherwise would not be executed. The same problem for GVN was fixed as rL316975. Differential Revision: https://reviews.llvm.org/D40347 llvm-svn: 321063
* [PGO] Fix handling of cold entry count for instrumented PGOTeresa Johnson2017-12-181-1/+4
| | | | | | | | | | | | | | | | | | | | | Summary: In r277849, getEntryCount was changed to return None when the entry count was 0, specifically for SamplePGO where it means no samples were recorded. However, for instrumentation PGO a 0 entry count should be returned directly, since it does mean that the function was completely cold. Otherwise we end up treating these functions conservatively in isFunctionEntryCold() and isColdBB(). Instead, for SamplePGO use -1 when there are no samples, and change getEntryCount to return None when the value is -1. Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41307 llvm-svn: 321018
* Fix more inconsistent line endings. NFC.Dimitry Andric2017-12-181-9/+9
| | | | llvm-svn: 321016
* Removed unused DominanceFrontierMatt Arsenault2017-12-181-3/+0
| | | | llvm-svn: 321001
* [PGO] add MST min edge selection heuristic to ensure non-zero entry countXinliang David Li2017-12-181-7/+67
| | | | | | Differential Revision: http://reviews.llvm.org/D41059 llvm-svn: 320998
* [Memcpy Loop Lowering] Remove the fixed int8 lowering.Sean Fertile2017-12-181-80/+13
| | | | | | | | Switch over to the lowering that uses target supplied operand types. Differential Revision: https://reviews.llvm.org/D41201 llvm-svn: 320989
* [ThinLTO] Remove unused codeEugene Leviant2017-12-181-14/+0
| | | | | | | This is a re-commit of r320464, after patch for gold plugin was landed. llvm-svn: 320968
* [SROA] Disable non-whole-alloca splits by defaultHiroshi Inoue2017-12-181-1/+6
| | | | | | | This patch introduce a switch to control splitting of non-whole-alloca slices with default off. The switch will be default on again after fixing an issue reported in PR35657. llvm-svn: 320958
* [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.Sean Fertile2017-12-161-6/+10
| | | | | | | | If the loop operand type is int8 then there will be no residual loop for the unknown size expansion. Dont create the residual-size and bytes-copied values when they are not needed. llvm-svn: 320929
* [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+selSanjay Patel2017-12-161-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to do this for 2 reasons: 1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766. 2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern. More detail about what happens in the backend: 1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization. 2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization. 3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom. 4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways. a. For #2, the vector path is missing the case for setlt with a '1' constant. b. For #3, we are missing a match for commuted versions of the shift variants. 5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not. 6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate. define i32 @abs_shifty(i32 %x) { %signbit = ashr i32 %x, 31 %add = add i32 %signbit, %x %abs = xor i32 %signbit, %add ret i32 %abs } define i32 @abs_cmpsubsel(i32 %x) { %cmp = icmp slt i32 %x, zeroinitializer %sub = sub i32 zeroinitializer, %x %abs = select i1 %cmp, i32 %sub, i32 %x ret i32 %abs } define <4 x i32> @abs_shifty_vec(<4 x i32> %x) { %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> %add = add <4 x i32> %signbit, %x %abs = xor <4 x i32> %signbit, %add ret <4 x i32> %abs } define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) { %cmp = icmp slt <4 x i32> %x, zeroinitializer %sub = sub <4 x i32> zeroinitializer, %x %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x ret <4 x i32> %abs } > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx > abs_shifty: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_cmpsubsel: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_shifty_vec: > vpabsd %xmm0, %xmm0 > retq > > abs_cmpsubsel_vec: > vpabsd %xmm0, %xmm0 > retq > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64 > abs_shifty: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_cmpsubsel: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_shifty_vec: > abs v0.4s, v0.4s > ret > > abs_cmpsubsel_vec: > abs v0.4s, v0.4s > ret > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le > abs_shifty: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_cmpsubsel: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_shifty_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > > abs_cmpsubsel_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > Differential Revision: https://reviews.llvm.org/D40984 llvm-svn: 320921
* [LV] Extend InstWidening with CM_Widen_RecursiveHal Finkel2017-12-161-5/+16
| | | | | | | | | | | | | | | | | | | | Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913
* [SimplifyLibCalls] Inline calls to cabs when it's safe to do soHal Finkel2017-12-161-0/+33
| | | | | | | | | | | | When unsafe algerbra is allowed calls to cabs(r) can be replaced by: sqrt(creal(r)*creal(r) + cimag(r)*cimag(r)) Patch by Paul Walker, thanks! Differential Revision: https://reviews.llvm.org/D40069 llvm-svn: 320901
* [LV] NFC patch for moving VP*Recipe class definitions from LoopVectorize.cpp ↵Hal Finkel2017-12-163-392/+415
| | | | | | | | | | | | | | | | | | | to VPlan.h This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*): 1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h. 2. Many of VP*Recipe::print() and execute() definitions are still left in LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To be moved to VPlan.cpp at a later time. 3. InterleaveGroup class is moved from anonymous namespace to llvm namespace. Referencing it in anonymous namespace from VPlan.h ended up in warning. Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41045 llvm-svn: 320900
* Fix NDEBUG build problem in r320895Teresa Johnson2017-12-161-1/+1
| | | | | | Fix incorrect placement of #endif causing NDEBUG build failures. llvm-svn: 320897
* [ThinLTO] Enable importing of aliases as copy of aliaseeTeresa Johnson2017-12-161-21/+97
| | | | | | | | | | | | | | | | | | | | | | | Summary: This implements a missing feature to allow importing of aliases, which was previously disabled because alias cannot be available_externally. We instead import an alias as a copy of its aliasee. Some additional work was required in the IndexBitcodeWriter for the distributed build case, to ensure that the aliasee has a value id in the distributed index file (i.e. even when it is not being imported directly). This is a performance win in codes that have many aliases, e.g. C++ applications that have many constructor and destructor aliases. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D40747 llvm-svn: 320895
* Re-commit : [LICM] Allow sinking when foldable in loopJun Bum Lim2017-12-151-29/+79
| | | | | | | | | | | | | | | | | | | | | | This recommits r320823 reverted due to the test failure in sink-foldable.ll and an unused variable. Added "REQUIRES: aarch64-registered-target" in the test and removed unused variable. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320858
* [Memcpy Loop Lowering] Insert loop BB inbetween the split BB.Sean Fertile2017-12-151-2/+3
| | | | | | | | | | | The original memcpy expansion inserted the loop basic block inbetween the 2 new basic blocks created by splitting the original block the memcpy call was in. This commit makes the new memcpy expansion do the same to keep the layout of the IR matching between the old and new implementations. Differential Review: https://reviews.llvm.org/D41197 llvm-svn: 320848
* fix typo in comment and remove inaccurate comment; NFCSanjay Patel2017-12-151-2/+0
| | | | llvm-svn: 320838
* Revert "Re-commit : [LICM] Allow sinking when foldable in loop"Jun Bum Lim2017-12-151-78/+29
| | | | | | This reverts commit r320833. llvm-svn: 320836
* Re-commit : [LICM] Allow sinking when foldable in loopJun Bum Lim2017-12-151-29/+78
| | | | | | | | | | | | | | | | | | | | This recommit r320823 after fixing a test failure. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320833
* Revert "[LICM] Allow sinking when foldable in loop"Jun Bum Lim2017-12-151-78/+29
| | | | | | This reverts commit r320823. llvm-svn: 320828
* [LICM] Allow sinking when foldable in loopJun Bum Lim2017-12-151-29/+78
| | | | | | | | | | | | | | | | | Summary: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320823
* [PM] port Rewrite Statepoints For GC to the new pass manager.Fedor Sergeev2017-12-152-62/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The port is nearly straightforward. The only complication is related to the analyses handling, since one of the analyses used in this module pass is domtree, which is a function analysis. That requires asking for the results of each function and disallows a single interface for run-on-module pass action. Decided to copy-paste the main body of this pass. Most of its code is requesting analyses anyway, so not that much of a copy-paste. The rest of the code movement is to transform all the implementation helper functions like stripNonValidData into non-member statics. Extended all the related LLVM tests with new-pass-manager use. No failures. Reviewers: sanjoy, anna, reames Reviewed By: anna Subscribers: skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D41162 llvm-svn: 320796
* [SimplifyCFG] don't sink common insts too soon (PR34603)Sanjay Patel2017-12-143-5/+23
| | | | | | | | | | | | This should solve: https://bugs.llvm.org/show_bug.cgi?id=34603 ...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run. It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the sinking transform later in the optimization pipeline. Differential Revision: https://reviews.llvm.org/D38566 llvm-svn: 320749
* [SLPVectorizer] Don't ignore scalar extraction instructions of aggregate valueGuozhi Wei2017-12-141-3/+7
| | | | | | | | | In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted. For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation. Differential Revision: https://reviews.llvm.org/D41139 llvm-svn: 320736
* [PM][InstCombine] fixing omission of AliasAnalysis in new-pass-manager's ↵Fedor Sergeev2017-12-141-2/+3
| | | | | | | | | | | | | | | | | | version of InstCombine Summary: Passing AliasAnalysis results instead of nullptr appears to work just fine. A couple new-pass-manager tests updated to align with new order of analyses. Reviewers: chandlerc, spatel, craig.topper Reviewed By: chandlerc Subscribers: mehdi_amini, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D41203 llvm-svn: 320687
* [LV] Support efficient vectorization of an induction with redundant castsDorit Nuzman2017-12-142-14/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them. This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use VecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead). In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/ scalarization), we also map the respective cast instruction (if exists) to that vector value. (If the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence as we allow only the "last cast" to be used outside the induction update chain itself). This is the last step in addressing PR30654. llvm-svn: 320672
* [EarlyCSE] recognize swapped variants of abs/nabs as equivalentSanjay Patel2017-12-131-9/+12
| | | | | | | | Extends https://reviews.llvm.org/rL320640 Differential Revision: https://reviews.llvm.org/D41136 llvm-svn: 320653
* Reverting [JumpThreading] Preservation of DT and LVI across the passBrian M. Rzycki2017-12-134-311/+89
| | | | | | | Stage 2 bootstrap failed: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/14434 llvm-svn: 320641
* [EarlyCSE] recognize commuted and swapped variants of min/max as equivalent ↵Sanjay Patel2017-12-131-0/+27
| | | | | | | | | | | | | (PR35642) As shown in: https://bugs.llvm.org/show_bug.cgi?id=35642 ...we can have different forms of min/max, so we should recognize those here in EarlyCSE similar to how we already handle binops and compares that can commute. Differential Revision: https://reviews.llvm.org/D41136 llvm-svn: 320640
* Remove redundant includes from lib/Transforms.Michael Zolotukhin2017-12-1331-48/+0
| | | | llvm-svn: 320628
* [JumpThreading] Preservation of DT and LVI across the passBrian M. Rzycki2017-12-134-89/+311
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perfom the preversation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 320612
* [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases loadAditya Kumar2017-12-131-2/+2
| | | | | | | | | | | | | | w.r.t. the paper "A Practical Improvement to the Partial Redundancy Elimination in SSA Form" (https://sites.google.com/site/jongsoopark/home/ssapre.pdf) Proper dominance check was missing here, so having a loopinfo should not be required. Committing this diff as this fixes the bug, if there are further concerns, I'll be happy to work on them. Differential Revision: https://reviews.llvm.org/D39781 llvm-svn: 320607
* Reintroduce r320049, r320014 and r319894.Igor Laevsky2017-12-131-0/+4
| | | | | | OpenGL issues should be fixed by now. llvm-svn: 320568
* [SLP] Vectorize jumbled memory loads.Mohammad Shahid2017-12-131-83/+195
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mgrang, dcaballe, hans, mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 320548
* [CallSiteSplitting] Refactor creating callsites.Florian Hahn2017-12-131-115/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change makes the call site creation more general if any of the arguments is predicated on a condition in the call site's predecessors. If we find a callsite, that potentially can be split, we collect the set of conditions for the call site's predecessors (currently only 2 predecessors are allowed). To do that, we traverse each predecessor's predecessors as long as it only has single predecessors and record the condition, if it is relevant to the call site. For each condition, we also check if the condition is taken or not. In case it is not taken, we record the inverse predicate. We use the recorded conditions to create the new call sites and split the basic block. This has 2 benefits: (1) it is slightly easier to see what is going on (IMO) and (2) we can easily extend it to handle more complex control flow. Reviewers: davidxl, junbuml Reviewed By: junbuml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40728 llvm-svn: 320547
* [hwasan] Inline instrumentation & fixed shadow.Evgeniy Stepanov2017-12-131-3/+48
| | | | | | | | | | | | Summary: This brings CPU overhead on bzip2 down from 5.5x to 2x. Reviewers: kcc, alekseyshl Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41137 llvm-svn: 320538
* [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.Alexey Bataev2017-12-121-4/+20
| | | | | | | | | | | | | | | | | Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320525
* Reassociate: add global reassociation algorithmFiona Glaser2017-12-121-2/+110
| | | | | | | | | | | | | | | | | | | | | | | This algorithm (explained more in the source code) takes into account global redundancies by building a "pair map" to find common subexprs. The primary motivation of this is to handle situations like foo = (a * b) * c bar = (a * d) * c where we currently don't identify that "a * c" is redundant. Accordingly, it prioritizes the emission of a * c so that CSE can remove the redundant calculation later. Does not change the actual reassociation algorithm -- only the order in which the reassociated operand chain is reconstructed. Gives ~1.5% floating point math instruction count reduction on a large offline suite of graphics shaders. llvm-svn: 320515
* Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load ↵Alexey Bataev2017-12-121-29/+11
| | | | | | | | bitcast." This reverts commit r320510 - again sanitizers bbots. llvm-svn: 320513
OpenPOWER on IntegriCloud