path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* [Coroutines] Part13: Handle single edge PHINodes across suspends | Gor Nishanov | 2016-09-09 | 3 | -4/+28
  Summary: If one of the uses of the value is a single edge PHINode, handle it.

  Original:
      %val = something
      <suspend>
      %p = PHINode [%val]

  After Spill + Part13:
      %val = something
      %slot = gep val.spill.slot
      store %val, %slot
      <suspend>
      %p = load %slot

  Plus tiny fixes/changes:
    * use correct index for coro.free in CoroCleanup
    * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins

  Reviewers: majnemer
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24242
  llvm-svn: 281020
* Remove debug info when hoisting instruction from then/else branch. | Dehao Chen | 2016-09-08 | 1 | -0/+8
  Summary: The hoisted instruction is executed speculatively. This could hurt the debugging experience, because the user would see gdb step into code that is not expected to execute. It also hurts sample profile accuracy by assigning an incorrect frequency to source within the then/else branch.
  Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo
  Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24164
  llvm-svn: 280995
* [LV] Ensure proper handling of multi-use case when collecting uniforms | Matthew Simpson | 2016-09-08 | 1 | -5/+5
  The test case included in r280979 wasn't checking what it was supposed to be checking for the predicated store case. Fixing the test revealed that the multi-use case (when a pointer is used by both vectorized and scalarized memory accesses) wasn't being handled properly. We can't skip over non-consecutive-like pointers since they may have looked consecutive-like with a different memory access.
  llvm-svn: 280992
* [LV] Don't mark pointers used by scalarized memory accesses uniform | Matthew Simpson | 2016-09-08 | 1 | -42/+143
  Previously, all consecutive pointers were marked uniform after vectorization. However, if a consecutive pointer is used by a memory access that is eventually scalarized, the pointer won't remain uniform after all. An example is predicated stores. Even though a predicated store may be consecutive, it will still be scalarized, making its pointer operand non-uniform.
  This patch updates the logic in collectLoopUniforms to consider the cases where a memory access may be scalarized. If a memory access may be scalarized, its pointer operand is not marked uniform. The determination of whether a given memory instruction will be scalarized or not has been moved into a common function that is used by the vectorizer, cost model, and legality analysis.
  Differential Revision: https://reviews.llvm.org/D24271
  llvm-svn: 280979
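  The rule in this message can be sketched outside of LLVM. The following is a minimal Python model, not the vectorizer's C++ code: `MemAccess`, `may_be_scalarized`, and `collect_uniform_pointers` are hypothetical names, and the scalarization test (predicated or non-consecutive) is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical stand-in for a load or store seen by the vectorizer."""
    pointer: str
    consecutive: bool
    predicated: bool  # e.g. a store guarded by a condition


def may_be_scalarized(access):
    # Assumption for this sketch: predicated or non-consecutive accesses
    # end up scalarized.
    return access.predicated or not access.consecutive


def collect_uniform_pointers(accesses):
    # A pointer is a candidate if some access uses it consecutively ...
    candidates = {a.pointer for a in accesses if a.consecutive}
    # ... but it is dropped if any access using it may be scalarized.
    scalarized = {a.pointer for a in accesses if may_be_scalarized(a)}
    return candidates - scalarized
```

  With accesses on `p` (plain consecutive), `q` (consecutive but predicated), and `r` (used by one consecutive and one non-consecutive access), only `p` stays uniform in this model. The `r` case corresponds to the multi-use situation that r280992 addresses.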
* [LoopDataPrefetch] Use range based for loop; NFCI | Balaram Makam | 2016-09-08 | 1 | -17/+12
  Switch to range based for loop. No functional change, but more readable code.
  llvm-svn: 280966
* [InstCombine] return a vector-safe true/false constant | Sanjay Patel | 2016-09-08 | 1 | -2/+2
  I introduced this potential bug by missing this diff in:
  https://reviews.llvm.org/rL280873
  ...however, I'm not sure how to reach this code path with a regression test. We may be able to remove this code and assume that the transform to a constant is always handled by InstSimplify?
  llvm-svn: 280964
* revert r280427 | Dehao Chen | 2016-09-08 | 2 | -6/+4
  Refactor replaceDominatedUsesWith to have a flag to control whether to replace uses in BB itself.
  Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.
  llvm-svn: 280949
* [asan] Avoid lifetime analysis for allocas with can be in ambiguous state | Vitaly Buka | 2016-09-08 | 1 | -0/+75
  Summary: C allows jumping over a variable declaration, so lifetime.start can be skipped before the variable is used. To avoid false positives in such rare cases, we detect them and remove them from lifetime analysis.
  PR27453
  PR28267
  Reviewers: eugenis
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24321
  llvm-svn: 280907
* Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues." | Michael Zolotukhin | 2016-09-08 | 1 | -54/+11
  This reverts commit r280901. This caused a bunch of failures, reverting it until I investigate them.
  llvm-svn: 280905
* [LoopUnroll] Properly update loop-info when cloning prologues and epilogues. | Michael Zolotukhin | 2016-09-08 | 1 | -11/+54
  Summary: When cloning blocks for prologue/epilogue we need to replicate the loop structure from the original loop. It wasn't a problem for the innermost loops, but it led to incorrect loop info when we unrolled a loop with a child loop: in this case the created prologue loop had a child loop, but loop info didn't reflect that.
  This fixes PR28888.
  Reviewers: chandlerc, sanjoy, hfinkel
  Subscribers: llvm-commits, silvas
  Differential Revision: https://reviews.llvm.org/D24203
  llvm-svn: 280901
* IR: Remove Value::intersectOptionalDataWith, replace all calls with calls to Instruction::andIRFlags. | Peter Collingbourne | 2016-09-07 | 4 | -6/+6
  The two functions are functionally equivalent.
  Differential Revision: https://reviews.llvm.org/D22830
  llvm-svn: 280884
* Revert "[asan] Avoid lifetime analysis for allocas with can be in ambiguous state" | Vitaly Buka | 2016-09-07 | 1 | -74/+0
  Fails on Windows.
  This reverts commit r280880.
  llvm-svn: 280883
* [asan] Avoid lifetime analysis for allocas with can be in ambiguous state | Vitaly Buka | 2016-09-07 | 1 | -0/+74
  Summary: C allows jumping over a variable declaration, so lifetime.start can be skipped before the variable is used. To avoid false positives in such rare cases, we detect them and remove them from lifetime analysis.
  PR27453
  PR28267
  Reviewers: eugenis
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24321
  llvm-svn: 280880
* [InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors | Sanjay Patel | 2016-09-07 | 2 | -52/+22
  llvm-svn: 280873
* [SimplifyCFG] Don't try to create metadata-valued PHIs | Hal Finkel | 2016-09-07 | 1 | -0/+4
  We can't create metadata-valued PHIs; don't try to do so when sinking.
  I created a test case for this using the @llvm.type.test intrinsic, because it takes a metadata parameter and does not have severe side effects (thus SimplifyCFG is willing to otherwise sink it).
  Previously, running the test case would crash with:
      Invalid use of metadata!
        %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
      LLVM ERROR: Broken function found, compilation aborted!
  llvm-svn: 280866
* [LoopUnroll] Correct a debug message. NFC. | Haicheng Wu | 2016-09-07 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D24299
  llvm-svn: 280865
* [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors | Sanjay Patel | 2016-09-07 | 1 | -43/+33
  This is a revert of r280676 which was a revert of r280637; i.e., this is r280637 again. It was speculatively reverted to help debug buildbot failures.
  llvm-svn: 280861
* Typo. NFC. | Chad Rosier | 2016-09-07 | 1 | -1/+1
  llvm-svn: 280834
* [LoopInterchange] Improve debug output. NFC. | Chad Rosier | 2016-09-07 | 1 | -6/+6
  llvm-svn: 280820
* [LoopInterchange] Improve debug output. NFC. | Chad Rosier | 2016-09-07 | 1 | -4/+6
  llvm-svn: 280819
* [LSV] Use the original loads' names for the extractelement instructions. | Justin Lebar | 2016-09-07 | 1 | -2/+4
  Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability.
  Reviewers: asbirlea, tstellarAMD
  Subscribers: arsenm, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D23748
  llvm-svn: 280818
* [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic. | Andrea Di Biagio | 2016-09-07 | 1 | -3/+3
  This fixes a similar issue to the one already fixed by r280804 (reviewed in D24256).
  Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem.
  llvm-svn: 280807
* [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls. | Andrea Di Biagio | 2016-09-07 | 1 | -3/+3
  This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi.
  The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElement may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed as input to an extrq/extrqi intrinsic call.
  This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement.
  Added reproducible test cases to x86-sse4a.ll.
  Differential Revision: https://reviews.llvm.org/D24256
  llvm-svn: 280804
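  The difference between the two casts can be modeled in a few lines. This is a Python sketch of the behavior described above, not LLVM code: the classes and the `get_aggregate_element` helper are hypothetical stand-ins, and the only property borrowed from LLVM is that `dyn_cast` asserts on a null input while `dyn_cast_or_null` passes null through.

```python
class Value:
    pass


class ConstantInt(Value):
    def __init__(self, value):
        self.value = value


class ConstantExpr(Value):
    """Stand-in for a constant whose elements cannot be retrieved."""


def dyn_cast(cls, v):
    # Models dyn_cast<>: a null input is a programming error (assertion).
    assert v is not None, "dyn_cast on a null pointer"
    return v if isinstance(v, cls) else None


def dyn_cast_or_null(cls, v):
    # Models dyn_cast_or_null<>: a null input simply yields null.
    return v if isinstance(v, cls) else None


def get_aggregate_element(constant, index):
    # Hypothetical model of a getter that returns None (null) when the
    # element cannot be retrieved, e.g. for a constant expression.
    if isinstance(constant, list) and 0 <= index < len(constant):
        return constant[index]
    return None
```

  In this model, `dyn_cast(ConstantInt, get_aggregate_element(ConstantExpr(), 0))` trips the assertion, while the `dyn_cast_or_null` form returns `None` and lets the caller skip the fold.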
* Revert "[EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address." | Renato Golin | 2016-09-07 | 1 | -34/+9
  This reverts commit r280796, as it broke the AArch64 bots for no reason.
  The tests were passing and we should try to keep them passing, so a proper review should make that happen.
  llvm-svn: 280802
* [EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address. | Sagar Thakur | 2016-09-07 | 1 | -9/+34
  Adding 40-bit shadow memory parameters because MIPS64 uses 40-bit virtual memory addresses.
  Reviewed by bruening
  Differential: D23801
  llvm-svn: 280796
* [SimplifyCFG] Followup fix to r280790 | James Molloy | 2016-09-07 | 1 | -1/+3
  In failure cases it's not guaranteed that the PHI we're inspecting is actually in the successor block! In this case we need to bail out early, and never query getIncomingValueForBlock() as that will cause an assert.
  llvm-svn: 280794
* [SimplifyCFG] Update workaround for PR30188 to also include loads | James Molloy | 2016-09-07 | 1 | -2/+7
  I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.
  Fixes PR30188 (again).
  llvm-svn: 280792
* [SimplifyCFG] Check PHI uses more accurately | James Molloy | 2016-09-07 | 1 | -1/+3
  PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.
  Fixes PR30292.
  llvm-svn: 280790
* Fix typo in comment, NFC | Nick Lewycky | 2016-09-07 | 1 | -1/+1
  llvm-svn: 280774
* Explicitly require DominatorTreeAnalysis pass for instsimplify pass. | Dehao Chen | 2016-09-06 | 1 | -5/+6
  Summary: DominatorTreeAnalysis is always required by instsimplify.
  Reviewers: danielcdh, davidxl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24173
  llvm-svn: 280760
* fix formatting; NFC | Sanjay Patel | 2016-09-06 | 1 | -19/+14
  llvm-svn: 280727
* [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info | Adam Nemet | 2016-09-06 | 1 | -1/+52
  Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However we could still have regions of the CFG that do not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies.

  I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT).

  The new testcase demonstrates the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory.

  In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions).

  There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body:

                  check_1 (16)
              (8) /  |
              eq_1   | (8)
                  \  |
                  check_2 (16)
              (8) /  |
              eq_2   | (8)
                  \  |
                  check_3 (16)
              (1) /  |
         (loop exit) | (15)
                     |
                (back edge)

  First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with * *:

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
           \  eq_2   | (*0*)
            \   \    |
             ` --- check_3 (16)
              (1) /  |
         (loop exit) | (15)
                     |
                (back edge)

  Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_2 from check_3 and then the false edge leaving check_3 (changed frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
         (back edge) |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

  As a result, the loop exit edge ends up with 0 frequency, which in turn gives the loop header maximum frequency.

  There are a few potential problems here:
    1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop.
    2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely.
    3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what the policy is but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI.

  Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have and added a FIXME about 2.

  Differential Revision: https://reviews.llvm.org/D24118
  llvm-svn: 280713
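  The frequency arithmetic in the two threading steps can be replayed with a small model. This is a Python sketch, not the pass's implementation: `sat_sub` and `thread_through` are hypothetical helpers, and the final clamp (no outgoing edge hotter than its block) is a simplification assumed here so that the numbers line up with the diagrams.

```python
def sat_sub(a, b):
    """Saturating subtraction: a frequency never drops below zero."""
    return max(a - b, 0)


def thread_through(block_freq, edge_freq, block, false_edge, threaded_freq):
    # Remove the threaded frequency from the block that is bypassed ...
    block_freq[block] = sat_sub(block_freq[block], threaded_freq)
    # ... and from the false edge leaving that block.
    edge_freq[false_edge] = sat_sub(edge_freq[false_edge], threaded_freq)
    # Sketch-only assumption: an outgoing edge cannot be hotter than its block.
    for edge in list(edge_freq):
        if edge[0] == block:
            edge_freq[edge] = min(edge_freq[edge], block_freq[block])


# Estimated frequencies before jump threading (first diagram).
block_freq = {"check_1": 16, "check_2": 16, "check_3": 16}
edge_freq = {
    ("check_1", "eq_1"): 8, ("check_1", "check_2"): 8,
    ("check_2", "eq_2"): 8, ("check_2", "check_3"): 8,
    ("check_3", "exit"): 1, ("check_3", "check_1"): 15,
}

# Step 1: eq_1 (frequency 8) now bypasses check_2.
thread_through(block_freq, edge_freq, "check_2", ("check_2", "check_3"), 8)
# Step 2: eq_1 and eq_2 (8 + 8) now bypass check_3.
thread_through(block_freq, edge_freq, "check_3", ("check_3", "check_1"), 8 + 8)
```

  After both steps the model leaves check_2 at 8 and check_3 at 0, with the loop-exit edge at 0, which is the state that makes the loop look extremely hot.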
* [Coroutines] Part12: Handle alloca address-taken | Gor Nishanov | 2016-09-05 | 1 | -1/+46
  Summary: Move early uses of spilled variables after CoroBegin.

  For example, if a parameter had address taken, we may end up with the code like:
      define @f(i32 %n) {
        %n.addr = alloca i32
        store %n, %n.addr
        ...
        call @coro.begin

  This patch fixes the problem by moving uses of spilled variables after CoroBegin.

  Reviewers: majnemer
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24234
  llvm-svn: 280678
* [InstCombine] don't assert that division-by-constant has been folded (PR30281) | Sanjay Patel | 2016-09-05 | 1 | -7/+6
  This is effectively a revert of:
  https://reviews.llvm.org/rL280115
  And this should fix:
  https://llvm.org/bugs/show_bug.cgi?id=30281
  llvm-svn: 280677
* [InstCombine] revert r280637 because it causes test failures on an ARM bot | Sanjay Patel | 2016-09-05 | 1 | -33/+43
  http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll
  llvm-svn: 280676
* [Coroutines] Part11: Add final suspend handling. | Gor Nishanov | 2016-09-05 | 3 | -17/+93
  Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
    * it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
    * a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

  This patch adds final suspend handling logic to CoroEarly and CoroSplit passes. Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

  Reviewers: majnemer
  Subscribers: mehdi_amini, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24068
  llvm-svn: 280646
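  The two properties of a final suspend point can be illustrated with a toy state machine. This is a Python sketch under stated assumptions, not the coroutine passes themselves: the `Coroutine` class is hypothetical, its methods only mirror the roles of coro.done, coro.resume, and coro.destroy, and resuming at the final suspend raises an error here purely to make the undefined behavior visible.

```python
class Coroutine:
    """Toy model of a suspended coroutine walking through its suspend points."""

    def __init__(self, suspend_points):
        # suspend_points: list of booleans; True marks the final suspend.
        # Assumption: the list is non-empty and ends with the final suspend.
        self._points = list(suspend_points)
        self._index = 0  # suspend point the coroutine is currently stopped at
        self.destroyed = False

    def done(self):
        # Plays the role of coro.done: are we at the final suspend point?
        return self._points[self._index]

    def resume(self):
        # Plays the role of coro.resume. Real code has undefined behavior
        # here; the sketch raises instead.
        if self.done():
            raise RuntimeError("resume at the final suspend point")
        self._index += 1

    def destroy(self):
        # Plays the role of coro.destroy, the only valid action once done.
        self.destroyed = True
```

  A caller following the rule above would loop `while not c.done(): c.resume()` and then call `c.destroy()`.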
* [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors | Sanjay Patel | 2016-09-04 | 1 | -43/+33
  The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273:
  https://llvm.org/bugs/show_bug.cgi?id=30273
  llvm-svn: 280637
* [InstCombine] recode icmp fold in a vector-friendly way; NFC | Sanjay Patel | 2016-09-04 | 1 | -22/+30
  The transform in question:
      icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
  ...is still not enabled for vectors, thus no functional change intended. It's not clear to me if this is a good transform for vectors or even scalars in general. Changing that behavior may be a follow-on patch.
  llvm-svn: 280627
* [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing | Dorit Nuzman | 2016-09-04 | 1 | -0/+6
  memcpy with ld/st.
  When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that.
  Differential Revision: https://reviews.llvm.org/D23499
  llvm-svn: 280617
* Test commit. | Dorit Nuzman | 2016-09-04 | 1 | -0/+1
  llvm-svn: 280615
* Fix inliner funclet unwind memoization | Joseph Tremoulet | 2016-09-04 | 1 | -7/+79
  Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations. In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination.
  Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it.
  Fixes PR29151.
  Reviewers: majnemer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24117
  llvm-svn: 280610
* Cleanup : Use metadata preserving API for branch creation | Xinliang David Li | 2016-09-03 | 1 | -9/+4
  Use the wrapper API in IRBuilder that copies metadata to create the new branch in LoopUnswitch.
  llvm-svn: 280602
* AMDGPU: Do basic folding of class intrinsic | Matt Arsenault | 2016-09-03 | 1 | -0/+79
  This allows more of the OCML builtin library to be constant folded.
  llvm-svn: 280586
* ADT: Do not inherit from std::iterator in ilist_iterator | Duncan P. N. Exon Smith | 2016-09-03 | 1 | -1/+1
  Inheriting from std::iterator uses more boiler-plate than manual typedefs. Avoid that in both ilist_iterator and MachineInstrBundleIterator.
  This has the side effect of removing ilist_iterator from certain ADL lookups in namespace std; calls to std::next that previously worked unqualified now need the "std::" qualifier. The one case of this in-tree was operating on a temporary, so I used the more compact operator++.
  llvm-svn: 280570
* [Profile] handle select instruction in 'expect' lowering | Xinliang David Li | 2016-09-02 | 1 | -11/+25
  Builtin expect lowering currently ignores select. This patch fixes the issue.
  Differential Revision: http://reviews.llvm.org/D24166
  llvm-svn: 280547
* [SLP] Don't pass a global CL option as an argument. NFC. | Chad Rosier | 2016-09-02 | 1 | -8/+7
  Differential Revision: https://reviews.llvm.org/D24199
  llvm-svn: 280527
* [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126) | Sanjay Patel | 2016-09-02 | 1 | -0/+76
  The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.
  Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well.
  This should solve PR29126:
  https://llvm.org/bugs/show_bug.cgi?id=29126
  Differential Revision: https://reviews.llvm.org/D23886
  llvm-svn: 280504
* [LV] Ensure reverse interleaved group GEPs remain uniform | Matthew Simpson | 2016-09-02 | 1 | -1/+11
  For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead.
  I've added the updated member index computation to the existing reverse interleaved group test.
  llvm-svn: 280497
* [SimplifyCFG] Add a workaround to fix PR30188 | James Molloy | 2016-09-02 | 1 | -0/+10
  We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.
  The real fix is in SROA, which I'll be looking into.
  llvm-svn: 280470
* revert r280429 and r280425: | Dehao Chen | 2016-09-02 | 1 | -22/+24
  r280425 | dehao | 2016-09-01 16:15:50 -0700 (Thu, 01 Sep 2016) | 9 lines
  Refactor LICM pass in preparation for LoopSink pass.
  Summary: LoopSink pass uses some common functions in LICM. This patch refactors the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).

  r280429 | dehao | 2016-09-01 16:31:25 -0700 (Thu, 01 Sep 2016) | 9 lines
  Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.
  Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation for https://reviews.llvm.org/D22778

  llvm-svn: 280453