summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Analysis
Commit message (Collapse)AuthorAgeFilesLines
* [InstSimplify] fold funnel shifts with undef operandsSanjay Patel2018-11-201-1/+10
| | | | | | | | Splitting these off from the D54666. Patch by: nikic (Nikita Popov) llvm-svn: 347332
* [InstructionSimplify] Add support for saturating add/subSanjay Patel2018-11-201-0/+34
| | | | | | | | | | | | | | | | | | | | | | Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported: sat(X + 0) -> X sat(X + undef) -> -1 sat(X uadd MAX) -> MAX (and commutative variants) sat(X - 0) -> X sat(X - X) -> 0 sat(X - undef) -> 0 sat(undef - X) -> 0 sat(0 usub X) -> 0 sat(X usub MAX) -> 0 Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54532 llvm-svn: 347330
* [ConstantFolding] Add support for saturating add/subSanjay Patel2018-11-201-0/+12
| | | | | | | | | | Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332. Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54531 llvm-svn: 347328
* [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlockVedant Kumar2018-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256
* [LV] Avoid vectorizing unsafe dependencies in uniform addressAnna Thomas2018-11-191-4/+12
| | | | | | | | | | | | | | | | | | | Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
* [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop ↵Fedor Sergeev2018-11-191-2/+4
| | | | | | | | | | | | | passes Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass run, however condition for issuing it is accumulated among all the runs. That leads to confusing 'modification' messages as soon as the first modification is done. Changing condition to be "current pass made modifications", similar to how it is being done in all other pass managers. llvm-svn: 347215
* [ProfileSummary] Standardize methods and fix commentVedant Kumar2018-11-192-8/+8
| | | | | | | | | | | | | | | | | | | | | Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182
* Use llvm::copy. NFCFangrui Song2018-11-172-2/+2
| | | | llvm-svn: 347126
* [ThinLTO] Internalize readonly globalsEugene Leviant2018-11-161-17/+58
| | | | | | | | An attempt to recommit r346584 after failure on OSX build bot. Fixed cache key computation in ThinLTOCodeGenerator and added test case llvm-svn: 347033
* [InstSimplify] delete shift-of-zero guard ops around funnel shiftsSanjay Patel2018-11-151-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | This is a problem seen in common rotate idioms as noted in: https://bugs.llvm.org/show_bug.cgi?id=34924 Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. (Although I've written this before...) I think this is the last step before we enable that transform. Ie, we could regress code by doing that transform without this simplification in place. In PR34924, I questioned whether this is a valid transform for target-independent IR, but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br into select, then SimplifyCFG has already determined that the transform is justified. It's possible that SimplifyCFG is not taking into account profile or other metadata, but if that's true, then it's a bug independent of funnel shifts. Also, we do have CGP code to restore a guard like this around an intrinsic if it can't be lowered cheaply. But that isn't necessary for funnel shift because the default expansion in SelectionDAGBuilder includes this same cmp+select. Differential Revision: https://reviews.llvm.org/D54552 llvm-svn: 346960
* [ThinLTO] Update handling of vararg functions to match inlinerTeresa Johnson2018-11-141-2/+7
| | | | | | | | | | | | | | | | | Summary: Previously we marked all vararg functions as non-inlinable in the function summary, which prevented their importing. However, the corresponding inliner restriction was loosened in r321940/r342675 to only apply to functions calling va_start. Adjust the summary flag computation to match. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54270 llvm-svn: 346883
* [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue Simon Pilgrim2018-11-141-0/+7
| | | | llvm-svn: 346868
* [MemorySSA] Create query after checking if instruction is a fence.Alina Sbirlea2018-11-131-2/+3
| | | | | | | The alternative is checking if I is a fence in the Query constructor, so as to not attempt to get a non-existent MemoryLocation. llvm-svn: 346798
* Revert "[ThinLTO] Internalize readonly globals"Steven Wu2018-11-131-58/+17
| | | | | | This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2. llvm-svn: 346768
* [VectorUtils] Use namespace for InterleaveGroup template specialization.Florian Hahn2018-11-131-4/+6
| | | | llvm-svn: 346759
* [VPlan] VPlan version of InterleavedAccessInfo.Florian Hahn2018-11-131-10/+24
| | | | | | | | | | | | | | | | | This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758
* [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.Simon Pilgrim2018-11-131-2/+1
| | | | | | It has no member dependencies and this makes it easier to reuse in other cost analysis code. llvm-svn: 346755
* [VectorUtils] add funnel-shifts to the list of vectorizable intrinsicsSanjay Patel2018-11-121-0/+2
| | | | | | | | | | | | | | | | This just identifies the intrinsics as candidates for vectorization. It does not mean we will attempt to vectorize under normal conditions (the test file is forcing vectorization). The cost model must be fixed to show that the transform is profitable in general. Allowing vectorization with these intrinsics is required to avoid potential regressions from canonicalizing to the intrinsics from generic IR: https://bugs.llvm.org/show_bug.cgi?id=37417 llvm-svn: 346661
* [VectorUtils] reorder list of vectorizable intrinsics; NFCSanjay Patel2018-11-121-10/+9
| | | | | | | We need to add funnel-shifts to this list, so clean up the random order before it gets worse. llvm-svn: 346660
* [LICM] Hoist guards from non-header blocksMax Kazantsev2018-11-122-0/+36
| | | | | | | | | | | This patch relaxes overconservative checks on whether or not we could write memory before we execute an instruction. This allows us to hoist guards out of loops even if they are not in the header block. Differential Revision: https://reviews.llvm.org/D50891 Reviewed By: fedor.sergeev llvm-svn: 346643
* [ThinLTO] Internalize readonly globalsEugene Leviant2018-11-101-17/+58
| | | | | | | | | This patch allows internalising globals if all accesses to them (from live functions) are from non-volatile load instructions Differential revision: https://reviews.llvm.org/D49362 llvm-svn: 346584
* [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector callSimon Pilgrim2018-11-091-1/+1
| | | | | | | | For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type I got this the wrong way around when I added rL346510 llvm-svn: 346534
* [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput ↵Simon Pilgrim2018-11-091-2/+8
| | | | | | | | (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510
* [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCondMax Kazantsev2018-11-081-0/+15
| | | | | | | | | | | | | | | | | | We have a lot of various bugs that are caused by misuse of SCEV (in particular in LV), all of them can simply be described as "we ask SCEV to prove some fact on invalid IR". Some of examples of those are PR36311, PR37221, PR39160. The problem is that these failues manifest differently (what we saw was failure of various asserts across SCEV, but there can also be miscompiles). This patch adds an assert into two SCEV methods that strongly rely on correctness of the IR and are involved in known failues. This will at least allow us to have a clear indication of what was wrong in this case. This patch also fixes a unit test with incorrect IR that fails this verification. Differential Revision: https://reviews.llvm.org/D52930 Reviewed By: fhahn llvm-svn: 346389
* Allow subclassing ExternalAAMatt Arsenault2018-11-071-22/+0
| | | | | | | | | | | | | | This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353
* Add support for llvm.is.constant intrinsic (PR4898)James Y Knight2018-11-071-0/+22
| | | | | | | | | | | | | | | This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322
* Fix unit tests after patch https://reviews.llvm.org/rL346313Calixte Denizet2018-11-071-5/+1
| | | | | | | | | | | | | | Summary: Tests are broken so fix them. Reviewers: marco-c Reviewed By: marco-c Subscribers: sylvestre.ledru, llvm-commits Differential Revision: https://reviews.llvm.org/D54208 llvm-svn: 346318
* [GCOV] Flush counters before to avoid counting the execution before fork ↵Calixte Denizet2018-11-071-0/+24
| | | | | | | | | | | | | | | | | | | | twice and for exec** functions we must flush before the call Summary: This is replacement for patch in https://reviews.llvm.org/D49460. When we fork, the counters are duplicate as they're and so the values are finally wrong when writing gcda for parent and child. So just before to fork, we flush the counters and so the parent and the child have new counters set to zero. For exec** functions, we need to flush before the call to have some data. Reviewers: vsk, davidxl, marco-c Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru, marco-c Differential Revision: https://reviews.llvm.org/D53593 llvm-svn: 346313
* [ThinLTO] Split NotEligibleToImport into legality and inlinability flagsTeresa Johnson2018-11-061-10/+9
| | | | | | | | | | | | | | | | | | | | Summary: The NotEligibleToImport flag on the GlobalValueSummary was set if it isn't legal to import (e.g. because it references unpromotable locals) and when it can't be inlined (in which case importing is pointless). I split out the inlinable piece into a separate flag on the FunctionSummary (doesn't make sense for aliases or global variables), because in the future we may want to import for reasons other than inlining. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53345 llvm-svn: 346261
* Disable calls to *_finite and other glibc-only functions on Musl.Eli Friedman2018-11-061-5/+5
| | | | | | | | | Non-GNU environments don't have __finite_*, so treat them as unavailable. Differential Revision: https://reviews.llvm.org/D51282 llvm-svn: 346250
* [NFC] Turn collectTransitivePredecessors into a static functionMax Kazantsev2018-11-061-2/+5
| | | | llvm-svn: 346217
* [InstSimplify] fold select (fcmp X, Y), X, YSanjay Patel2018-11-051-0/+31
| | | | | | | | | This is NFCI for InstCombine because it calls InstSimplify, so I left the tests for this transform there. As noted in the code comment, we can allow this fold more often by using FMF and/or value tracking. llvm-svn: 346169
* [Inliner] Penalise inlining of calls with loops at OzDavid Green2018-11-051-0/+20
| | | | | | | | | | | | | | | | | | We currently seem to underestimate the size of functions with loops in them, both in terms of absolute code size and in the difficulties of dealing with such code. (Calls, for example, can be tail merged to further reduce codesize). At -Oz, we can then increase code size by inlining small loops multiple times. This attempts to penalise functions with loops at -Oz by adding a CallPenalty for each top level loop in the function. It uses LI (and hence DT) to calculate the number of loops. As we are dealing with minsize, the inline threshold is small and functions at this point should be relatively small, making the construction of these cheap. Differential Revision: https://reviews.llvm.org/D52716 llvm-svn: 346134
* [ValueTracking] determine sign of 0.0 from select when matching min/max FPSanjay Patel2018-11-041-0/+21
| | | | | | | | | | | | | | | | | | | In PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 ..we may fail to recognize/simplify fabs() in some cases because we do not canonicalize fcmp with a -0.0 operand. Adding that canonicalization can cause regressions on min/max FP tests, so that's this patch: for the purpose of determining whether something is min/max, let the value returned by the select determine how we treat a 0.0 operand in the fcmp. This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so we don't fail to recognize equivalent min/max patterns that only differ in the signbit of 0.0. Differential Revision: https://reviews.llvm.org/D54001 llvm-svn: 346097
* [ValueTracking] peek through 2-input shuffles in ComputeNumSignBitsSanjay Patel2018-11-031-19/+34
| | | | | | | | | | | This patch gives the IR ComputeNumSignBits the same functionality as the DAG version (the code is derived from the existing code). This an extension of the single input shuffle analysis added with D53659. Differential Revision: https://reviews.llvm.org/D53987 llvm-svn: 346071
* [ProfileSummary] Add options to override hot and cold count thresholds.Easwaran Raman2018-11-021-0/+18
| | | | | | | | | | | | | | Summary: The hot and cold count thresholds are derived from the summary, but for debugging purposes it is convenient to provide the actual thresholds. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54040 llvm-svn: 346005
* [ValueTracking] allow non-canonical shuffles when computing signbitsSanjay Patel2018-11-021-8/+10
| | | | | | | This possibility is noted in D53987 for a different case, so we need to adjust the existing code. llvm-svn: 345988
* [AliasSetTracker] Misc cleanup (NFCI)Alina Sbirlea2018-11-011-14/+6
| | | | | Summary: Remove two redundant checks, add one in the unit test. Remove an unused method. Fix computation of TotalMayAliasSetSize. llvm-svn: 345911
* Fix clang -Wimplicit-fallthrough warnings across llvm, NFCReid Kleckner2018-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882
* [InstSimplify] fold icmp based on range of abs/nabs (2nd try)Sanjay Patel2018-11-011-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is retrying the fold from rL345717 (reverted at rL347780) ...with a fix for the miscompile demonstrated by PR39510: https://bugs.llvm.org/show_bug.cgi?id=39510 Original commit message: This is a fix for PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 We managed to get some of these patterns using computeKnownBits in https://reviews.llvm.org/D47041, but that can't be used for nabs(). Instead, put in some range-based logic, so we can fold both abs/nabs with icmp with a constant value. Alive proofs: https://rise4fun.com/Alive/21r Name: abs_nsw_is_positive %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp sgt i32 %abs, -1 => %r = i1 true Name: abs_nsw_is_not_negative %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp slt i32 %abs, 0 => %r = i1 false Name: nabs_is_negative_or_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp slt i32 %nabs, 1 => %r = i1 true Name: nabs_is_not_over_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp sgt i32 %nabs, 0 => %r = i1 false Differential Revision: https://reviews.llvm.org/D53844 llvm-svn: 345832
* [NFC] Specialize public API of ICFLoopSafetyInfo for insertions and removalsMax Kazantsev2018-11-011-1/+7
| | | | llvm-svn: 345822
* [SCEV] Avoid redundant computations when doing AddRec mergeMax Kazantsev2018-11-011-5/+6
| | | | | | | | | | | | | When we calculate a product of 2 AddRecs, we end up making quite massive computations to deduce the operands of resulting AddRec. This process can be optimized by computing all args of intermediate sum and then calling `getAddExpr` once rather than calling `getAddExpr` with intermediate result every time a new argument is computed. Differential Revision: https://reviews.llvm.org/D53189 Reviewed By: rtereshin llvm-svn: 345813
* revert rL345717 : [InstSimplify] fold icmp based on range of abs/nabsSanjay Patel2018-10-311-42/+0
| | | | | | | This can miscompile as shown in PR39510: https://bugs.llvm.org/show_bug.cgi?id=39510 llvm-svn: 345780
* [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negativeSanjay Patel2018-10-311-1/+4
| | | | | | This is the inverted case for the transform added with D53874 / rL345725. llvm-svn: 345728
* [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negativeSanjay Patel2018-10-311-0/+4
| | | | | | | | | | | | | | This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 ...but given the current implementation (no FMF on casts), this is likely the only way to predicate the transform. This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 Differential Revision: https://reviews.llvm.org/D53874 llvm-svn: 345725
* [InstSimplify] fold icmp based on range of abs/nabsSanjay Patel2018-10-311-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a fix for PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 We managed to get some of these patterns using computeKnownBits in D47041, but that can't be used for nabs(). Instead, put in some range-based logic, so we can fold both abs/nabs with icmp with a constant value. Alive proofs: https://rise4fun.com/Alive/21r Name: abs_nsw_is_positive %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp sgt i32 %abs, -1 => %r = i1 true Name: abs_nsw_is_not_negative %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp slt i32 %abs, 0 => %r = i1 false Name: nabs_is_negative_or_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp slt i32 %nabs, 1 => %r = i1 true Name: nabs_is_not_over_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp sgt i32 %nabs, 0 => %r = i1 false Differential Revision: https://reviews.llvm.org/D53844 llvm-svn: 345717
* [LV] Support vectorization of interleave-groups that require an epilog underDorit Nuzman2018-10-312-5/+28
| | | | | | | | | | | | | | | | | | | | | | optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
* ADT/STLExtras: Introduce llvm::empty; NFCMatthias Braun2018-10-311-1/+1
| | | | | | | | This is modeled after C++17 std::empty(). Differential Revision: https://reviews.llvm.org/D53909 llvm-svn: 345679
* [AliasSetTracker] Cleanup addPointer interface. [NFCI]Alina Sbirlea2018-10-291-6/+6
| | | | | | | | | | | | | | Summary: Attempting to simplify the addPointer interface. Currently there's code decomposing a MemoryLocation into (Ptr, Size, AAMDNodes) only to recreate the MemoryLocation inside the call. Reviewers: reames, mkazantsev Subscribers: sanjoy, jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D53836 llvm-svn: 345548
* Revert r344172: [LV] Add a new reduction pattern matchRenato Golin2018-10-271-65/+6
| | | | | | | | This patch has caused fast-math issues in the reduction pattern. Will re-work and land again. llvm-svn: 345465
OpenPOWER on IntegriCloud