summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] Fix PR24605.Sanjoy Das2015-08-282-0/+11
| | | | | | | | | | | | | | PR24605 is caused due to an incorrect insert point in instcombine's IR builder. When simplifying %t = add X Y ... %m = icmp ... %t the replacement for %t should be placed before %t, not before %m, as there could be a use of %t between %t and %m. llvm-svn: 246315
* Optimize memcmp(x,y,n)==0 for small n and suitably aligned x/y.Chad Rosier2015-08-281-0/+22
| | | | | | | http://reviews.llvm.org/D6952 PR20673 llvm-svn: 246313
* Remove Merge Functions pointer comparisonsJF Bastien2015-08-281-17/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch removes two remaining places where pointer value comparisons are used to order functions: comparing range annotation metadata, and comparing block address constants. (These are both rare cases, and so no actual non-determinism was observed from either case). The fix for range metadata is simple: the annotation always consists of a pair of integers, so we just order by those integers. The fix for block addresses is more subtle. Two constants are the same if they are the same basic block in the same function, or if they refer to corresponding basic blocks in each respective function. Note that in the first case, merging is trivially correct. In the second, the correctness of merging relies on the fact that the the values of block addresses cannot be compared. This change is actually an enhancement, as these functions could not previously be merged (see merge-block-address.ll). There is still a problem with cross function block addresses, in that constants pointing to a basic block in a merged function is not updated. This also more robustly compares floating point constants by all fields of their semantics, and fixes a dyn_cast/cast mixup. Author: jrkoenig Reviewers: dschuff, nlewycky, jfb Subscribers llvm-commits Differential revision: http://reviews.llvm.org/D12376 llvm-svn: 246305
* [SROA] Fix PR24463, a crash I introduced in SROA by allowing it toChandler Carruth2015-08-281-3/+13
| | | | | | | | | | | | handle more allocas with loads past the end of the alloca. I suspect there are some related crashers with slightly different patterns, but I'll fix those and add test cases as I find them. Thanks to David Majnemer for the excellent test case reduction here. Made this super simple to debug and fix. llvm-svn: 246289
* Revert r246244 and r246243Steven Wu2015-08-282-113/+11
| | | | | | These two commits cause clang/llvm bootstrap to hang. llvm-svn: 246279
* Constant propagation after hitting assume(cmp) bugfixPiotr Padlewski2015-08-282-9/+46
| | | | | | | | | Last time code run into assertion `BBE.isSingleEdge()` in lib/IR/Dominators.cpp:200. http://reviews.llvm.org/D12170 llvm-svn: 246244
* Constant propagation after hiting llvm.assumePiotr Padlewski2015-08-281-3/+68
| | | | | | | | | | | After hitting @llvm.assume(X) we can: - propagate equality that X == true - if X is icmp/fcmp (with eq operation), and one of operand is constant we can change all variables with constants in the same BasicBlock http://reviews.llvm.org/D11918 llvm-svn: 246243
* Improve vectorization diagnostic messages and extend vectorize(enable) pragma.Tyler Nowicki2015-08-271-15/+25
| | | | | | | | | | | | | | | | This patch changes the analysis diagnostics produced when loops with floating-point recurrences or memory operations are identified. The new messages say "cannot prove it is safe to reorder * operations; allow reordering by specifying #pragma clang loop vectorize(enable)". Depending on the type of diagnostic the message will include additional options such as ffast-math or __restrict__. This patch also allows the vectorize(enable) pragma to override the low pointer memory check threshold. When the hint is given a higher threshold is used. See the clang patch for the options produced for each diagnostic. llvm-svn: 246187
* [LoopVectorize] Add Support for Small Size Reductions.Chad Rosier2015-08-272-32/+214
| | | | | | | | | | | | | | | | | | | | | | | Unlike scalar operations, we can perform vector operations on element types that are smaller than the native integer types. We type-promote scalar operations if they are smaller than a native type (e.g., i8 arithmetic is promoted to i32 arithmetic on Arm targets). This patch detects and removes type-promotions within the reduction detection framework, enabling the vectorization of small size reductions. In the legality phase, we look through the ANDs and extensions that InstCombine creates during promotion, keeping track of the smaller type. In the profitability phase, we use the smaller type and ignore the ANDs and extensions in the cost model. Finally, in the code generation phase, we truncate the result of the reduction to allow InstCombine to rewrite the entire expression in the smaller type. This fixes PR21369. http://reviews.llvm.org/D12202 Patch by Matt Simpson <mssimpso@codeaurora.org>! llvm-svn: 246149
* [LoopVectorize] Extract InductionInfo into a helper class...James Molloy2015-08-273-134/+87
| | | | | | | | ... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup. NFC llvm-svn: 246145
* Whoops, remove trailing whitespace.Alex Rosenberg2015-08-271-1/+1
| | | | llvm-svn: 246141
* Allow value forwarding past release fences in EarlyCSEPhilip Reames2015-08-271-0/+11
| | | | | | | | | | | | A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence. We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence. Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences. Differential Revision: http://reviews.llvm.org/D11434 llvm-svn: 246134
* [RewriteStatepointsForGC] Reduce the number of new instructions for base ↵Philip Reames2015-08-271-3/+57
| | | | | | | | | | | | | | pointers When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints. I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead. Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them. Differential Revision: http://reviews.llvm.org/D12004 llvm-svn: 246133
* Improved printing of analysis diagnostics in the loop vectorizer.Tyler Nowicki2015-08-271-18/+26
| | | | | | | | This patch ensures that every analysis diagnostic produced by the vectorizer will be printed if the loop has a vectorization hint on it. The condition has also been improved to prevent printing when a disabling hint is specified. llvm-svn: 246132
* [SimplifyCFG] Prune code from a provably unreachable switch defaultPhilip Reames2015-08-261-0/+17
| | | | | | | | | | As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine. Note: Duplicate case values are disallowed by the LangRef and verifier. Differential Revision: http://reviews.llvm.org/D11995 llvm-svn: 246125
* Fix memory leak in sample profile pass.Diego Novillo2015-08-261-28/+23
| | | | | | | | | | | | | | | | The problem here were the function analyses invoked by the function pass manager from the new IPO pass. I looked at other IPO passes needing dominance information and the only one that requires it (partial inliner) does not use the standard dependency mechanism. This patch mimics what the partial inliner does to compute dominance, post-dominance and loop info. One thing I like about this approach is that I can delay the computation of all this until I actually need it. This should bring the ASAN buildbot back to green. If there's a better way to fix this, I'll do it in a follow-up patch. llvm-svn: 246066
* [SimplifyLibCalls] Fix a typoDavid Majnemer2015-08-261-1/+1
| | | | | | | cbrt(sqrt(x)) calculates the sixth root, not the ninth root. cbrt(cbrt(x)) calculates the ninth root. llvm-svn: 246046
* [SROA] Rip out all support for SSAUpdater in SROA.Chandler Carruth2015-08-261-198/+8
| | | | | | | | | | | | | This was only added to preserve the old ScalarRepl's use of SSAUpdater which was originally to avoid use of dominance frontiers. Now, we only need a domtree, and we'll need a domtree right after this pass as well and so it makes perfect sense to always and only use the dom-tree powered mem2reg. This was flag-flipper earlier and has stuck reasonably so I wanted to gut the now-dead code out of SROA before we waste more time with it. Among other things, this will make passmanager porting easier. llvm-svn: 246028
* Modernize with range-based for loops.Alex Rosenberg2015-08-261-21/+15
| | | | llvm-svn: 246018
* Reduce code duplication.Alex Rosenberg2015-08-261-12/+23
| | | | llvm-svn: 246017
* Trailing whitespaceAlex Rosenberg2015-08-261-1/+1
| | | | llvm-svn: 246016
* Comparing operands should not require the same ValueIDJF Bastien2015-08-261-5/+2
| | | | | | | | | | | | | Summary: When comparing basic blocks, there is an additional check that two Value*'s should have the same ID, which interferes with merging equivalent constants of different kinds (such as a ConstantInt and a ConstantPointerNull in the included testcase). The cmpValues function already ensures that the two values in each function are the same, so removing this check should not cause incorrect merging. Also, the type comparison is redundant, based on reviewing the code and testing on the test suite and several large LTO bitcodes. Author: jrkoenig Reviewers: nlewycky, jfb, dschuff Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D12302 llvm-svn: 246001
* Make variable argument intrinsics behave correctly in a Win64 CC function.Charles Davis2015-08-251-0/+4
| | | | | | | | | | | | | | | | Summary: This change makes the variable argument intrinsics, `llvm.va_start` and `llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch I have to add support for GCC's `__builtin_ms_va_list` constructs. Reviewers: nadav, asl, eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D1622 llvm-svn: 245990
* [msan] Precise instrumentation for icmp sgt %x, -1.Evgeniy Stepanov2015-08-251-15/+20
| | | | | | | | | | Extend signed relational comparison instrumentation with a special case for comparisons with -1. This fixes an MSan false positive when such comparison is used as a sign bit test. https://llvm.org/bugs/show_bug.cgi?id=24561 llvm-svn: 245980
* Update libdeps in LLVMipo and LLVMScalarOpts, corresponding to r245940.NAKAMURA Takumi2015-08-252-2/+2
| | | | llvm-svn: 245957
* Fix dependencies/shared library buildMatthias Braun2015-08-251-1/+1
| | | | llvm-svn: 245955
* The patch replace the overflow check in loop vectorization with the minimum ↵Wei Mi2015-08-251-22/+25
| | | | | | | | | | | loop iterations check. The loop minimum iterations check below ensures the loop has enough trip count so the generated vector loop will likely be executed, and it covers the overflow check. Differential Revision: http://reviews.llvm.org/D12107. llvm-svn: 245952
* Convert SampleProfile pass into a Module pass.Diego Novillo2015-08-255-14/+31
| | | | | | | | | | | Eventually, we will need sample profiles to be incorporated into the inliner's cost models. To do this, we need the sample profile pass to be a module pass. This patch makes no functional changes beyond the mechanical adjustments needed to run SampleProfile as a module pass. llvm-svn: 245940
* Assume intrinsic handling in global optPiotr Padlewski2015-08-251-0/+4
| | | | | | | | | | | | | | It doesn't solve the problem, when for example we load something, and then assume that it is the same as some constant value, because globalopt will fail on unknown load instruction. The proposed solution would be to skip some instructions that we can't evaluate and they are safe to skip (f.e. load, assume and many others) and see if they are required to perform optimization (f.e. we don't care about ephemeral instructions that may appear using @llvm.assume()) http://reviews.llvm.org/D12266 llvm-svn: 245919
* fix typo; NFCSanjay Patel2015-08-241-1/+1
| | | | llvm-svn: 245869
* [sanitizers] Add DFSan support for AArch64 42-bit VMAAdhemerval Zanella2015-08-241-0/+14
| | | | | | | | | This patch adds support for dfsan on aarch64-linux with 42-bit VMA (current default config for 64K pagesize kernels). The support is enabled by defining the SANITIZER_AARCH64_VMA to 42 at build time for both clang/llvm and compiler-rt. The default VMA is 39 bits. llvm-svn: 245840
* Require Dominator Tree For SROA, improve compile-timeMehdi Amini2015-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TL-DR: SROA is followed by EarlyCSE which requires the DominatorTree. There is no reason not to require it up-front for SROA. Some history is necessary to understand why we ended-up here. r123437 switched the second (Legacy)SROA in the optimizer pipeline to use SSAUpdater in order to avoid recomputing the costly DominanceFrontier. The purpose was to speed-up the compile-time. Later r123609 removed the need for the DominanceFrontier in (Legacy)SROA. Right after, some cleanup was made in r123724 to remove any reference to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and SROA_DT (the latter replacing SROA_DF). The second argument of `createScalarReplAggregatesPass` was renamed from `UseDomFrontier` to `UseDomTree`. I believe this is were a mistake was made. The pipeline was not updated and the call site was still: PM->add(createScalarReplAggregatesPass(-1, false)); At that time, SROA was immediately followed in the pipeline by EarlyCSE which required alread the DominatorTree. Not requiring the DominatorTree in SROA didn't save anything, but unfortunately it was lost at this point. When the new SROA Pass was introduced in r163965, I believe the goal was to have an exact replacement of the existing SROA, this bug slipped through. You can see currently: $ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure ... ... FunctionPass Manager SROA Dominator Tree Construction Early CSE After this patch: $ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure ... ... FunctionPass Manager Dominator Tree Construction SROA Early CSE This improves the compile time from 88s to 23s for PR17855. https://llvm.org/bugs/show_bug.cgi?id=17855 And from 113s to 12s for PR16756 https://llvm.org/bugs/show_bug.cgi?id=16756 Reviewers: chandlerc Differential Revision: http://reviews.llvm.org/D12267 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 245820
* [WinEH] Require token linkage in EH pad/ret signaturesJoseph Tremoulet2015-08-232-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: WinEHPrepare is going to require that cleanuppad and catchpad produce values of token type which are consumed by any cleanupret or catchret exiting the pad. This change updates the signatures of those operators to require/enforce that the type produced by the pads is token type and that the rets have an appropriate argument. The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that restriction, this change adds a notion of an operator constraint to both LLParser and BitcodeReader, allowing appropriate sentinels to be constructed for forward references and appropriate error messages to be emitted for illegal inputs. Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad predecessor must have no other predecessors; this ensures that WinEHPrepare will see the expected linear relationship between sibling catches on the same try. Lastly, remove some superfluous/vestigial casts from instruction operand setters operating on BasicBlocks. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12108 llvm-svn: 245797
* Improve the determinism of MergeFunctionsJF Bastien2015-08-211-45/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Merge functions previously relied on unsigned comparisons of pointer values to order functions. This caused observable non-determinism in the compiler for large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces different hashes when run repeatedly on the same machine. Differing output was observed on three large bitcodes, but it was less frequent on the smallest file. It is possible that this only manifests on the large inputs, hence remaining undetected until now. This patch fixes this by removing (almost, see below) all places where comparisons between pointers are used to order functions. Most of these changes are local, but the comparison of global values requires assigning an identifier to each local in the order it is visited. This is very similar to the way the comparison function identifies Value*'s defined within a function. Because the order of visiting the functions and their subparts is deterministic, the identifiers assigned to the globals will be as well, and the order of functions will be deterministic. With these changes, there is no more observed non-determinism. There is also only minor slowdowns (negligible to 4%) compared to the baseline, which is likely a result of the fact that global comparisons involve hash lookups and not just pointer comparisons. The one caveat so far is that programs containing BlockAddress constants can still be non-deterministic. It is not clear what the right solution is here. In particular, even if the global numbers are used to order by function, we still need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out and fail to order the functions or consider them equal, because we require a total order over functions. Note that programs with BlockAddress constants are relatively rare, so the impact of leaving this in is minor as long as this pass is opt-in. Author: jrkoenig Reviewers: nlewycky, jfb, dschuff Subscribers: jevinskie, llvm-commits, chapuni Differential revision: http://reviews.llvm.org/D12168 llvm-svn: 245762
* Standardized 'failed' to 'Failed' in LoopVectorizationRequirements.Tyler Nowicki2015-08-211-4/+4
| | | | llvm-svn: 245759
* Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"Sanjoy Das2015-08-211-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | The original checkin was buggy, this change has a fix. Original commit message: [InstCombine] Transform A & (L - 1) u< L --> L != 0 Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potentiall payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245753
* [opaque pointer type]: Pass explicit pointee type when building a constant GEP.David Blaikie2015-08-211-2/+7
| | | | | | | | | | Gets a bit tricky in the ValueMapper, of course - not sure if we should just expose a list of explicit types for each Value so that the ValueMapper can be neutral to these special cases (it's OK for things like load, where the explicit type is the result type - but when that's not the case, it means plumbing through another "special" type... ) llvm-svn: 245728
* Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"NAKAMURA Takumi2015-08-211-13/+0
| | | | | | It caused miscompilation in clang. llvm-svn: 245678
* TransformUtils: Introduce module splitter.Peter Collingbourne2015-08-213-0/+125
| | | | | | | | | | | | | The module splitter splits a module into linkable partitions. It will be used to implement parallel LTO code generation. This initial version of the splitter does not attempt to deal with the somewhat subtle symbol visibility issues around module splitting. These will be dealt with in a future change. Differential Revision: http://reviews.llvm.org/D12132 llvm-svn: 245662
* [InstCombine] Transform A & (L - 1) u< L --> L != 0Sanjoy Das2015-08-201-0/+13
| | | | | | | | | | | | | | | | | | | | Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potentiall payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245635
* [SLP] Propagate 'nontemporal' attribute into vectorized instructions.Michael Zolotukhin2015-08-201-0/+3
| | | | llvm-svn: 245633
* [LoopVectorize] Propagate 'nontemporal' attribute into vectorized instructions.Michael Zolotukhin2015-08-201-1/+2
| | | | llvm-svn: 245632
* Rename Instruction::dropUnknownMetadata() to dropUnknownNonDebugMetadata()Adrian Prantl2015-08-207-9/+4
| | | | | | | | | | | and make it always preserve debug locations, since all callers wanted this behavior anyway. This is addressing a post-commit review feedback for r245589. NFC (inside the LLVM tree). llvm-svn: 245622
* [asan] Add ASAN support for AArch64 42-bit VMAAdhemerval Zanella2015-08-201-0/+14
| | | | | | | | | This patch adds support for asan on aarch64-linux with 42-bit VMA (current default config for 64K pagesize kernels). The support is enabled by defining the SANITIZER_AARCH64_VMA to 42 at build time for both clang/llvm and compiler-rt. The default VMA is 39 bits. llvm-svn: 245594
* [ValueTracking] computeOverflowForSignedAdd and isKnownNonNegativeJingyue Wu2015-08-201-29/+6
| | | | | | | | | | | | | | | | Summary: Refactor, NFC Extracts computeOverflowForSignedAdd and isKnownNonNegative from NaryReassociate to ValueTracking in case others need it. Reviewers: reames Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D11313 llvm-svn: 245591
* Fix a bug that caused SimplifyCFG to drop DebugLocs.Adrian Prantl2015-08-206-2/+7
| | | | | | | | | | | Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all metadata in KnownSet, but the condition for DebugLocs was inverted. Most users of dropUnknownMetadata() actually worked around this by not adding LLVMContext::MD_dbg to their list of KnowIDs. This is now made explicit. llvm-svn: 245589
* Fix a debug location handling bug in GVN.Adrian Prantl2015-08-201-1/+2
| | | | | | | | | | | | | | Caught by the famous "DebugLoc describes the currect SubProgram" assertion. When GVN is removing a nonlocal load it updates the debug location of the SSA value it replaced the load with with the one of the load. In the testcase this actually overwrites a valid debug location with an empty one. In reality GVN has to make an arbitrary choice between two equally valid debug locations. This patch changes to behavior to only update the location if the value doesn't already have a debug location. llvm-svn: 245588
* [LVer] Fix FIXME: hide addPHINodes, NFCAdam Nemet2015-08-202-3/+7
| | | | | | | | | | | | | | Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this up. Now clients that don't compute DefsUsedOutsideOfLoop can just call versionLoop() and computing DefsUsedOutsideOfLoop will happen implicitly. With that there is no reason to expose addPHINodes anymore. Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and addPHINodes in LVerLICM and things should just work. llvm-svn: 245579
* Optimize bitwise even/odd test (-x&1 -> x&1) to not use negation.Balaram Makam2015-08-201-0/+4
| | | | | | | | | | | | Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd. Reviewers: majnemer Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D12156 llvm-svn: 245569
* Make helper functions static. NFC.Benjamin Kramer2015-08-201-1/+1
| | | | llvm-svn: 245549
OpenPOWER on IntegriCloud