summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Revert 230275.Sanjoy Das2015-02-234-5/+5
| | | | | | | | 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. llvm-svn: 230279
* Fix bug 22641Sanjoy Das2015-02-234-5/+5
| | | | | | | | | | | The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230275
* Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.Chad Rosier2015-02-232-0/+77
| | | | | | | | | | | This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a fmul from being hoisted in cases where it is more profitable to form a fmsub/fmadd. Phabricator Review: http://reviews.llvm.org/D7299 Patch by Lawrence Hu <lawrence@codeaurora.org> llvm-svn: 230241
* InstSimplify: simplify 0 / X if nnan and nszMehdi Amini2015-02-231-0/+9
| | | | | From: Fiona Glaser <fglaser@apple.com> llvm-svn: 230238
* IRCE: use SCEVs instead of llvm::Value's for intermediateSanjoy Das2015-02-213-14/+24
| | | | | | | | | calculations. Semantically non-functional change. This gets rid of some of the SCEV -> Value -> SCEV round tripping and the Construct(SMin|SMax)Of and MaybeSimplify helper routines. llvm-svn: 230150
* [PlaceSafepoints] Adjust enablement logic to default to off and be GC ↵Philip Reames2015-02-215-9/+17
| | | | | | | | configurable per GC Previously, this pass ran over every function in the Module if added to the pass order. With this change, it runs only over those with a GC attribute where the GC explicitly opts in. A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants. I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons. llvm-svn: 230097
* LoopRotate: When reconstructing loop simplify form don't split edges from ↵Benjamin Kramer2015-02-201-0/+18
| | | | | | | | | | | | indirectbrs. Yet another chapter in the endless story. While this looks like we leave the loop in a non-canonical state this replicates the logic in LoopSimplify so it doesn't diverge from the canonical form in any way. PR21968 llvm-svn: 230058
* Introduce bitset metadata format and bitset lowering pass.Peter Collingbourne2015-02-203-0/+201
| | | | | | | | | | | | | | | | | | | | This patch introduces a new mechanism that allows IR modules to co-operatively build pointer sets corresponding to addresses within a given set of globals. One particular use case for this is to allow a C++ program to efficiently verify (at each call site) that a vtable pointer is in the set of valid vtable pointers for the class or its derived classes. One way of doing this is for a toolchain component to build, for each class, a bit set that maps to the memory region allocated for the vtables, such that each 1 bit in the bit set maps to a valid vtable for that class, and lay out the vtables next to each other, to minimize the total size of the bit sets. The patch introduces a metadata format for representing pointer sets, an '@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals and builds the bitsets, and documents the new feature. Differential Revision: http://reviews.llvm.org/D7288 llvm-svn: 230054
* Bugfix for 229954Philip Reames2015-02-201-0/+11
| | | | | | Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC. Otherwise, we'd crash on functions without a GC. Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :( llvm-svn: 230040
* [InstCombine] Remove unnecessary variable indexing into single-element arraysHal Finkel2015-02-202-5/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change addresses a deficiency pointed out in PR22629. To copy from the bug report: [from the bug report] Consider this code: int f(int x) { int a[] = {12}; return a[x]; } GCC knows to optimize this to movl $12, %eax ret The code generated by recent Clang at -O3 is: movslq %edi, %rax movl .L_ZZ1fiE1a(,%rax,4), %eax retq .L_ZZ1fiE1a: .long 12 # 0xc [end from the bug report] This definitely seems worth fixing. I've also seen this kind of code before (as the base case of generic vector wrapper templates with one element). The general idea is to look at the GEP feeding a load or a store, which has some variable as its first non-zero index, and determine if that index must be zero (or else an out-of-bounds access would occur). We can do this for allocas and globals with constant initializers where we know the maximum size of the underlying object. When we find such a GEP, we create a new one for the memory access with that first variable index replaced with a constant zero. Even if we can't eliminate the memory access (and sometimes we can't), it is still useful because it removes unnecessary indexing calculations. llvm-svn: 229959
* Adjust enablement of RewriteStatepointsForGCPhilip Reames2015-02-201-4/+4
| | | | | | | | When back merging the changes in 229945 I noticed that I forgot to mark the test cases with the appropriate GC. We want the rewriting to be off by default (even when manually added to the pass order), not on-by default. To keep the current test working, mark them as using the statepoint-example GC and whitelist that GC. Longer term, we need a better selection mechanism here for both actual usage and testing. As I migrate more tests to the in tree version of this pass, I will probably need to update the enable/disable logic as well. llvm-svn: 229954
* Add a pass for constructing gc.statepoint sequences w/explicit relocationsPhilip Reames2015-02-201-0/+77
| | | | | | | | | | | | | This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'. This patch is setting the stage for work to continue in tree. In particular, there are known naming and style violations in the current patch. I'll try to get those resolved over the next week or so. As I touch each area to make style changes, I need to make sure we have adequate testing in place. As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage. The pass has several main subproblems it needs to address: - First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future. Note that the current change doesn't actually contain a useful liveness analysis. It was seperated into a followup change as the code wasn't ready to be shared. Instead, the current implementation just considers any dominating def of appropriate pointer type to be live. - Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. - Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us. llvm-svn: 229945
* [objc-arc-contract] We can not move retains over instructions which can not ↵Michael Gottesman2015-02-201-24/+72
| | | | | | | | | | conservatively be proven to not decrement the retain's RCIdentity. I also cleaned up the code to make it more understandable for mere mortals. <rdar://problem/19853758> llvm-svn: 229937
* [ARM] Re-re-apply VLD1/VST1 base-update combine.Ahmed Bougacha2015-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932
* Avoid conversion to float when creating ConstantDataArray/ConstantDataVector.Rafael Espindola2015-02-192-0/+21
| | | | | | Patch by Raoux, Thomas F! llvm-svn: 229864
* Add few simple tests to check statepoint placement for invoke instructions.Igor Laevsky2015-02-191-0/+110
| | | | | | Differential Revision: http://reviews.llvm.org/D7535 llvm-svn: 229842
* [x86,sdag] Two interrelated changes to the x86 and sdag code.Chandler Carruth2015-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as *legal* so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the *hilarious* deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835
* Partial fix for bug 22589Sanjoy Das2015-02-183-15/+20
| | | | | | | | | | | Don't spend the entire iteration space in the scalar loop prologue if computing the trip count overflows. This change also gets rid of the backedge check in the prologue loop and the extra check for overflowing trip-count. Differential Revision: http://reviews.llvm.org/D7715 llvm-svn: 229731
* Minor fix after 229495.Elena Demikhovsky2015-02-181-12/+4
| | | | | | Removed metadata and function attributes from the test. llvm-svn: 229647
* [LoopAccesses] Modify test to also check symbolic strides with memchecksAdam Nemet2015-02-181-3/+9
| | | | | | | | | See the comment in the code. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229627
* [InstCombine] Do not insert a GEP instruction before a landingpad instruction.Akira Hatanaka2015-02-181-0/+44
| | | | | | | | | | | InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the insertion point, which caused the verifier to complain when a GEP was inserted before a landingpad instruction. This commit fixes it to use getFirstInsertionPt instead. rdar://problem/19394964 llvm-svn: 229619
* [BDCE] Don't forget uses of root instructions seen before the instruction itselfHal Finkel2015-02-181-0/+37
| | | | | | | | | | | | | | | | | When visiting the initial list of "root" instructions (those which must always be alive), for those that are integer-valued (such as invokes returning an integer), we mark their bits as (initially) all dead (we might, obviously, find uses of those bits later, but all bits are assumed dead until proven otherwise). Don't do so, however, if we're already seen a use of those bits by another root instruction (such as a store). Fixes a miscompile of the sanitizer unit tests on x86_64. Also, add a debug line for visiting the root instructions, and remove a debug line which tried to print instructions being removed (printing dead instructions is dangerous, and can sometimes crash). llvm-svn: 229618
* Fixed a bug in store sinking.Elena Demikhovsky2015-02-171-0/+114
| | | | | | | | | | The problem was in store-sink barrier check. Store sink barrier should be checked for ModRef (read-write) mode. http://llvm.org/bugs/show_bug.cgi?id=22613 llvm-svn: 229495
* [BDCE] Add a bit-tracking DCE passHal Finkel2015-02-172-0/+381
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead. Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and removes actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed). The motivation for this is a case like: int __attribute__((const)) foo(int i); int bar(int x) { x |= (4 & foo(5)); x |= (8 & foo(3)); x |= (16 & foo(2)); x |= (32 & foo(1)); x |= (64 & foo(0)); x |= (128& foo(4)); return x >> 4; } As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass). I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE). I've not looked at why yet, but we eliminate instructions due to having all-dead bits in: External/SPEC/CFP2006/447.dealII/447.dealII External/SPEC/CINT2006/400.perlbench/400.perlbench External/SPEC/CINT2006/403.gcc/403.gcc MultiSource/Applications/ClamAV/clamscan MultiSource/Benchmarks/7zip/7zip-benchmark llvm-svn: 229462
* InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))Mehdi Amini2015-02-161-0/+110
| | | | | | | Fixes radar 15486701. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229437
* Tests: reformat sitofp.ll and use FileCheckMehdi Amini2015-02-161-20/+39
| | | | | From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229436
* [LoopReroll] Relax some assumptions a little.James Molloy2015-02-161-0/+30
| | | | | | | | We won't find a root with index zero in any loop that we are able to reroll. However, we may find one in a non-rerollable loop, so bail gracefully instead of failing hard. llvm-svn: 229406
* [LoopReroll] Don't crash on dead codeJames Molloy2015-02-161-0/+36
| | | | | | | | If a PHI has no users, don't crash; bail gracefully. This shouldn't happen often, but we can make no guarantees that previous passes didn't leave dead code around. llvm-svn: 229405
* IR: Properly return nullptr when getAggregateElement is out-of-boundsDavid Majnemer2015-02-161-0/+19
| | | | | | | | | | | We didn't properly handle the out-of-bounds case for ConstantAggregateZero and UndefValue. This would manifest as a crash when the constant folder was asked to fold a load of a constant global whose struct type has no operands. This fixes PR22595. llvm-svn: 229352
* FileCheck-ize a test to make it easier to migrate to typeless pointersDavid Blaikie2015-02-151-2/+3
| | | | llvm-svn: 229278
* Update a test to make it easier to migrate to untyped pointersDavid Blaikie2015-02-151-1/+1
| | | | llvm-svn: 229277
* Update a test to use FileCheck so it's easier to migrate to future typeless ↵David Blaikie2015-02-151-1/+3
| | | | | | pointer changes llvm-svn: 229276
* Reformat test case to be easier to migrate to typeless pointers.David Blaikie2015-02-151-1/+4
| | | | llvm-svn: 229275
* InstCombine: propagate deref via new addDereferenceableAttrRamkumar Ramachandra2015-02-141-0/+20
| | | | | | | | | | | | | | | | | The "dereferenceable" attribute cannot be added via .addAttribute(), since it also expects a size in bytes. AttrBuilder#addAttribute or AttributeSet#addAttribute is wrapped by classes Function, InvokeInst, and CallInst. Add corresponding wrappers to AttrBuilder#addDereferenceableAttr. Having done this, propagate the dereferenceable attribute via gc.relocate, adding a test to exercise it. Note that -datalayout is required during execution over and above -instcombine, because InstCombine only optionally requires DataLayoutPass. Differential Revision: http://reviews.llvm.org/D7510 llvm-svn: 229265
* [InstCombine] When canonicalizing gep indices, prefer zext when possiblePhilip Reames2015-02-141-0/+61
| | | | | | | | | | If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead. This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?). We already apply a similar transform in DAGCombine, this just extends that to the IR level. This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. Differential Revision: http://reviews.llvm.org/D7255 llvm-svn: 229189
* [InstCombine] Fix regression introduced at r227197.Andrea Di Biagio2015-02-131-0/+27
| | | | | | | | | | | | | | | | | | This patch fixes a problem I accidentally introduced in an instruction combine on select instructions added at r227197. That revision taught the instruction combiner how to fold a cttz/ctlz followed by a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared. However, the new rule added at r227197 would have produced wrong results in the case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a zero-extend or truncate. In that case, the folded instruction would have been inserted in a wrong location thus leaving the CFG in an inconsistent state. This patch fixes the problem and add two reproducible test cases to existing test 'InstCombine/select-cmp-cttz-ctlz.ll'. llvm-svn: 229124
* [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to ↵Andrea Di Biagio2015-02-134-0/+298
| | | | | | | | | | | | | | | | speculate calls to cttz/ctlz. SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are 'cheap' for the target. Therefore, some of the logic in CodeGenPrepare that was originally added at revision 224899 can now be removed. This patch is basically a no functional change. It removes the duplicated logic in CodeGenPrepare and converts all the existing target specific tests for cttz/ctlz into SimplifyCFG tests. Differential Revision: http://reviews.llvm.org/D7608 llvm-svn: 229105
* [SimplifyCFG] Add test for r229099James Molloy2015-02-131-0/+22
| | | | | | Add extra test that was accidentally not staged. llvm-svn: 229101
* [unroll] Concede defeat and disable the unroll analyzer for now.Chandler Carruth2015-02-131-4/+4
| | | | | | | | | | The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064
* [InstCombine] Fix a bug when combining `icmp` from `ptrtoint`Michael Liao2015-02-131-1/+22
| | | | | | | | | | | | - First, there's a crash when we try to combine that pointers into `icmp` directly by creating a `bitcast`, which is invalid if that two pointers are from different address spaces. - It's not always appropriate to cast one pointer to another if they are from different address spaces as that is not no-op cast. Instead, we only combine `icmp` from `ptrtoint` if that two pointers are of the same address space. llvm-svn: 229063
* [IC] Fix a bug with the instcombine canonicalizing of loads andChandler Carruth2015-02-131-0/+19
| | | | | | | | | | | | | | | | | | | | propagating of metadata. We were propagating !nonnull metadata even when the newly formed load is no longer of a pointer type. This is clearly broken and results in LLVM failing the verifier and aborting. This patch just restricts the propagation of !nonnull metadata to when we actually have a pointer type. This bug report and the initial version of this patch was provided by Charles Davis! Many thanks for finding this! We still need to add logic to round-trip the metadata correctly if we combine from pointer types to integer types and then back by using range metadata for the integer type loads. But this is the minimal and safe version of the patch, which is important so we can backport it into 3.6. llvm-svn: 229029
* Check interleaving without relying on debug output.Olivier Sallenave2015-02-131-3/+14
| | | | llvm-svn: 229027
* Testcase for r228988.Michael Zolotukhin2015-02-131-0/+3
| | | | llvm-svn: 228995
* llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES ↵NAKAMURA Takumi2015-02-131-0/+1
| | | | | | +Asserts due to -debug. llvm-svn: 228989
* Change max interleave factor to 12 for POWER7 and POWER8.Olivier Sallenave2015-02-121-0/+35
| | | | llvm-svn: 228973
* Fix a crash in the assumption cache when inlining indirect function callsBjorn Steinbrink2015-02-121-0/+19
| | | | | | | | | | | | | | | | | Summary: Instances of the AssumptionCache are per function, so we can't re-use the same AssumptionCache instance when recursing in the CallAnalyzer to analyze a different function. Instead we have to pass the AssumptionCacheTracker to the CallAnalyzer so it can get the right AssumptionCache on demand. Reviewers: hfinkel Subscribers: llvm-commits, hans Differential Revision: http://reviews.llvm.org/D7533 llvm-svn: 228957
* Update test case.Benjamin Kramer2015-02-121-2/+2
| | | | llvm-svn: 228956
* InstCombine: Allow folding of xor into icmp by changing the predicate for ↵Benjamin Kramer2015-02-121-0/+6
| | | | | | | | vectors The loop vectorizer can create this pattern. llvm-svn: 228954
* Add a testcase for r228432.Michael Zolotukhin2015-02-121-0/+34
| | | | llvm-svn: 228951
* [LoopRerolling] Be more forgiving with instruction order.James Molloy2015-02-121-0/+57
| | | | | | | | | We can't solve the full subgraph isomorphism problem. But we can allow obvious cases, where for example two instructions of different types are out of order. Due to them having different types/opcodes, there is no ambiguity. llvm-svn: 228931
OpenPOWER on IntegriCloud