path: root/llvm/test/Transforms
Commit message (Author, Age, Files, Lines)
* [ARM] Re-re-apply VLD1/VST1 base-update combine. (Ahmed Bougacha, 2015-02-19, 1 file, -2/+2)
  This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting.
  The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTLR.A==1?).
  To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned.
  Original commit message:
  We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store.
  We can do the same thing for generic load/stores.
  Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset).
  rdar://19717869, rdar://14062261.
  llvm-svn: 229932
* Avoid conversion to float when creating ConstantDataArray/ConstantDataVector. (Rafael Espindola, 2015-02-19, 2 files, -0/+21)
  Patch by Raoux, Thomas F!
  llvm-svn: 229864
* Add a few simple tests to check statepoint placement for invoke instructions. (Igor Laevsky, 2015-02-19, 1 file, -0/+110)
  Differential Revision: http://reviews.llvm.org/D7535
  llvm-svn: 229842
* [x86,sdag] Two interrelated changes to the x86 and sdag code. (Chandler Carruth, 2015-02-19, 1 file, -2/+2)
  First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization.
  However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing.
  Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as *legal* so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, I know, this is confusing, but that's how the patterns are written).
  This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the *hilarious* deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win!
  There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise.
  llvm-svn: 229835
* Partial fix for bug 22589 (Sanjoy Das, 2015-02-18, 3 files, -15/+20)
  Don't spend the entire iteration space in the scalar loop prologue if computing the trip count overflows. This change also gets rid of the backedge check in the prologue loop and the extra check for overflowing trip-count.
  Differential Revision: http://reviews.llvm.org/D7715
  llvm-svn: 229731
* Minor fix after 229495. (Elena Demikhovsky, 2015-02-18, 1 file, -12/+4)
  Removed metadata and function attributes from the test.
  llvm-svn: 229647
* [LoopAccesses] Modify test to also check symbolic strides with memchecks (Adam Nemet, 2015-02-18, 1 file, -3/+9)
  See the comment in the code.
  This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass.
  llvm-svn: 229627
* [InstCombine] Do not insert a GEP instruction before a landingpad instruction. (Akira Hatanaka, 2015-02-18, 1 file, -0/+44)
  InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the insertion point, which caused the verifier to complain when a GEP was inserted before a landingpad instruction. This commit fixes it to use getFirstInsertionPt instead.
  rdar://problem/19394964
  llvm-svn: 229619
* [BDCE] Don't forget uses of root instructions seen before the instruction itself (Hal Finkel, 2015-02-18, 1 file, -0/+37)
  When visiting the initial list of "root" instructions (those which must always be alive), for those that are integer-valued (such as invokes returning an integer), we mark their bits as (initially) all dead (we might, obviously, find uses of those bits later, but all bits are assumed dead until proven otherwise). Don't do so, however, if we've already seen a use of those bits by another root instruction (such as a store).
  Fixes a miscompile of the sanitizer unit tests on x86_64.
  Also, add a debug line for visiting the root instructions, and remove a debug line which tried to print instructions being removed (printing dead instructions is dangerous, and can sometimes crash).
  llvm-svn: 229618
* Fixed a bug in store sinking. (Elena Demikhovsky, 2015-02-17, 1 file, -0/+114)
  The problem was in the store-sink barrier check. The store-sink barrier should be checked for ModRef (read-write) mode.
  http://llvm.org/bugs/show_bug.cgi?id=22613
  llvm-svn: 229495
* [BDCE] Add a bit-tracking DCE pass (Hal Finkel, 2015-02-17, 2 files, -0/+381)
  BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead.
  Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and remove actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed).
  The motivation for this is a case like:
    int __attribute__((const)) foo(int i);
    int bar(int x) {
      x |= (4 & foo(5));
      x |= (8 & foo(3));
      x |= (16 & foo(2));
      x |= (32 & foo(1));
      x |= (64 & foo(0));
      x |= (128 & foo(4));
      return x >> 4;
    }
  As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass).
  I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE).
  I've not looked at why yet, but we eliminate instructions due to having all-dead bits in:
    External/SPEC/CFP2006/447.dealII/447.dealII
    External/SPEC/CINT2006/400.perlbench/400.perlbench
    External/SPEC/CINT2006/403.gcc/403.gcc
    MultiSource/Applications/ClamAV/clamscan
    MultiSource/Benchmarks/7zip/7zip-benchmark
  llvm-svn: 229462
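The dead-bit argument in the BDCE message can be checked with a small C sketch. The body of foo() here is a hypothetical stand-in (the commit only declares it), and bar_reduced() and bars_agree() are illustrative names, not part of LLVM. Because bar() returns x >> 4, bits 0 to 3 of x never reach the result, so the terms masked with 4 and 8 cannot influence it.

```c
#include <assert.h>

/* Hypothetical stand-in: the commit only declares foo() as a const function. */
int foo(int i) { return 0xFF ^ (i * 37); }

/* The function from the commit message, unchanged. */
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128 & foo(4));
  return x >> 4;
}

/* Same function with the two terms that only touch bits 2 and 3 removed. */
int bar_reduced(int x) {
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128 & foo(4));
  return x >> 4;
}

/* Returns 1 if both versions agree for every x in [lo, hi]. */
int bars_agree(int lo, int hi) {
  for (int x = lo; x <= hi; ++x)
    if (bar(x) != bar_reduced(x)) return 0;
  return 1;
}
```

This only demonstrates that the result is unaffected; how the pass itself tracks the dead bits is not shown here.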
* InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val)) (Mehdi Amini, 2015-02-16, 1 file, -0/+110)
  Fixes radar 15486701.
  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 229437
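As background for this kind of fold, an integer-to-float-to-integer round trip is only an identity when the floating-point type can represent the integer exactly. The sketch below assumes IEEE-754 float and double; the function names are illustrative, and the exact conditions InstCombine checks are not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* double has a 53-bit significand, so every int32_t value survives. */
int32_t roundtrip_via_double(int32_t v) { return (int32_t)(double)v; }

/* float has a 24-bit significand, so integers above 2^24 may be rounded. */
int32_t roundtrip_via_float(int32_t v) { return (int32_t)(float)v; }
```

For example, 16777217 (2^24 + 1) survives the double round trip but not the float one, which is why such a fold has to reason about the widths involved.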
* Tests: reformat sitofp.ll and use FileCheck (Mehdi Amini, 2015-02-16, 1 file, -20/+39)
  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 229436
* [LoopReroll] Relax some assumptions a little. (James Molloy, 2015-02-16, 1 file, -0/+30)
  We won't find a root with index zero in any loop that we are able to reroll. However, we may find one in a non-rerollable loop, so bail gracefully instead of failing hard.
  llvm-svn: 229406
* [LoopReroll] Don't crash on dead code (James Molloy, 2015-02-16, 1 file, -0/+36)
  If a PHI has no users, don't crash; bail gracefully. This shouldn't happen often, but we can make no guarantees that previous passes didn't leave dead code around.
  llvm-svn: 229405
* IR: Properly return nullptr when getAggregateElement is out-of-bounds (David Majnemer, 2015-02-16, 1 file, -0/+19)
  We didn't properly handle the out-of-bounds case for ConstantAggregateZero and UndefValue. This would manifest as a crash when the constant folder was asked to fold a load of a constant global whose struct type has no operands.
  This fixes PR22595.
  llvm-svn: 229352
* FileCheck-ize a test to make it easier to migrate to typeless pointers (David Blaikie, 2015-02-15, 1 file, -2/+3)
  llvm-svn: 229278
* Update a test to make it easier to migrate to untyped pointers (David Blaikie, 2015-02-15, 1 file, -1/+1)
  llvm-svn: 229277
* Update a test to use FileCheck so it's easier to migrate to future typeless pointer changes (David Blaikie, 2015-02-15, 1 file, -1/+3)
  llvm-svn: 229276
* Reformat test case to be easier to migrate to typeless pointers. (David Blaikie, 2015-02-15, 1 file, -1/+4)
  llvm-svn: 229275
* InstCombine: propagate deref via new addDereferenceableAttr (Ramkumar Ramachandra, 2015-02-14, 1 file, -0/+20)
  The "dereferenceable" attribute cannot be added via .addAttribute(), since it also expects a size in bytes. AttrBuilder#addAttribute or AttributeSet#addAttribute is wrapped by classes Function, InvokeInst, and CallInst. Add corresponding wrappers to AttrBuilder#addDereferenceableAttr.
  Having done this, propagate the dereferenceable attribute via gc.relocate, adding a test to exercise it. Note that -datalayout is required during execution over and above -instcombine, because InstCombine only optionally requires DataLayoutPass.
  Differential Revision: http://reviews.llvm.org/D7510
  llvm-svn: 229265
* [InstCombine] When canonicalizing gep indices, prefer zext when possible (Philip Reames, 2015-02-14, 1 file, -0/+61)
  If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead. This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?). We already apply a similar transform in DAGCombine, this just extends that to the IR level.
  This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types.
  Differential Revision: http://reviews.llvm.org/D7255
  llvm-svn: 229189
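The equivalence this change relies on can be sketched in C: when the sign bit of the 32-bit value is zero, sign extension and zero extension to 64 bits give the same result. The function names below are illustrative and not LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits (what sext does). */
int64_t sext_i32_to_i64(int32_t v) { return (int64_t)v; }

/* Zero-extend the same 32 bits to 64 bits (what zext does). */
int64_t zext_i32_to_i64(int32_t v) { return (int64_t)(uint32_t)v; }
```

The two agree for every non-negative input and differ for every negative one, so the replacement is only valid when the sign bit is known to be zero.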
* [InstCombine] Fix regression introduced at r227197. (Andrea Di Biagio, 2015-02-13, 1 file, -0/+27)
  This patch fixes a problem I accidentally introduced in an instruction combine on select instructions added at r227197. That revision taught the instruction combiner how to fold a cttz/ctlz followed by an icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.
  However, the new rule added at r227197 would have produced wrong results in the case where a cttz/ctlz with flag 'is_zero_undef' cleared was followed by a zero-extend or truncate. In that case, the folded instruction would have been inserted in a wrong location thus leaving the CFG in an inconsistent state.
  This patch fixes the problem and adds two reproducible test cases to existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.
  llvm-svn: 229124
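The fold from r227197 that this message refers to can be modelled in C as follows. This is a sketch under assumptions: the helpers are portable stand-ins for a 32-bit cttz, and their names are illustrative rather than LLVM APIs. A zero check followed by a select around a count that is undefined for zero computes the same value as a single count that is defined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Counts trailing zeros; the caller must pass x != 0
   (models cttz with is_zero_undef set). */
unsigned cttz_nonzero(uint32_t x) {
  unsigned n = 0;
  while ((x & 1u) == 0) { x >>= 1; ++n; }
  return n;
}

/* Defined for every input and returns 32 for x == 0
   (models cttz with is_zero_undef cleared). */
unsigned cttz_defined(uint32_t x) {
  unsigned n = 0;
  while (n < 32 && ((x >> n) & 1u) == 0) ++n;
  return n;
}

/* The source pattern: compare against zero, then select. */
unsigned cttz_guarded(uint32_t x) {
  return x == 0 ? 32u : cttz_nonzero(x);
}
```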
* [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz. (Andrea Di Biagio, 2015-02-13, 4 files, -0/+298)
  SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are 'cheap' for the target. Therefore, some of the logic in CodeGenPrepare that was originally added at revision 224899 can now be removed.
  This patch makes essentially no functional change. It removes the duplicated logic in CodeGenPrepare and converts all the existing target specific tests for cttz/ctlz into SimplifyCFG tests.
  Differential Revision: http://reviews.llvm.org/D7608
  llvm-svn: 229105
* [SimplifyCFG] Add test for r229099 (James Molloy, 2015-02-13, 1 file, -0/+22)
  Add extra test that was accidentally not staged.
  llvm-svn: 229101
* [unroll] Concede defeat and disable the unroll analyzer for now. (Chandler Carruth, 2015-02-13, 1 file, -4/+4)
  The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state.
  llvm-svn: 229064
* [InstCombine] Fix a bug when combining `icmp` from `ptrtoint` (Michael Liao, 2015-02-13, 1 file, -1/+22)
  - First, there's a crash when we try to combine those pointers into `icmp` directly by creating a `bitcast`, which is invalid if the two pointers are from different address spaces.
  - It's not always appropriate to cast one pointer to another if they are from different address spaces, as that is not a no-op cast. Instead, we only combine `icmp` from `ptrtoint` if the two pointers are in the same address space.
  llvm-svn: 229063
* [IC] Fix a bug with the instcombine canonicalizing of loads and propagating of metadata. (Chandler Carruth, 2015-02-13, 1 file, -0/+19)
  We were propagating !nonnull metadata even when the newly formed load is no longer of a pointer type. This is clearly broken and results in LLVM failing the verifier and aborting.
  This patch just restricts the propagation of !nonnull metadata to when we actually have a pointer type.
  This bug report and the initial version of this patch were provided by Charles Davis! Many thanks for finding this!
  We still need to add logic to round-trip the metadata correctly if we combine from pointer types to integer types and then back by using range metadata for the integer type loads. But this is the minimal and safe version of the patch, which is important so we can backport it into 3.6.
  llvm-svn: 229029
* Check interleaving without relying on debug output. (Olivier Sallenave, 2015-02-13, 1 file, -3/+14)
  llvm-svn: 229027
* Testcase for r228988. (Michael Zolotukhin, 2015-02-13, 1 file, -0/+3)
  llvm-svn: 228995
* llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES +Asserts due to -debug. (NAKAMURA Takumi, 2015-02-13, 1 file, -0/+1)
  llvm-svn: 228989
* Change max interleave factor to 12 for POWER7 and POWER8. (Olivier Sallenave, 2015-02-12, 1 file, -0/+35)
  llvm-svn: 228973
* Fix a crash in the assumption cache when inlining indirect function calls (Bjorn Steinbrink, 2015-02-12, 1 file, -0/+19)
  Summary: Instances of the AssumptionCache are per function, so we can't re-use the same AssumptionCache instance when recursing in the CallAnalyzer to analyze a different function. Instead we have to pass the AssumptionCacheTracker to the CallAnalyzer so it can get the right AssumptionCache on demand.
  Reviewers: hfinkel
  Subscribers: llvm-commits, hans
  Differential Revision: http://reviews.llvm.org/D7533
  llvm-svn: 228957
* Update test case. (Benjamin Kramer, 2015-02-12, 1 file, -2/+2)
  llvm-svn: 228956
* InstCombine: Allow folding of xor into icmp by changing the predicate for vectors (Benjamin Kramer, 2015-02-12, 1 file, -0/+6)
  The loop vectorizer can create this pattern.
  llvm-svn: 228954
* Add a testcase for r228432. (Michael Zolotukhin, 2015-02-12, 1 file, -0/+34)
  llvm-svn: 228951
* [LoopRerolling] Be more forgiving with instruction order. (James Molloy, 2015-02-12, 1 file, -0/+57)
  We can't solve the full subgraph isomorphism problem. But we can allow obvious cases, where for example two instructions of different types are out of order. Due to them having different types/opcodes, there is no ambiguity.
  llvm-svn: 228931
* [TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target. (Andrea Di Biagio, 2015-02-12, 1 file, -0/+189)
  Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl how to query TLI in order to get a more accurate cost for truncates and zero-extends.
  Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase would have conservatively returned a 'default' TCC_Basic for all zero-extends, and TCC_Free for truncates on native types.
  This patch improves the heuristic so that we query TLI (if available) to get more accurate answers. If TLI is available, then methods 'isZExtFree' and 'isTruncateFree' can be used to check if a zext/trunc is free for the target.
  Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll. With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz immediately followed by a free zext/trunc.
  Differential Revision: http://reviews.llvm.org/D7585
  llvm-svn: 228923
* [slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out. (Chandler Carruth, 2015-02-12, 1 file, -0/+50)
  Apparently some code finally started to tickle this after my canonicalization changes to instcombine.
  The bug stems from trying to form a vector type out of scalars that aren't compatible at all. In this example, from x86_mmx values. The code in the vectorizer that checks for reasonable types was checking for aggregates or vectors, but there are lots of other types that should just never reach the vectorizer.
  Debugging this was made more confusing by the lie in an assert in VectorType::get() -- it isn't that the types are *primitive*. The types must be integer, pointer, or floating point types. No other types are allowed.
  I've improved the assert and added a helper to the vectorizer to handle the element type validity checks. It now re-uses the VectorType static function and then further excludes weird target-specific types that we probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of these are really reachable anyways (neither 80-bit nor 128-bit things will get vectorized) but it seems better to just eagerly exclude such nonsense.
  I've added a test case, but while it definitely covers two of the paths through this code there may be more paths that would benefit from test coverage. I'm not familiar enough with the SLP vectorizer to synthesize test cases for all of these, but was able to update the code itself by inspection.
  llvm-svn: 228899
* DeadArgElim: aggregate Return assessment properly. (Tim Northover, 2015-02-11, 1 file, -0/+30)
  I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It actually depends on the index too, which means we need to be careful about how the results are combined before return. In particular if a single Use returns Live, that counts for the entire object, at the granularity we're considering.
  llvm-svn: 228885
* Reassociate: cannot negate an INT_MIN value (Mehdi Amini, 2015-02-11, 1 file, -0/+13)
  Summary: When trying to canonicalize negative constants out of multiplication expressions, we need to check that the constant is not INT_MIN, which cannot be negated.
  Reviewers: mcrosier
  Reviewed By: mcrosier
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D7286
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 228872
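Why INT_MIN cannot be negated is easy to see on the raw bits. The sketch below works on unsigned 32-bit values so that the wrap-around is well defined in C; negate_bits() and can_negate() are illustrative names, not the helpers used in Reassociate.

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement negation on the raw bits, which is well defined
   for unsigned operands. */
uint32_t negate_bits(uint32_t bits) { return 0u - bits; }

/* A signed 32-bit constant has a representable negation unless it is
   INT32_MIN, whose bit pattern negates to itself. */
int can_negate(int32_t c) { return c != INT32_MIN; }
```

Negating the pattern 0x80000000 yields 0x80000000 again, so a transform that pulls the sign out of such a constant would not change it at all.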
* [TTI] Improved cost heuristic for cttz/ctlz calls. (Andrea Di Biagio, 2015-02-11, 2 files, -16/+141)
  This patch is a follow-up of r228826 (see code-review: D7506).
  Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we have to fix the cost heuristic for intrinsic calls to cttz/ctlz. This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl queries TLI to check if a call to cttz/ctlz is cheap for the target.
  Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86, SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.
  Differential Revision: http://reviews.llvm.org/D7554
  llvm-svn: 228829
* [SimplifyCFG] Swap to using TargetTransformInfo for cost analysis. (James Molloy, 2015-02-11, 3 files, -32/+11)
  We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness" heuristic and use TTI directly. Generally NFC intended, but we're using a slightly different heuristic now so there is a slight test churn.
  Test changes:
    * combine-comparisons-by-cse.ll: Removed unneeded branch check.
    * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
    * coalesce-subregs.ll: Superfluous block check.
    * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
    * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
    * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
  llvm-svn: 228826
* [LoopReroll] Introduce the concept of DAGRootSets. (James Molloy, 2015-02-11, 1 file, -0/+167)
  A DAGRootSet models an induction variable being used in a rerollable loop. For example:
    x[i*3+0] = y1
    x[i*3+1] = y2
    x[i*3+2] = y3

    Base instruction -> i*3
                     +---+----+
                    /    |     \
                ST[y1]  +1     +2  <-- Roots
                         |      |
                       ST[y2] ST[y3]
  There may be multiple DAGRootSets, for example:
    x[i*2+0] = ...   (1)
    x[i*2+1] = ...   (1)
    x[i*2+4] = ...   (2)
    x[i*2+5] = ...   (2)
    x[(i+1234)*2+5678] = ... (3)
    x[(i+1234)*2+5679] = ... (3)
  This concept is similar to the "Scale" member used previously, but allows multiple independent sets of roots based off the same induction variable.
  llvm-svn: 228821
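For readers unfamiliar with rerolling, the kind of loop a DAGRootSet describes can be sketched in C. This is a simplified illustration, assuming every unrolled statement performs the same operation at base index i*3 plus an offset of 0, 1, or 2; the function names are hypothetical and the code says nothing about how the pass itself matches the pattern.

```c
#include <assert.h>
#include <string.h>

/* Loop body manually unrolled by 3: base index i*3, roots at +1 and +2. */
void fill_unrolled(int *x, const int *y, int n) {
  for (int i = 0; i < n; ++i) {
    x[i * 3 + 0] = y[i * 3 + 0] + 1;
    x[i * 3 + 1] = y[i * 3 + 1] + 1;
    x[i * 3 + 2] = y[i * 3 + 2] + 1;
  }
}

/* Rerolled form: one statement per iteration, three times as many iterations. */
void fill_rerolled(int *x, const int *y, int n) {
  for (int j = 0; j < n * 3; ++j)
    x[j] = y[j] + 1;
}

/* Returns 1 if both loops produce identical output on a small input. */
int reroll_matches(void) {
  int y[12], a[12], b[12];
  for (int j = 0; j < 12; ++j) y[j] = j * j;
  fill_unrolled(a, y, 4);
  fill_rerolled(b, y, 4);
  return memcmp(a, b, sizeof a) == 0;
}
```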
* Fix invalid LLVM IR in PruneEH tests (Reid Kleckner, 2015-02-11, 2 files, -0/+9)
  llvm-svn: 228786
* Don't promote asynch EH invokes of nounwind functions to calls (Reid Kleckner, 2015-02-11, 4 files, -15/+66)
  If the landingpad of the invoke is using a personality function that catches asynch exceptions, then it can catch a trap.
  Also add some landingpads to invalid LLVM IR test cases that lack them.
  Over-the-shoulder reviewed by David Majnemer.
  llvm-svn: 228782
* EarlyCSE: Add check lines for test added in r228760 (David Majnemer, 2015-02-10, 1 file, -0/+3)
  llvm-svn: 228761
* EarlyCSE: It isn't safe to CSE across synchronization boundaries (David Majnemer, 2015-02-10, 1 file, -1/+7)
  This fixes PR22514.
  llvm-svn: 228760
* DeadArgElim: arguments affect all returned sub-values by default. (Tim Northover, 2015-02-10, 1 file, -0/+17)
  Unless we meet an insertvalue on a path from some value to a return, that value will be live if *any* of the return's components are live, so all of those components must be added to the MaybeLiveUses.
  Previously we were deleting arguments if sub-value 0 turned out to be dead.
  llvm-svn: 228731
* Add a test case for new unrolling heuristics. (Michael Zolotukhin, 2015-02-10, 1 file, -0/+59)
  The heuristics were added in r228265 and r228434.
  llvm-svn: 228713