path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message [Author, Age, Files, Lines]
...
* Transform div to mul with reciprocal only when fp imm is legal. [Anton Korobeynikov, 2012-04-10, 1 file, -2/+9]
| | | | | | This fixes PR12516 and uncovers one weird problem in legalize (worked around). llvm-svn: 154394
* Don't try to zExt just to check if an integer constant is zero, it might [Rafael Espindola, 2012-04-10, 1 file, -2/+2]
| | | | | | not fit in an i64. llvm-svn: 154364
* Pattern match a setcc of boolean value with 0 as a truncate. [Rafael Espindola, 2012-04-09, 1 file, -9/+48]
| | | | llvm-svn: 154322
* Remove unnecessary type check when combining and/or/xor of swizzles. Move ↵ [Craig Topper, 2012-04-09, 1 file, -13/+12]
| | | | | | some checks to allow better early out. llvm-svn: 154309
* Remove unnecessary 'else' on an 'if' that always returns [Craig Topper, 2012-04-09, 1 file, -1/+2]
| | | | llvm-svn: 154308
* Optimize code slightly. No functionality change. [Craig Topper, 2012-04-09, 1 file, -6/+7]
| | | | llvm-svn: 154307
* Replace some explicit checks with asserts for conditions that should never ↵ [Craig Topper, 2012-04-09, 1 file, -14/+7]
| | | | | | happen. llvm-svn: 154305
* Silence sign-compare warning. [Benjamin Kramer, 2012-04-08, 1 file, -1/+1]
| | | | llvm-svn: 154297
* Only have codegen turn fdiv by a constant into fmul by the reciprocal [Duncan Sands, 2012-04-08, 1 file, -5/+3]
| | | | | | | | when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296
* 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new [Nadav Rotem, 2012-04-07, 1 file, -12/+14]
| | | | | | | | | | shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266
* Convert floating point division by a constant into multiplication by the [Duncan Sands, 2012-04-07, 1 file, -0/+13]
| | | | | | | | reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265
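A reciprocal is exact when the divisor is a power of two, because dividing and multiplying then only adjust the exponent. The following is a minimal standalone sketch of such a check, assuming IEEE doubles; `hasExactReciprocal` is a hypothetical helper written for illustration and is not the code from this commit or an LLVM API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not LLVM's API): returns true when 1.0 / c is exactly
// representable as a normal double, so that x / c and x * (1.0 / c) give the
// same correctly rounded result for every x.
bool hasExactReciprocal(double c) {
  if (c == 0.0 || !std::isfinite(c))
    return false;
  int exponent = 0;
  // frexp yields a mantissa in [0.5, 1); a power of two has mantissa 0.5.
  double mantissa = std::frexp(std::fabs(c), &exponent);
  if (mantissa != 0.5)
    return false;
  // Conservatively reject reciprocals that overflow or become denormal.
  return std::isnormal(1.0 / c);
}
```

For example, `hasExactReciprocal(4.0)` holds (the reciprocal is 0.25), while `hasExactReciprocal(3.0)` does not, since 1/3 has no finite binary representation.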
* Always compute all the bits in ComputeMaskedBits. [Rafael Espindola, 2012-04-04, 1 file, -15/+10]
| | | | | | | | This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011
* Add predicates for checking whether targets have free FNEG and FABS ↵ [Owen Anderson, 2012-04-02, 1 file, -3/+5]
| | | | | | operations, and prevent the DAGCombiner from turning them into bitwise operations if they do. llvm-svn: 153901
* Optimizing swizzles of complex shuffles may generate additional complex ↵ [Nadav Rotem, 2012-04-02, 1 file, -1/+9]
| | | | | | | | | shuffles. Do not try to optimize swizzles of shuffles if the source shuffle has more than a single user, except when the source shuffle is also a swizzle. llvm-svn: 153864
* This commit contains a few changes that had to go in together. [Nadav Rotem, 2012-04-01, 1 file, -0/+92]
| 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
|    (and also scalar_to_vector).
| 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
|    Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))
| 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y).
| 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.
|
| Code which was previously compiled to this:
|   movd (%rsi), %xmm0
|   movdqa .LCPI0_0(%rip), %xmm2
|   pshufb %xmm2, %xmm0
|   movd (%rdi), %xmm1
|   pshufb %xmm2, %xmm1
|   pxor %xmm0, %xmm1
|   pshufb .LCPI0_1(%rip), %xmm1
|   movd %xmm1, (%rdi)
|   ret
|
| Now compiles to this:
|   movl (%rsi), %eax
|   xorl %eax, (%rdi)
|   ret
|
| llvm-svn: 153848
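Item 2 of this commit relies on element-wise bitwise operations commuting with a single-source shuffle when both operands use the same mask. The following standalone sketch illustrates that identity on plain arrays; `Vec4`, `Mask4`, `swizzle`, `vxor`, and `xorCommutesWithSwizzle` are hypothetical names chosen for the example and do not appear in the LLVM sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<std::size_t, 4>;

// Single-source shuffle (a "swizzle"): result[i] = v[mask[i]].
Vec4 swizzle(const Vec4 &v, const Mask4 &mask) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[mask[i]];
  return r;
}

// Element-wise xor of two vectors.
Vec4 vxor(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = a[i] ^ b[i];
  return r;
}

// Checks xor(shuff(A), shuff(B)) == shuff(xor(A, B)) for one shared mask.
bool xorCommutesWithSwizzle(const Vec4 &a, const Vec4 &b, const Mask4 &m) {
  return vxor(swizzle(a, m), swizzle(b, m)) == swizzle(vxor(a, b), m);
}
```

The same argument applies to `and` and `or`: each output lane depends only on the matching lane of both inputs, so permuting the lanes before or after the operation gives the same result, and the rewritten form needs one shuffle instead of two.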
* fix what looks like a real logic bug, found by PVS-Studio (part of PR12357) [Chris Lattner, 2012-03-27, 1 file, -2/+2]
| | | | llvm-svn: 153513
* When combining (vextract shuffle (load ), <1,u,u,u>), 0) -> (load ), add ↵ [Craig Topper, 2012-03-20, 1 file, -0/+1]
| | | | | | users of the final load to the worklist too. Needed by changes I'm preparing to make to X86 backend. llvm-svn: 153078
* Fix DAG combine which creates illegal vector shuffles. Patch by Heikki Kultala. [Duncan Sands, 2012-03-19, 1 file, -0/+6]
| | | | llvm-svn: 153035
* When optimizing certain BUILD_VECTOR nodes into other BUILD_VECTOR nodes, ↵ [Nadav Rotem, 2012-03-15, 1 file, -0/+4]
| | | | | | add the new node into the work list because there is a potential for further optimizations. llvm-svn: 152784
* Add a xform to the DAG combiner. [Bill Wendling, 2012-03-15, 1 file, -0/+17]
| Transform:
|   (fsub x, (fadd x, y)) -> (fneg y) and
|   (fsub x, (fadd y, x)) -> (fneg y)
| if 'unsafe math' is specified.
| <rdar://problem/7540295>
| llvm-svn: 152777
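The 'unsafe math' restriction matters because `x - (x + y)` and `-y` can differ once `x + y` is rounded. The following standalone sketch, assuming IEEE doubles, shows one such case; `subOfAdd` is a hypothetical name used only for this example.

```cpp
#include <cassert>

// Evaluates x - (x + y) using ordinary double arithmetic.
// The volatile temporary discourages the compiler from simplifying the
// expression algebraically, so the rounding of x + y is preserved.
double subOfAdd(double x, double y) {
  volatile double sum = x + y;
  return x - sum;
}
```

With small operands the two forms agree: `subOfAdd(1.0, 2.0)` is `-2.0`. With `x = 1e20` and `y = 1.0`, the sum rounds back to `1e20`, so the strict result is `0.0` rather than `-1.0`. Rewriting to `fneg y` therefore changes the numerical result, which is only acceptable when the user has opted into unsafe floating-point transformations.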
* Fortify r152675 a bit. Although I'm not able to come up with a test case ↵ [Evan Cheng, 2012-03-13, 1 file, -3/+11]
| | | | | | that would trigger the truncation case. llvm-svn: 152678
* DAG combine incorrectly optimizes (i32 vextract (v4i16 load $addr), c) to [Evan Cheng, 2012-03-13, 1 file, -4/+19]
| (i16 load $addr+c*sizeof(i16)) and replaces uses of (i32 vextract) with the
| i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
| rdar://11035895
| llvm-svn: 152675
* Give dagcombiner's worklist some inline capacity. [Benjamin Kramer, 2012-03-10, 1 file, -3/+2]
| | | | llvm-svn: 152454
* Extend r148086 to check for [r +/- reg] address mode. This fixes queens ↵ [Evan Cheng, 2012-03-06, 1 file, -4/+7]
| | | | | | performance regression (due to increased register pressure from overly aggressive pre-inc formation). llvm-svn: 152162
* Make it possible for a target to mark FSUB as Expand. This requires ↵ [Owen Anderson, 2012-03-06, 1 file, -16/+29]
| | | | | | providing a default expansion (FADD+FNEG), and teaching DAGCombine not to form FSUBs post-legalize if they are not legal. llvm-svn: 152079
* Teach the DAGCombiner that certain loadext nodes followed by ANDs can be ↵ [James Molloy, 2012-02-20, 1 file, -0/+82]
| | | | | | converted to zeroexts. llvm-svn: 150957
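One instance of this idea: a sign-extending 8-bit load followed by an AND with 0xFF produces the same value as a zero-extending load, because the mask clears exactly the bits in which the two extensions differ. The following standalone sketch models this on scalars; `sextThenMask` and `zext` are hypothetical names, and the commit itself may handle other widths and masks.

```cpp
#include <cassert>
#include <cstdint>

// Models a sign-extending i8 -> i32 load followed by (and x, 0xFF).
std::uint32_t sextThenMask(std::int8_t b) {
  std::int32_t wide = b; // sign extension
  return static_cast<std::uint32_t>(wide) & 0xFFu;
}

// Models a zero-extending i8 -> i32 load.
std::uint32_t zext(std::int8_t b) {
  return static_cast<std::uint8_t>(b); // reinterpret as unsigned, then widen
}
```

Because the two functions agree for every input byte, the pair of operations can be replaced by the single zero-extending load and the AND becomes redundant.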
* Remove extraneous #include and spelling mistake introduced in r150669. [James Molloy, 2012-02-16, 1 file, -2/+1]
| | | | llvm-svn: 150670
* Modify the algorithm when traversing the DAGCombiner's worklist to be O(log ↵ [James Molloy, 2012-02-16, 1 file, -13/+36]
| | | | | | N) for all operations. This fixes a horrible worst case with lots of nodes where 99% of the time was being spent in std::remove. llvm-svn: 150669
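One common way to avoid a linear `std::remove` on every deletion is to pair an ordering vector with an ordered set of live members and to discard stale vector entries lazily on pop. The following is a sketch of that general technique using integer node IDs; the `Worklist` class and its member names are assumptions made for illustration and are not taken from the commit.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Illustrative worklist: 'order' records insertion order and may contain
// stale entries; 'contents' holds the live members with O(log N) updates.
class Worklist {
  std::vector<int> order;
  std::set<int> contents;

public:
  void add(int node) {
    contents.insert(node);
    order.push_back(node);
  }

  // Removal only touches the set; the vector entry is skipped later.
  void remove(int node) { contents.erase(node); }

  bool empty() const { return contents.empty(); }

  // Pops the most recently added live node, or returns -1 if none remain.
  int pop() {
    while (!order.empty()) {
      int node = order.back();
      order.pop_back();
      if (contents.erase(node) != 0)
        return node;
    }
    return -1;
  }
};
```

Each entry is pushed and popped from the vector at most once, so the cost of skipping stale entries is amortized over the insertions, and no operation has to scan the whole list.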
* Fix a bug in DAGCombine for the optimization of BUILD_VECTOR. We can't ↵ [Nadav Rotem, 2012-02-13, 1 file, -2/+6]
| | | | | | generate a shuffle node from two vectors of different types. llvm-svn: 150383
* This patch addresses the problem of poor code generation for the zext [Nadav Rotem, 2012-02-12, 1 file, -14/+29]
| v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
| The DAGCombiner has two optimizations that can mitigate the problem. First, if
| all of the operands of a BUILD_VECTOR node are extracted from ZEXT/ANYEXT
| nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
| UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes. Second, another
| dag combine optimization lowers BUILD_VECTOR into a shuffle vector instruction.
| In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
| shuffled into a wide YMM register. This patch modifies the second optimization
| and allows the creation of shuffle vectors even when the newly generated vector
| and the original vector from which we extract the values are of different types.
| llvm-svn: 150340
* Add additional documentation to the extract-and-trunc dagcombine optimization. [Nadav Rotem, 2012-02-05, 1 file, -3/+8]
| | | | llvm-svn: 149823
* The type-legalizer often scalarizes code. One of the common patterns is ↵ [Nadav Rotem, 2012-02-03, 1 file, -0/+34]
| | | | | | | | | extract-and-truncate. In this patch we optimize this pattern and convert the sequence into extract op of a narrow type. This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases. llvm-svn: 149692
* Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT. [Nadav Rotem, 2012-01-17, 1 file, -4/+35]
| | | | llvm-svn: 148337
* Teach DAG combiner to turn a BUILD_VECTOR of UNDEFs into an UNDEF of vector ↵ [Craig Topper, 2012-01-17, 1 file, -4/+8]
| | | | | | type. llvm-svn: 148297
* DAGCombiner: Deduplicate code. [Benjamin Kramer, 2012-01-15, 1 file, -24/+14]
| | | | llvm-svn: 148217
* DAGCombine's logic for forming pre- and post-indexed loads / stores was being [Evan Cheng, 2012-01-13, 1 file, -9/+44]
| overly conservative. It was concerned about cases where it would prohibit
| folding simple [r, c] addressing modes. e.g.
|   ldr r0, [r2]
|   ldr r1, [r2, #4]
| =>
|   ldr r0, [r2], #4
|   ldr r1, [r2]
| Change the logic to look for such cases which allows it to form indexed memory
| ops more aggressively.
| rdar://10674430
| llvm-svn: 148086
* Teach the X86 instruction selection to do some heroic transforms to [Chandler Carruth, 2012-01-11, 1 file, -0/+23]
| detect a pattern which can be implemented with a small 'shl' embedded in the
| addressing mode scale. This happens in real code as follows:
|
|   unsigned x = my_accelerator_table[input >> 11];
|
| Here we have some lookup table that we look into using the high bits of
| 'input'. Each entity in the table is 4-bytes, which means this implicitly gets
| turned into (once lowered out of a GEP):
|
|   *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
|
| The shift right followed by a shift left is canonicalized to a smaller shift
| right and masking off the low bits. That hides the shift right which x86 has
| an addressing mode designed to support. We now detect masks of this form, and
| produce the longer shift right followed by the proper addressing mode. In
| addition to saving a (rather large) instruction, this also reduces stalls in
| Intel chips on benchmarks I've measured.
|
| In order for all of this to work, one part of the DAG needs to be
| canonicalized *still further* than it currently is. This involves removing
| pointless 'trunc' nodes between a zextload and a zext. Without that, we end up
| generating spurious masks and hiding the pattern.
|
| llvm-svn: 147936
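The canonicalization described in this message can be checked directly on 32-bit values: `(input >> 11) << 2` equals `(input >> 9)` with the low two bits masked off. The following standalone sketch states both forms; the function names `scaledIndex` and `canonicalForm` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// The form that matches the addressing mode: index shifted right by 11,
// then scaled by 4 (a left shift by 2).
std::uint32_t scaledIndex(std::uint32_t input) {
  return (input >> 11) << 2;
}

// The canonicalized form: a smaller right shift plus a mask that clears
// the two low bits.
std::uint32_t canonicalForm(std::uint32_t input) {
  return (input >> 9) & ~std::uint32_t{3};
}
```

Both functions return the same value for every input. The first form is the one that maps onto an x86 scaled-index address, which is why the instruction selector has to recognize the masked form and recover the shift-plus-scale shape.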
* Replace some uses of hasNUsesOfValue(0, X) with !hasAnyUseOfValue(X) [Craig Topper, 2012-01-07, 1 file, -4/+4]
| | | | llvm-svn: 147733
* Add some DAG combines for SUBC/SUBE. If nothing uses the carry/borrow out of ↵ [Craig Topper, 2012-01-07, 1 file, -2/+51]
| | | | | | subc, turn it into a sub. Turn (subc x, x) into 0 with no borrow. Turn (subc x, 0) into x with no borrow. Turn (subc -1, x) into (xor x, -1) with no borrow. Turn sube with no borrow in into subc. llvm-svn: 147728
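The SUBC identities listed here follow from unsigned arithmetic. The following standalone sketch models SUBC on 32-bit values as a subtraction that also reports a borrow; `SubResult` and `subc` are hypothetical names for this example and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

struct SubResult {
  std::uint32_t value; // low 32 bits of a - b
  bool borrow;         // true when a < b, i.e. the subtraction wrapped
};

// Models SUBC: subtract and produce a borrow-out flag.
SubResult subc(std::uint32_t a, std::uint32_t b) {
  return {a - b, a < b};
}
```

Under this model, `subc(x, x)` yields 0 with no borrow, `subc(x, 0)` yields `x` with no borrow, and `subc(0xFFFFFFFF, x)` yields `x ^ 0xFFFFFFFF` with no borrow, since subtracting from all-ones flips each bit and can never underflow.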
* Prevent a DAGCombine from firing where there are two uses of [Chandler Carruth, 2012-01-05, 1 file, -1/+3]
| | | | | | | | | a combined-away node and the result of the combine isn't substantially smaller than the input, it's just canonicalized. This is the first part of a significant (7%) performance gain for Snappy's hot decompression loop. llvm-svn: 147604
* Implement VECTOR_SHUFFLE canonicalizations during DAG combine. [Craig Topper, 2012-01-04, 1 file, -2/+50]
| | | | llvm-svn: 147525
* Make sure DAGCombiner doesn't introduce multiple loads from the same memory ↵ [Eli Friedman, 2011-12-26, 1 file, -1/+23]
| | | | | | location. PR10747, part 2. llvm-svn: 147283
* Initial CodeGen support for CTTZ/CTLZ where a zero input produces an [Chandler Carruth, 2011-12-13, 1 file, -0/+24]
| | | | | | | | | | | | | | | | | | | | | | | | | | undefined result. This adds new ISD nodes for the new semantics, selecting them when the LLVM intrinsic indicates that the undef behavior is desired. The new nodes expand trivially to the old nodes, so targets don't actually need to do anything to support these new nodes besides indicating that they should be expanded. I've done this for all the operand types that I could figure out for all the targets. Owners of various targets, please review and let me know if any of these are incorrect. Note that the expand behavior is *conservatively correct*, and exactly matches LLVM's current behavior with these operations. Ideally this patch will not change behavior in any way. For example the regtest suite finds the exact same instruction sequences coming out of the code generator. That's why there are no new tests here -- all of this is being exercised by the existing test suite. Thanks to Duncan Sands for reviewing the various bits of this patch and helping me get the wrinkles ironed out with expanding for each target. Also thanks to Chris for clarifying through all the discussions that this is indeed the approach he was looking for. That said, there are likely still rough spots. Further review much appreciated. llvm-svn: 146466
* Zap unnecessary isIntDivCheap() check. PR11485. No testcase because this ↵ [Eli Friedman, 2011-12-07, 1 file, -1/+1]
| | | | | | doesn't affect any in-tree target. llvm-svn: 146015
* Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves ↵ [Eli Friedman, 2011-12-07, 1 file, -13/+17]
| | | | | | correctly. PR11494. llvm-svn: 145996
* Move global variables in TargetMachine into new TargetOptions class. As an API [Nick Lewycky, 2011-12-02, 1 file, -30/+49]
| | | | | | | | | | | | change, now you need a TargetOptions object to create a TargetMachine. Clang patch to follow. One small functionality change in PTX. PTX had commented out the machine verifier parts in their copy of printAndVerify. That now calls the version in LLVMTargetMachine. Users of PTX who need verification disabled should rely on not passing the command-line flag to enable it. llvm-svn: 145714
* Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead. [Evan Cheng, 2011-11-28, 1 file, -26/+12]
| Conservatively returns zero when the GV neither specifies an alignment nor is
| initialized. Previously it returned the ABI alignment for the type of the GV.
| However, if the type is a "packed" type, then the under-specified alignment is
| attached to the load / store instructions. In that case, the alignment of the
| type cannot be trusted.
| rdar://10464621
| llvm-svn: 145300
* DAG combine should not increase alignment of loads / stores with alignment less [Evan Cheng, 2011-11-28, 1 file, -12/+26]
| | | | | | | | | than ABI alignment. These are loads / stores from / to "packed" data structures. Their alignments are intentionally under-specified. rdar://10301431 llvm-svn: 145273
* Make sure to replace the chain properly when DAGCombining a ↵ [Eli Friedman, 2011-11-16, 1 file, -4/+17]
| | | | | | LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393. llvm-svn: 144863
* Remove some unnecessary includes of PseudoSourceValue.h. [Jay Foad, 2011-11-15, 1 file, -1/+0]
| | | | llvm-svn: 144634