summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
...
* Lower stackmap intrinsics directly to their target opcode in the DAG builder.Andrew Trick2013-10-313-11/+216
| | | | llvm-svn: 193769
* whitespaceAndrew Trick2013-10-311-7/+7
| | | | llvm-svn: 193765
* Legalize: Improve legalization of long vector extends.Jim Grosbach2013-10-312-3/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an extend more than doubles the size of the elements (e.g., a zext from v16i8 to v16i32), the normal legalization method of splitting the vectors will run into problems as by the time the destination vector is legal, the source vector is illegal. The end result is the operation often becoming scalarized, with the typical horrible performance. For example, on x86_64, the simple input of: define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind { %tmp = zext <16 x i8> %a to <16 x i32> store <16 x i32> %tmp, <16 x i32>*%p ret void } Generates: .section __TEXT,__text,regular,pure_instructions .section __TEXT,__const .align 5 LCPI0_0: .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .section __TEXT,__text,regular,pure_instructions .globl _bar .align 4, 0x90 _bar: vpunpckhbw %xmm0, %xmm0, %xmm1 vpunpckhwd %xmm0, %xmm1, %xmm2 vpmovzxwd %xmm1, %xmm1 vinsertf128 $1, %xmm2, %ymm1, %ymm1 vmovaps LCPI0_0(%rip), %ymm2 vandps %ymm2, %ymm1, %ymm1 vpmovzxbw %xmm0, %xmm3 vpunpckhwd %xmm0, %xmm3, %xmm3 vpmovzxbd %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vandps %ymm2, %ymm0, %ymm0 vmovaps %ymm0, (%rdi) vmovaps %ymm1, 32(%rdi) vzeroupper ret So instead we can check if there are legal types that enable us to split more cleverly when the input vector is already legal such that we don't turn it into an illegal type. If the extend is such that it's more than doubling the size of the input we check if - the number of vector elements is even, - the source type is legal, - the type of a split source is illegal, - the type of an extended (by doubling element size) source is legal, and - the type of that extended source when split is legal. If the conditions are met, instead of just splitting both the destination and the source types, we create an extend that only goes up one "step" (doubling the element width), and the continue legalizing the rest of the operation normally. The result is that this operates as a new, more effecient, termination condition for the loop of "split the operation until the destination type is legal." With this change, the above example now compiles to: _bar: vpxor %xmm1, %xmm1, %xmm1 vpunpcklbw %xmm1, %xmm0, %xmm2 vpunpckhwd %xmm1, %xmm2, %xmm3 vpunpcklwd %xmm1, %xmm2, %xmm2 vinsertf128 $1, %xmm3, %ymm2, %ymm2 vpunpckhbw %xmm1, %xmm0, %xmm0 vpunpckhwd %xmm1, %xmm0, %xmm3 vpunpcklwd %xmm1, %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vmovaps %ymm0, 32(%rdi) vmovaps %ymm2, (%rdi) vzeroupper ret This generalizes a custom lowering that was added a while back to the ARM backend. That lowering is no longer necessary, and is removed. The testcases for it, however, provide excellent ARM tests for this change and so remain. rdar://14735100 llvm-svn: 193727
* Fix CodeGen for unaligned loads with address spacesMatt Arsenault2013-10-301-2/+5
| | | | llvm-svn: 193721
* Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs ↵Juergen Ributzka2013-10-302-35/+8
| | | | | | | | splitting too." Now Hexagon and SystemZ are not happy with it :-( llvm-svn: 193677
* SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.Juergen Ributzka2013-10-302-8/+35
| | | | | | | | | | | | | | | | | | | | The Type Legalizer recognizes that VSELECT needs to be split, because the type is to wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. This mask has usually the same size as the VSELECT return type (except for Intel KNL). Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. Reviewed by Nadav llvm-svn: 193676
* Fix "existant" typosAlp Toker2013-10-291-1/+1
| | | | llvm-svn: 193579
* [DAGCombiner] Respect volatility when checking for aliasesRichard Sandiford2013-10-281-18/+25
| | | | | | | | Making useAA() default to true for SystemZ showed that the combiner alias analysis wasn't handling volatile accesses. This hit many of the SystemZ tests, but I arbitrarily picked one for the purpose of this patch. llvm-svn: 193518
* Keep TBAA info when rewriting SelectionDAG loads and storesRichard Sandiford2013-10-288-191/+181
| | | | | | | | | | | | | | | | | Most SelectionDAG code drops the TBAA info when creating a new form of a load and store (e.g. during legalization, or when converting a plain load to an extending one). This patch tries to catch all cases where the TBAA information can legitimately be carried over. The patch adds alternative forms of getLoad() and getExtLoad() that take a MachineMemOperand instead of individual fields. (The corresponding getTruncStore() already exists.) The idea is to use the MachineMemOperand forms when all fields are carried over (size, pointer info, isVolatile, isNonTemporal, alignment and TBAA info). If some adjustment is being made, e.g. to narrow the load, then we still pass the individual fields but also pass the TBAA info. llvm-svn: 193517
* LegalizeDAG: allow libcalls for max/min atomic operationsTim Northover2013-10-251-0/+40
| | | | | | | | | | | ARM processors without ldrex/strex need to be able to make libcalls for all atomic operations, including the newer min/max versions. The alternative would probably be expanding these operations in terms of cmpxchg (as x86 does always), but in the configurations where this matters code-size tends to be paramount so the libcall is more desirable. llvm-svn: 193398
* Optimize concat_vectors(X, undef) -> scalar_to_vector(X).Nadav Rotem2013-10-251-1/+28
| | | | | | | This optimization is not SSE specific so I am moving it to DAGco. The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add. llvm-svn: 193393
* SelectionDAG: Pass along the original argument/element type in ISD::InputArgTom Stellard2013-10-231-5/+8
| | | | | | | | | | | | | | | | For some targets, it is useful to be able to look at the original type of an argument without having to dig through the original IR. This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset was not taking into account the offset of structure elements. Patch by: Justin Holewinski Tom Stellard: - Changed the type of ArgVT to EVT, so it can store non-simple types like v3i32. llvm-svn: 193214
* Using FoldingSet in SelectionDAG::getVTList.Wan Xiaofei2013-10-221-59/+64
| | | | | | | | | | VTList has a long life cycle through the module and getVTList is frequently called. In current getVTList, sequential search over a std::vector is used, this is inefficient in big module. This patch use FoldingSet to implement hashing mechanism when searching. Reviewer: Nadav Rotem Test : Pass unit tests & LNT test suite llvm-svn: 193150
* Fix CodeGen for different size address space GEPsMatt Arsenault2013-10-211-6/+4
| | | | llvm-svn: 193111
* Reuse variableMatt Arsenault2013-10-211-1/+1
| | | | llvm-svn: 193107
* [PATCH] Fix PR17168 (DAG scheduler inserts DBG_VALUE before PHI with fast-isel)Bill Schmidt2013-10-181-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | PR17168 describes a test case that fails when compiling for debug with fast-isel. Investigation showed that the test was failing because a DBG_VALUE machine instruction was placed prior to a PHI. For this problem to occur requires the following: * Compile for debug * Compile with fast-isel * In a block B, fast-isel must partially succeed before punting to DAG-isel * B must start with a PHI * The first unhandled node in the DAG must not generate a machine instruction * A debug value with an order less than that of that first node exists When all of these circumstances apply, the existing test that an instruction was not inserted won't fire. Currently it tests whether the block is empty, or whether the last instruction generated is a phi. When fast-isel has partially succeeded, the last instruction generated will not be a phi. Instead, we need to check whether the current insert position is immediately following a phi. This patch adds that check, and adds the test case from the PR as a regression test. llvm-svn: 192976
* CodeGen: Emit a libcall if the target doesn't support 16-byte wide atomicsDavid Majnemer2013-10-182-0/+16
| | | | | | | | | | There are targets that support i128 sized scalars but cannot emit instructions that modify them directly. The proper thing to do is to emit a libcall. This fixes PR17481. llvm-svn: 192957
* Replace sra with srl if a single sign bit is requiredRichard Sandiford2013-10-171-3/+14
| | | | | | E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2). llvm-svn: 192884
* Fix edge condition in DAGCombiner to improve codegen of shift sequences.Andrea Di Biagio2013-10-171-0/+1
| | | | | | | | | | | | When canonicalizing dags according to the rule (shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1)) remember to add the new shl dag to the DAGCombiner worklist of nodes. If we don't explicitly add it to the worklist of nodes to visit, we may not trigger later on the rule that folds the shift left + logical shift right into a AND instruction with bitmask. llvm-svn: 192883
* [projects/test-suite] White space and long line fixes.Jack Carter2013-10-171-15/+21
| | | | | | No functionality changes. llvm-svn: 192863
* DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal ↵Benjamin Kramer2013-10-161-1/+1
| | | | | | | | | | | constant. This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly because i64 is illegal. It would be nice if getNOT would handle this transparently, but I don't see a way to generate a legal constant there right now. Fixes PR17487. llvm-svn: 192795
* Handle (shl (anyext (shr ...))) in SimpilfyDemandedBitsRichard Sandiford2013-10-161-0/+25
| | | | | | | | | | This is really an extension of the current (shl (shr ...)) -> shl optimization. The main difference is that certain upper bits must also not be demanded. The motivating examples are the first two in the testcase, which occur in llvmpipe output. llvm-svn: 192783
* Fix indenting.David Blaikie2013-10-141-7/+8
| | | | | | That wasn't confusing /at all/... llvm-svn: 192617
* Fixed a bug in dynamic allocation memory on stack.Elena Demikhovsky2013-10-141-3/+3
| | | | | | | | The alignment of allocated space was wrong, see Bugzila 17345. Done by Zvi Rackover <zvi.rackover@intel.com>. llvm-svn: 192573
* TargetLowering: Don't index into empty string.Will Dietz2013-10-131-1/+1
| | | | | | (This is triggered by current lit tests) llvm-svn: 192549
* [DAGCombiner] Reapply load slicing (192471) with a test that explicitly set ↵Quentin Colombet2013-10-111-2/+574
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sse4.2 support. This should fix the buildbots. Original commit message: [DAGCombiner] Slice a big load in two loads when the element are next to each other in memory and the target has paired load and performs post-isel loads combining. E.g., this optimization will transform something like this: a = load i64* addr b = trunc i64 a to i32 c = lshr i64 a, 32 d = trunc i64 c to i32 into: b = load i32* addr1 d = load i32* addr2 Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and performs post-isel loads combining. One should overload TargetLowering::hasPairedLoad to provide this information. The default is false. <rdar://problem/14477220> llvm-svn: 192476
* [DAGCombiner] Revert load slicing (r192471), until I figure out why it fails ↵Quentin Colombet2013-10-111-574/+2
| | | | | | on ubuntu. llvm-svn: 192474
* [DAGCombiner] Slice a big load in two loads when the element are next to eachQuentin Colombet2013-10-111-2/+574
| | | | | | | | | | | | | | | | | | | | | | | | other in memory and the target has paired load and performs post-isel loads combining. E.g., this optimization will transform something like this: a = load i64* addr b = trunc i64 a to i32 c = lshr i64 a, 32 d = trunc i64 c to i32 into: b = load i32* addr1 d = load i32* addr2 Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and performs post-isel loads combining. One should overload TargetLowering::hasPairedLoad to provide this information. The default is false. <rdar://problem/14477220> llvm-svn: 192471
* Use getPointerSizeInBits() rather than 8 * getPointerSize()Matt Arsenault2013-10-101-2/+3
| | | | llvm-svn: 192386
* Fix some assert messages to say the correct opcode name. Looks like one ↵Craig Topper2013-10-061-7/+7
| | | | | | assert got copy and pasted to many places. llvm-svn: 192078
* Add OPC_CheckChildSame0-3 to the DAG isel matcher. This replaces sequences ↵Craig Topper2013-10-051-0/+27
| | | | | | of MoveChild, CheckSame, MoveParent. Saves 846 bytes from the X86 DAG isel matcher, ~300 from ARM, ~840 from Hexagon. llvm-svn: 192026
* Debug info: Don't crash in SelectionDAGISel when a vreg that is beingAdrian Prantl2013-10-051-3/+7
| | | | | | | | pointed to by a dbg_value belonging to a function argument is eliminated during instruction selection. rdar://problem/15094721. llvm-svn: 192011
* Fix DAGCombiner::visitFP_EXTEND to ignore indexed loadsHal Finkel2013-10-041-1/+1
| | | | | | | | | | | | | | | | DAGCombiner::visitFP_EXTEND will apply the following transformation: fold (fpext (load x)) -> (fpext (fptrunc (extload x))) but the implementation does not handle indexed loads (pre/post inc.), but did not specifically ignore them either (unlike for extending loads, which it already ignored), causing an assert when the transformation was applied to an indexed load. This is the minimal fix for correctness (causing the transformation to be skipped for indexed loads). Unfortunately, I don't have an in-tree test case. llvm-svn: 191989
* Revert r191940 to see if it fixes the build bots.Craig Topper2013-10-041-27/+0
| | | | llvm-svn: 191941
* Add OPC_CheckChildSame0-3 to the DAG isel matcher. This replaces sequences ↵Craig Topper2013-10-041-0/+27
| | | | | | of MoveChild, CheckSame, MoveParent. Saves 846 bytes from the X86 DAG isel matcher, ~300 from ARM, ~840 from Hexagon. llvm-svn: 191940
* Added checking code whehter target supports specific dag combining about rotateJin-Gu Kang2013-10-031-11/+19
| | | | | | | | | | | | | | | | | | | | | | | or not. The corresponding dag patterns are as following: "DAGCombier::MatchRotate" function in DAGCombiner.cpp Pattern1 // fold (or (shl (*ext x), (*ext y)), // (srl (*ext x), (*ext (sub 32, y)))) -> // (*ext (rotl x, y)) // fold (or (shl (*ext x), (*ext y)), // (srl (*ext x), (*ext (sub 32, y)))) -> // (*ext (rotr x, (sub 32, y))) pattern2 // fold (or (shl (*ext x), (*ext (sub 32, y))), // (srl (*ext x), (*ext y))) -> // (*ext (rotl x, y)) // fold (or (shl (*ext x), (*ext (sub 32, y))), // (srl (*ext x), (*ext y))) -> // (*ext (rotr x, (sub 32, y))) llvm-svn: 191905
* Remove several unused variables.Rafael Espindola2013-10-012-5/+2
| | | | | | Patch by Alp Toker. llvm-svn: 191757
* SelectionDAG: Clarify comments from r191600Tom Stellard2013-10-011-2/+2
| | | | llvm-svn: 191724
* Allocate AtomicSDNode operands in SelectionDAG's allocator to stop leakage.Benjamin Kramer2013-09-291-2/+10
| | | | | | | SDNode destructors are never called. As an optimization use AtomicSDNode's internal storage if we have a small number of operands. llvm-svn: 191636
* Fix spelling intruction -> instruction.Robert Wilhelm2013-09-281-1/+1
| | | | llvm-svn: 191610
* SelectionDAG: Silence unused variable warning on release buildsTom Stellard2013-09-281-0/+1
| | | | llvm-svn: 191604
* SelectionDAG: Improve legalization of SELECT_CC with illegal condition codesTom Stellard2013-09-281-13/+37
| | | | | | | | | | | SelectionDAG will now attempt to inverse an illegal conditon in order to find a legal one and if that doesn't work, it will attempt to swap the operands using the inverted condition. There are no new test cases for this, but a nubmer of the existing R600 tests hit this path. llvm-svn: 191602
* SelectionDAG: Try to expand all condition codes using getCCSwappedOperands()Tom Stellard2013-09-283-18/+33
| | | | | | | | | | | | This is useful for targets like R600, which only support GT, GE, NE, and EQ condition codes as it removes the need to handle unsupported condition codes in target specific code. There are no tests with this commit, but R600 has been updated to take advantage of this new feature, so its existing selectcc tests are now testing the swapped operands path. llvm-svn: 191601
* SelectionDAG: Clean up LegalizeSetCCCondCode() functionTom Stellard2013-09-281-26/+51
| | | | | | | | | | | | | | Interpreting the results of this function is not very intuitive, so I cleaned it up to make it more clear whether or not a SETCC op was legalized and how it was legalized (either by swapping LHS and RHS or replacing with AND/OR). This patch does change functionality in the LHS and RHS swapping case, but unfortunately there are no in-tree tests for this. However, this patch is a prerequisite for R600 to take advantage of the LHS and RHS swapping, so tests will be added in subsequent commits. llvm-svn: 191600
* Re-apply the change from r191393 with fix for pr17380.Andrea Di Biagio2013-09-271-0/+20
| | | | | | | | | | This change fixes the problem reported in pr17380 and re-add the dagcombine transformation ensuring that the value types are always legal if the transformation is triggered after Legalization took place. Added the test case from pr17380. llvm-svn: 191509
* Revert r191393 since it caused pr17380.Andrea Di Biagio2013-09-261-20/+0
| | | | llvm-svn: 191438
* [ARM] Use the load-acquire/store-release instructions optimally in AArch32.Amara Emerson2013-09-261-48/+26
| | | | | | Patch by Artyom Skrobov. llvm-svn: 191428
* Added temp flag -misched-bench for staging in default changes.Andrew Trick2013-09-261-1/+1
| | | | llvm-svn: 191423
* whitespaceAndrew Trick2013-09-261-2/+2
| | | | llvm-svn: 191422
* Teach DAGCombiner how to canonicalize dags according to the ruleAndrea Di Biagio2013-09-251-0/+20
| | | | | | | | | | | | (shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)). The rule only triggers when there are no other uses of the zext to avoid materializing more instructions. This helps the DAGCombiner understand that the shl/shr sequence can then be converted into an and instruction. llvm-svn: 191393
OpenPOWER on IntegriCloud