summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
* Grab the TargetRegisterInfo off of the subtarget from theEric Christopher2014-10-081-1/+1
| | | | | | | MachineFunction rather than a lookup on the TargetMachine to avoid unnecessary lookups. llvm-svn: 219291
* Cache TargetLowering on SelectionDAGISel and update previousEric Christopher2014-10-084-28/+26
| | | | | | calls to getTargetLowering() with the cached variable. llvm-svn: 219284
* Cache SelectionDAGISel TargetInstrInfo lookups on the class andEric Christopher2014-10-081-13/+9
| | | | | | | propagate. Also use the TargetSubtargetInfo and the MachineFunction and move TargetRegisterInfo query closer to uses. llvm-svn: 219273
* Reset the target options and optimization level as the firstEric Christopher2014-10-081-8/+13
| | | | | | | | | thing we do inside selection dag. This code needs to be migrated to queries on the function rather than global data, but this organizes things before we start grabbing the subtarget. llvm-svn: 219271
* Have the selection dag grab TargetLowering off of the subtargetEric Christopher2014-10-082-4/+3
| | | | | | inside init rather than have it passed in as an argument. llvm-svn: 219270
* Have SelectionDAG's subtarget TargetSelectionDAGInfo be setEric Christopher2014-10-081-2/+2
| | | | | | during init rather than construction time. llvm-svn: 219262
* typosSanjay Patel2014-10-071-2/+2
| | | | llvm-svn: 219221
* typosSanjay Patel2014-10-071-1/+1
| | | | llvm-svn: 219220
* [DAGCombine] Remove SIGN_EXTEND-related inf-loopHal Finkel2014-10-061-6/+2
| | | | | | | | | | | | | | | | | | | | | | The patch's author points out that, despite the function's documentation, getSetCCResultType is only used to get the SETCC result type (with one here-removed problematic exception). In one case, getSetCCResultType was being used to get the predicate type to use for a SELECT node, and then SIGN_EXTENDing (or truncating) to get the input predicate to match that type. Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the extension/truncation seems unnecessary here: SELECT is defined as: Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1 then the high bits must conform to getBooleanContents. So here we remove this use of getSetCCResultType and update getSetCCResultType's documentation to reflect its actual uses. Patch by deadal nix! llvm-svn: 219141
* Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)Sanjay Patel2014-10-061-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c: float distance = sqrt(dx * dx + dy * dy + dz * dz); float mag = dt / (distance * distance * distance); Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces: addis 3, 2, .LCPI4_2@toc@ha lfs 4, .LCPI4_2@toc@l(3) addis 3, 2, .LCPI4_1@toc@ha lfs 0, .LCPI4_1@toc@l(3) fcmpu 0, 1, 4 beq 0, .LBB4_2 # BB#1: frsqrtes 4, 1 addis 3, 2, .LCPI4_0@toc@ha lfs 5, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 5, 1 fmuls 6, 4, 4 fmadds 1, 13, 6, 5 fmuls 1, 4, 1 fres 4, 1 <--- reciprocal of reciprocal square root fnmsubs 1, 1, 4, 0 fmadds 4, 4, 1, 4 .LBB4_2: fmuls 1, 4, 2 fres 2, 1 fnmsubs 0, 1, 2, 0 fmadds 0, 2, 0, 2 fmuls 1, 3, 0 blr After the patch, this simplifies to: frsqrtes 0, 1 addis 3, 2, .LCPI4_1@toc@ha fres 5, 2 lfs 4, .LCPI4_1@toc@l(3) addis 3, 2, .LCPI4_0@toc@ha lfs 7, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 4, 1 fmuls 6, 0, 0 fnmsubs 2, 2, 5, 7 fmadds 1, 13, 6, 4 fmadds 2, 5, 2, 5 fmuls 0, 0, 1 fmuls 0, 0, 2 fmuls 1, 3, 0 blr Differential Revision: http://reviews.llvm.org/D5628 llvm-svn: 219139
* [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffleChandler Carruth2014-10-051-0/+93
| | | | | | | | | | | | | | | that are unused. This allows the combiner to delete math feeding shuffles where the math isn't actually necessary. This improves some of the vperm2x128 tests that regressed when the vector shuffle lowering started actually generating vperm instructions rather than forcibly decomposing them. Sadly, this isn't enough to get this *really* right because we still form a completely unnecessary permutation. To fix that, we also need to fold shuffles which just rearrange concatenated or inserted subvectors. llvm-svn: 219086
* Remove unnecessary copying or replace it with moves in a bunch of places.Benjamin Kramer2014-10-043-5/+6
| | | | | | NFC. llvm-svn: 219061
* [ISel] Keep matching state consistent when folding during X86 address matchAdam Nemet2014-10-031-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009
* Eliminate some deep std::vector copies. NFC.Benjamin Kramer2014-10-031-6/+3
| | | | llvm-svn: 218999
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-018-81/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* Revert r218778 while investigating buldbot breakage.Adrian Prantl2014-10-018-98/+81
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-018-81/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* Use the target-specified iteration count to opt out of any further ↵Sanjay Patel2014-09-301-60/+62
| | | | | | refinement of an estimate. NFC. llvm-svn: 218700
* Split the estimate() interface into separate functions for each type. NFC.Sanjay Patel2014-09-301-2/+2
| | | | | | | | | | | | It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698
* [DAG] Check in advance if a build_vector has a legal type before attempting ↵Andrea Di Biagio2014-09-301-4/+4
| | | | | | | | | | | | | | to convert it into a shuffle. Currently, the DAG Combiner only tries to convert type-legal build_vector nodes into shuffles. This patch simply moves the logic that checks if a build_vector has a legal value type up before we even start analyzing the operands. This allows to early exit immediately from method 'visitBUILD_VECTOR' if the node type is known to be illegal. No functional change intended. llvm-svn: 218677
* [AArch64] Redundant store instructions should be removed as dead codeJames Molloy2014-09-271-0/+11
| | | | | | | | | | | | | | | If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! llvm-svn: 218569
* Refactor reciprocal and reciprocal square root estimate into ↵Sanjay Patel2014-09-261-28/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553
* Revert patch ofr218493David Xu2014-09-261-14/+0
| | | | llvm-svn: 218494
* Redundant store instructions should be removed as dead codeDavid Xu2014-09-261-0/+14
| | | | llvm-svn: 218493
* Move resetTargetOptions from taking a MachineFunction to a FunctionEric Christopher2014-09-261-1/+1
| | | | | | | since we are accessing the TargetMachine that we're a member function of. llvm-svn: 218489
* SelectionDAG: Remove #if NDEBUG from check for a post-isel hookTom Stellard2014-09-251-2/+0
| | | | | | | | | | | | | | The InstrEmitter will skip the check of MI.hasPostISelHook() before calling AdjustInstrPostInstrSelection() when NDEBUG is not defined. This was added in r140228, and I'm not sure if it is intentional or not, but it is a likely source for bugs, because it means with Release+Asserts builds you can forget to set the hasPostISelHook flag on TableGen definitions and AdjustInstrPostInstrSelection() will still be called. llvm-svn: 218458
* Clear PreferredExtendType for in each function-specific state ↵Jiangning Liu2014-09-241-0/+1
| | | | | | FunctionLoweringInfo. llvm-svn: 218364
* Use SDValue bool operator to reduce code. No functional change.Sanjay Patel2014-09-231-9/+6
| | | | llvm-svn: 218314
* Refactor reciprocal square root estimate into target-independent function; NFC.Sanjay Patel2014-09-211-17/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219
* Fix crash with an insertvalue that produces an empty object.Peter Collingbourne2014-09-201-0/+6
| | | | llvm-svn: 218171
* Optionally enable more-aggressive FMA formation in DAGCombineHal Finkel2014-09-191-5/+10
| | | | | | | | | | | | | | | | | The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120
* Optimize sext/zext insertion algorithm in back-end.Jiangning Liu2014-09-193-8/+61
| | | | | | | | With this optimization, we will not always insert zext for values crossing basic blocks, but insert sext if the users of a value crossing basic block has preference of sign predicate. llvm-svn: 218101
* Replace repeated null checks with an assert. NFC.Sanjay Patel2014-09-151-18/+14
| | | | | | | Without a vector to hold the created ops, these functions don't have any use. llvm-svn: 217831
* [FastISel] Move optimizeCmpPredicate to FastISel base class. NFC.Juergen Ributzka2014-09-151-0/+40
| | | | | | Make the optimizeCmpPredicate function available to all targets. llvm-svn: 217822
* Replace dead links to "Hacker's Delight" with general references. NFC.Sanjay Patel2014-09-153-10/+10
| | | | llvm-svn: 217814
* Allow targets to custom legalize vector insertion and extraction.Owen Anderson2014-09-121-0/+8
| | | | llvm-svn: 217711
* Legalizer: Use the scalar bit width when promoting bit counting instrs onBenjamin Kramer2014-09-121-5/+6
| | | | | | | | | vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup the result by 32 bits, not 64. PR20917. llvm-svn: 217671
* Add DAG combine for shl + add of constants.Matt Arsenault2014-09-111-32/+12
| | | | | | | | | | | | | | Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610
* Combine fmul vector FP constants when unsafe math is allowed.Sanjay Patel2014-09-111-6/+22
| | | | | | | | | | | | | | | | | | | This is an extension of the change made with r215820: http://llvm.org/viewvc/llvm-project?view=revision&revision=215820 That patch allowed combining of splatted vector FP constants that are multiplied. This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. Also, canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds. This fold is also done by -instcombine, but it's possible that extra fmuls may have been generated during lowering. Differential Revision: http://reviews.llvm.org/D5254 llvm-svn: 217599
* Build correct vector filled with undef nodesDavid Xu2014-09-111-4/+20
| | | | llvm-svn: 217570
* Fast-ISel: Remove dead code after falling back from selecting call ↵Hans Wennborg2014-09-081-15/+10
| | | | | | | | | | | | | | | | | instructions (PR20863) Previously, fast-isel would not clean up after failing to select a call instruction, because it would have called flushLocalValueMap() which moves the insertion point, making SavedInsertPt in selectInstruction() invalid. Fixing this by making SavedInsertPt a member variable, and having flushLocalValueMap() update it. This removes some redundant code at -O0, and more importantly fixes PR20863. Differential Revision: http://reviews.llvm.org/D5249 llvm-svn: 217401
* Group unsafe fmul math folds together for easier reading. No functional change.Sanjay Patel2014-09-081-6/+10
| | | | llvm-svn: 217399
* Fix the FIXME that was just added in r217390 - remove a bunch of redundant ↵Sanjay Patel2014-09-081-43/+2
| | | | | | | | fold permutations. The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll. llvm-svn: 217393
* group unsafe math folds together for easier readingSanjay Patel2014-09-081-150/+142
| | | | | | Also added a FIXME regarding redundant folds for non-canonicalized constants. llvm-svn: 217390
* Allow vector fsub ops with constants to get the same optimizations as scalars.Sanjay Patel2014-09-051-2/+2
| | | | | | | | This problem is bigger than just fsub, but this is the minimum fix to solve fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve zero subtraction with the same change. llvm-svn: 217286
* clean up; NFCSanjay Patel2014-09-051-2/+2
| | | | llvm-svn: 217278
* [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.Juergen Ributzka2014-09-031-50/+50
| | | | | | | | | | This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075
* [FastISel] Rename public visible FastISel functions. NFC.Juergen Ributzka2014-09-032-46/+44
| | | | | | | | | | | | | | | | | | | | | This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074
* Remove resetSubtargetFeatures as it is unused.Eric Christopher2014-09-031-3/+0
| | | | llvm-svn: 217071
* [FastISel] Some long overdue spring cleaning of FastISel.Juergen Ributzka2014-09-031-378/+333
| | | | | | | | | | | | | Things got a little bit messy over the years and it is time for a little bit spring cleaning. This first commit is focused on the FastISel base class itself. It doxyfies all comments, C++11fies the code where it makes sense, renames internal methods to adhere to the coding standard, and clang-formats the files. Reviewed by Eric llvm-svn: 217060
OpenPOWER on IntegriCloud