summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
* Add minnum / maxnum codegenMatt Arsenault2014-10-219-0/+166
| | | | llvm-svn: 220342
* Introduce enum values for previously defined metadata types. (NFC)Philip Reames2014-10-212-5/+5
| | | | | | | | | | | Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248
* Check for dynamic alloca's when selecting lifetime intrinsics.Pete Cooper2014-10-171-1/+7
| | | | | | | | | | | | | | | | | | TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099
* [Stackmaps] Enable invoking the patchpoint intrinsic.Juergen Ributzka2014-10-172-51/+64
| | | | | | | | | | | Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055
* SelectionDAG: Add sext_inreg optimizationsJan Vesely2014-10-171-0/+22
| | | | | | | | | | v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044
* Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.Juergen Ributzka2014-10-162-44/+58
| | | | | | | | | | | | This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC)Robin Morisset2014-10-161-67/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that montonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound ! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the litterature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erase this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more agressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957
* constify the getters in SDNodeDbgValue.Adrian Prantl2014-10-131-12/+12
| | | | llvm-svn: 219627
* Refactor debug statement and remove dead argument. NFC.Chad Rosier2014-10-132-18/+13
| | | | llvm-svn: 219626
* Modernize old-style static asserts. NFC.Benjamin Kramer2014-10-121-1/+1
| | | | llvm-svn: 219588
* Improve sqrt estimate algorithm (fast-math)Sanjay Patel2014-10-091-17/+16
| | | | | | | | | | | | | | | | | | | This patch changes the fast-math implementation for calculating sqrt(x) from: y = 1 / (1 / sqrt(x)) to: y = x * (1 / sqrt(x)) This has 2 benefits: less code / faster code and one less estimate instruction that may lose precision. The only target that will be affected (until http://reviews.llvm.org/D5658 is approved) is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf or vector sqrtf and 4 less flops for a double-precision sqrt. We also eliminate a constant load and extra register usage. Differential Revision: http://reviews.llvm.org/D5682 llvm-svn: 219445
* Remove more calls to getSubtargetImpl from the schedulers andEric Christopher2014-10-093-24/+17
| | | | | | remove cached or unnecessary TargetMachines. llvm-svn: 219387
* Remove unused argument to CreateTargetScheduleState and changeEric Christopher2014-10-091-1/+1
| | | | | | | the TargetMachine to a TargetSubtargetInfo since everything we wanted is off of that. llvm-svn: 219382
* Remove uses of getSubtargetImpl from ResourcePriorityQueue andEric Christopher2014-10-091-7/+5
| | | | | | replace them with calls off of the MachineFuncton. llvm-svn: 219381
* Remove the uses of getSubtargetImpl from InstrEmitter and removeEric Christopher2014-10-092-9/+6
| | | | | | the now unused TargetMachine variable. llvm-svn: 219379
* Use the subtarget on the dag to get TargetFrameLowering ratherEric Christopher2014-10-092-2/+2
| | | | | | than off the target machine. llvm-svn: 219378
* Remove uses of the TargetMachine from FunctionLoweringInfoEric Christopher2014-10-092-15/+11
| | | | | | via caching TargetLowering and using the MachineFunction. llvm-svn: 219375
* Remove unnecessary include.Eric Christopher2014-10-081-1/+0
| | | | llvm-svn: 219368
* Use both the cached TLI and the subtarget off of the DAG inEric Christopher2014-10-081-15/+10
| | | | | | the DAG combiner. llvm-svn: 219367
* Remove getSubtargetImpl calls from FastISel, we can get it fromEric Christopher2014-10-081-6/+5
| | | | | | the MachineFunction where it's already cached. llvm-svn: 219366
* Remove dead call to getTypeToTransformTo. The result isEric Christopher2014-10-081-3/+1
| | | | | | unused. llvm-svn: 219347
* Remove a bunch of getSubtargetImpl calls since we already haveEric Christopher2014-10-081-35/+6
| | | | | | a cached TLI instance. llvm-svn: 219342
* Use the TargetLowering information we already have on theEric Christopher2014-10-081-305/+256
| | | | | | | SelectionDAG in SelectionDAGBuilder rather than going through the TargetMachine for lookup. llvm-svn: 219292
* Grab the TargetRegisterInfo off of the subtarget from theEric Christopher2014-10-081-1/+1
| | | | | | | MachineFunction rather than a lookup on the TargetMachine to avoid unnecessary lookups. llvm-svn: 219291
* Cache TargetLowering on SelectionDAGISel and update previousEric Christopher2014-10-084-28/+26
| | | | | | calls to getTargetLowering() with the cached variable. llvm-svn: 219284
* Cache SelectionDAGISel TargetInstrInfo lookups on the class andEric Christopher2014-10-081-13/+9
| | | | | | | propagate. Also use the TargetSubtargetInfo and the MachineFunction and move TargetRegisterInfo query closer to uses. llvm-svn: 219273
* Reset the target options and optimization level as the firstEric Christopher2014-10-081-8/+13
| | | | | | | | | thing we do inside selection dag. This code needs to be migrated to queries on the function rather than global data, but this organizes things before we start grabbing the subtarget. llvm-svn: 219271
* Have the selection dag grab TargetLowering off of the subtargetEric Christopher2014-10-082-4/+3
| | | | | | inside init rather than have it passed in as an argument. llvm-svn: 219270
* Have SelectionDAG's subtarget TargetSelectionDAGInfo be setEric Christopher2014-10-081-2/+2
| | | | | | during init rather than construction time. llvm-svn: 219262
* typosSanjay Patel2014-10-071-2/+2
| | | | llvm-svn: 219221
* typosSanjay Patel2014-10-071-1/+1
| | | | llvm-svn: 219220
* [DAGCombine] Remove SIGN_EXTEND-related inf-loopHal Finkel2014-10-061-6/+2
| | | | | | | | | | | | | | | | | | | | | | The patch's author points out that, despite the function's documentation, getSetCCResultType is only used to get the SETCC result type (with one here-removed problematic exception). In one case, getSetCCResultType was being used to get the predicate type to use for a SELECT node, and then SIGN_EXTENDing (or truncating) to get the input predicate to match that type. Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the extension/truncation seems unnecessary here: SELECT is defined as: Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1 then the high bits must conform to getBooleanContents. So here we remove this use of getSetCCResultType and update getSetCCResultType's documentation to reflect its actual uses. Patch by deadal nix! llvm-svn: 219141
* Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)Sanjay Patel2014-10-061-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c: float distance = sqrt(dx * dx + dy * dy + dz * dz); float mag = dt / (distance * distance * distance); Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces: addis 3, 2, .LCPI4_2@toc@ha lfs 4, .LCPI4_2@toc@l(3) addis 3, 2, .LCPI4_1@toc@ha lfs 0, .LCPI4_1@toc@l(3) fcmpu 0, 1, 4 beq 0, .LBB4_2 # BB#1: frsqrtes 4, 1 addis 3, 2, .LCPI4_0@toc@ha lfs 5, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 5, 1 fmuls 6, 4, 4 fmadds 1, 13, 6, 5 fmuls 1, 4, 1 fres 4, 1 <--- reciprocal of reciprocal square root fnmsubs 1, 1, 4, 0 fmadds 4, 4, 1, 4 .LBB4_2: fmuls 1, 4, 2 fres 2, 1 fnmsubs 0, 1, 2, 0 fmadds 0, 2, 0, 2 fmuls 1, 3, 0 blr After the patch, this simplifies to: frsqrtes 0, 1 addis 3, 2, .LCPI4_1@toc@ha fres 5, 2 lfs 4, .LCPI4_1@toc@l(3) addis 3, 2, .LCPI4_0@toc@ha lfs 7, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 4, 1 fmuls 6, 0, 0 fnmsubs 2, 2, 5, 7 fmadds 1, 13, 6, 4 fmadds 2, 5, 2, 5 fmuls 0, 0, 1 fmuls 0, 0, 2 fmuls 1, 3, 0 blr Differential Revision: http://reviews.llvm.org/D5628 llvm-svn: 219139
* [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffleChandler Carruth2014-10-051-0/+93
| | | | | | | | | | | | | | | that are unused. This allows the combiner to delete math feeding shuffles where the math isn't actually necessary. This improves some of the vperm2x128 tests that regressed when the vector shuffle lowering started actually generating vperm instructions rather than forcibly decomposing them. Sadly, this isn't enough to get this *really* right because we still form a completely unnecessary permutation. To fix that, we also need to fold shuffles which just rearrange concatenated or inserted subvectors. llvm-svn: 219086
* Remove unnecessary copying or replace it with moves in a bunch of places.Benjamin Kramer2014-10-043-5/+6
| | | | | | NFC. llvm-svn: 219061
* [ISel] Keep matching state consistent when folding during X86 address matchAdam Nemet2014-10-031-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009
* Eliminate some deep std::vector copies. NFC.Benjamin Kramer2014-10-031-6/+3
| | | | llvm-svn: 218999
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-018-81/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* Revert r218778 while investigating buldbot breakage.Adrian Prantl2014-10-018-98/+81
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-018-81/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* Use the target-specified iteration count to opt out of any further ↵Sanjay Patel2014-09-301-60/+62
| | | | | | refinement of an estimate. NFC. llvm-svn: 218700
* Split the estimate() interface into separate functions for each type. NFC.Sanjay Patel2014-09-301-2/+2
| | | | | | | | | | | | It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698
* [DAG] Check in advance if a build_vector has a legal type before attempting ↵Andrea Di Biagio2014-09-301-4/+4
| | | | | | | | | | | | | | to convert it into a shuffle. Currently, the DAG Combiner only tries to convert type-legal build_vector nodes into shuffles. This patch simply moves the logic that checks if a build_vector has a legal value type up before we even start analyzing the operands. This allows to early exit immediately from method 'visitBUILD_VECTOR' if the node type is known to be illegal. No functional change intended. llvm-svn: 218677
* [AArch64] Redundant store instructions should be removed as dead codeJames Molloy2014-09-271-0/+11
| | | | | | | | | | | | | | | If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! llvm-svn: 218569
* Refactor reciprocal and reciprocal square root estimate into ↵Sanjay Patel2014-09-261-28/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553
* Revert patch ofr218493David Xu2014-09-261-14/+0
| | | | llvm-svn: 218494
* Redundant store instructions should be removed as dead codeDavid Xu2014-09-261-0/+14
| | | | llvm-svn: 218493
* Move resetTargetOptions from taking a MachineFunction to a FunctionEric Christopher2014-09-261-1/+1
| | | | | | | since we are accessing the TargetMachine that we're a member function of. llvm-svn: 218489
* SelectionDAG: Remove #if NDEBUG from check for a post-isel hookTom Stellard2014-09-251-2/+0
| | | | | | | | | | | | | | The InstrEmitter will skip the check of MI.hasPostISelHook() before calling AdjustInstrPostInstrSelection() when NDEBUG is not defined. This was added in r140228, and I'm not sure if it is intentional or not, but it is a likely source for bugs, because it means with Release+Asserts builds you can forget to set the hasPostISelHook flag on TableGen definitions and AdjustInstrPostInstrSelection() will still be called. llvm-svn: 218458
* Clear PreferredExtendType for in each function-specific state ↵Jiangning Liu2014-09-241-0/+1
| | | | | | FunctionLoweringInfo. llvm-svn: 218364
OpenPOWER on IntegriCloud