summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
* Add a hasOperandBundlesOtherThan helper, and use it; NFCSanjoy Das2016-03-221-12/+6
| | | | llvm-svn: 264072
* [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardwareSimon Pilgrim2016-03-222-0/+102
| | | | | | | | | | | | | | Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Reapplied with a fix for PR26953 (missing vector widening legalization). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 264062
* Add "first class" lowering for deopt operand bundlesSanjoy Das2016-03-224-24/+99
| | | | | | | | | | | | | | | | | Summary: After this change, deopt operand bundles can be lowered directly by SelectionDAG into STATEPOINT instructions (which are then lowered to a call or sequence of nop, with an associated __llvm_stackmaps entry0. This obviates the need to round-trip deoptimization state through gc.statepoint via RewriteStatepointsForGC. Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin Subscribers: sanjoy, mcrosier, majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D18257 llvm-svn: 264015
* [DAGCombine] Catch the case where extract_vector_elt can cause an any_ext ↵Silviu Baranga2016-03-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | while processing AND SDNodes Summary: extract_vector_elt can cause an implicit any_ext if the types don't match. When processing the following pattern: (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c) DAGCombine was ignoring the possible extend, and sometimes removing the AND even though it was required to maintain some of the bits in the result to 0, resulting in a miscompile. This change fixes the issue by limiting the transformation only to cases where the extract_vector_elt doesn't perform the implicit extend. Reviewers: t.p.northover, jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18247 llvm-svn: 263935
* [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.Manman Ren2016-03-181-1/+1
| | | | | | | Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855
* [SelectionDAG] Remove visitStatepoint; NFCSanjoy Das2016-03-173-11/+2
| | | | | | | This way we have a single entry point into StatepointLowering. The method was a direct dispatch to LowerStatepoint anyway. llvm-svn: 263682
* Fix indentation; NFCSanjoy Das2016-03-161-3/+2
| | | | llvm-svn: 263672
* Extract out a SelectionDAGBuilder::LowerAsStatepoint; NFCSanjoy Das2016-03-162-144/+197
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a step towards implementing "direct" lowering of calls and invokes with deopt operand bundles into STATEPOINT nodes (as opposed to having them mandatorily pass through RewriteStatepointsForGC, which is the case today). This change extracts out a `SelectionDAGBuilder::LowerAsStatepoint` helper function that is able to lower a "statepoint like thing", and uses it to lower `gc.statepoint` calls. This is an NFC now, but in a later change we will use `LowerAsStatepoint` to directly lower calls and invokes with operand bundles without going through an intermediate `gc.statepoint` IR representation. FYI: I expect `SelectionDAGBuilder::StatepointInfo` will evolve as I add support for lowering non gc.statepoints, right now it is fairly tightly coupled with an IR level `gc.statepoint`. Reviewers: reames, pgavlin, JosephTremoulet Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18106 llvm-svn: 263671
* Tweak some atomics functions in preparation for larger changes; NFC.James Y Knight2016-03-162-2/+2
| | | | | | | | | | | | | | | | - Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. llvm-svn: 263665
* [SelectionDAG] Extract out populateCallLoweringInfo; NFCSanjoy Das2016-03-163-30/+31
| | | | | | | | | SelectionDAGBuilder::populateCallLoweringInfo is now used instead of SelectionDAGBuilder::lowerCallOperands. The populateCallLoweringInfo interface is more composable in face of design changes like http://reviews.llvm.org/D18106 llvm-svn: 263663
* Removed trailing whitespaceSimon Pilgrim2016-03-161-12/+12
| | | | llvm-svn: 263650
* [StatepointLowering] Move an assertion; NFCISanjoy Das2016-03-151-6/+4
| | | | | | | | Instead of running an explicit loop over `gc.relocate` calls hanging off of a `gc.statepoint`, assert the validity of the type of the value being relocated in `visitRelocate`. llvm-svn: 263516
* Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND onEric Christopher2016-03-142-40/+0
| | | | | | | | | pre-SSE41 hardware" as it seems to be causing crashes during code generation in halide. PR forthcoming. This reverts commit r263303. llvm-svn: 263512
* [DAG] use !isUndef() ; NFCISanjay Patel2016-03-144-13/+11
| | | | llvm-svn: 263453
* [DAG] use isUndef() ; NFCISanjay Patel2016-03-145-98/+94
| | | | llvm-svn: 263448
* Make gc relocates more strongly typed; NFCSanjoy Das2016-03-121-10/+13
| | | | | | | Don't use a `Value *` where we can use a stronger `GCRelocateInst *` type. llvm-svn: 263327
* [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardwareSimon Pilgrim2016-03-112-0/+40
| | | | | | | | | | | | Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 263303
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-102-2/+19
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* SelectionDAG: Fix a crash on inline asm when output register supports ↵Tom Stellard2016-03-091-3/+7
| | | | | | | | | | | | | | | | multiple types Summary: The code in SelectionDAG did not handle the case where the register type and output types were different, but had the same size. Reviewers: arsenm, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17940 llvm-svn: 263022
* Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ↵Hans Wennborg2016-03-081-18/+1
| | | | | | | | ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935
* Re-apply "SelectionDAG: Store SDNode operands in an ArrayRecycler"Justin Bogner2016-03-081-143/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-applies r262886 with a fix for 32 bit platforms that have 8 byte pointer alignment, effectively reverting r262892. Original Message: Currently some SDNode operands are malloc'd, some are stored inline in subclasses of SDNode, and some are thrown into a BumpPtrAllocator. This scheme is complex, inconsistent, and makes refactoring SDNodes fairly difficult. Instead, we can allocate all of the operands using an ArrayRecycler that wraps a BumpPtrAllocator. This keeps the cache locality when iterating operands, improves locality when iterating SDNodes without looking at operands, and vastly simplifies the ownership semantics. It also means we stop overallocating SDNodes by 2-3x and will make it simpler to fix the rampant undefined behaviour we have in how we mutate SDNodes from one kind to another (See llvm.org/pr26808). This is NFC other than the changes in memory behaviour, and I ran some LNT tests to make sure this didn't hurt compile time. Not many tests changed: there were a couple of 1-2% regressions reported, but there were more improvements (of up to 4%) than regressions. llvm-svn: 262902
* Revert "SelectionDAG: Store SDNode operands in an ArrayRecycler"Justin Bogner2016-03-081-118/+143
| | | | | | | | | Looks like the largest SDNode is different between 32 and 64 bit now, so this is breaking 32 bit bots. Reverting while I figure out a fix. This reverts r262886. llvm-svn: 262892
* SelectionDAG: Store SDNode operands in an ArrayRecyclerJustin Bogner2016-03-081-143/+118
| | | | | | | | | | | | | | | | | | | | | | | Currently some SDNode operands are malloc'd, some are stored inline in subclasses of SDNode, and some are thrown into a BumpPtrAllocator. This scheme is complex, inconsistent, and makes refactoring SDNodes fairly difficult. Instead, we can allocate all of the operands using an ArrayRecycler that wraps a BumpPtrAllocator. This keeps the cache locality when iterating operands, improves locality when iterating SDNodes without looking at operands, and vastly simplifies the ownership semantics. It also means we stop overallocating SDNodes by 2-3x and will make it simpler to fix the rampant undefined behaviour we have in how we mutate SDNodes from one kind to another (See llvm.org/pr26808). This is NFC other than the changes in memory behaviour, and I ran some LNT tests to make sure this didn't hurt compile time. Not many tests changed: there were a couple of 1-2% regressions reported, but there were more improvements (of up to 4%) than regressions. llvm-svn: 262886
* DAGCombiner: Check legality before creating extract_vector_eltMatt Arsenault2016-03-071-1/+3
| | | | | | Problem not hit by any in tree target. llvm-svn: 262852
* [CodeGen] Add space-optimized EmitMergeInputChains1_2 to the DAG isel ↵Craig Topper2016-03-071-2/+3
| | | | | | matching tables. Shaves about 5100 bytes from the X86 matcher table. NFC llvm-svn: 262815
* [DAGCombine] Fix divrem combine not to assume div/rem type is simple.Michael Kuperstein2016-03-041-1/+4
| | | | | | | | | | | | | The divrem combine assumed the type of the div/rem is simple, which isn't necessarily true. This probably worked fine until r250825, since it only saw legal types, but now breaks when it runs as a pre-type-legalization combine. This fixes PR26835. Differential Revision: http://reviews.llvm.org/D17878 llvm-svn: 262746
* [ARM] Merging 64-bit divmod lib calls into oneRenato Golin2016-03-041-4/+8
| | | | | | | | | | | | | | | | | | | | | When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make sure we only emit valid types or the ones that were explicitly marked as custom. Now, passing check-all and test-suite on x86, ARM and AArch64. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262738
* [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREGSimon Pilgrim2016-03-031-1/+18
| | | | | | | | Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 262599
* Revert "[ARM] Merging 64-bit divmod lib calls into one"Renato Golin2016-03-031-2/+1
| | | | | | This reverts commit r262507, which broke some ARM buildbots. llvm-svn: 262594
* [X86] Don't give catch objects a displacement of zeroDavid Majnemer2016-03-031-20/+40
| | | | | | | | | | | | | | | | | | Catch objects with a displacement of zero do not initialize a catch object. The displacement is relative to %rsp at the end of the function's prologue for x86_64 targets. If we place an object at the top-of-stack, we will end up wit a displacement of zero resulting in our catch object remaining uninitialized. Address this by creating our catch objects as fixed objects. We will ensure that the UnwindHelp object is created after the catch objects so that no catch object will have a displacement of zero. Differential Revision: http://reviews.llvm.org/D17823 llvm-svn: 262546
* [ARM] Merging 64-bit divmod lib calls into oneRenato Golin2016-03-021-1/+2
| | | | | | | | | | | | | | | | | When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262507
* SelectionDAG: Use correctly sized allocation functions for SDNodesJustin Bogner2016-03-021-116/+86
| | | | | | | | | | | | | | | | The placement new calls here were all calling the allocation function in RecyclingAllocator/Recycler for SDNode, instead of the function for the specific subclass we were constructing. Since this particular allocator always overallocates it more or less worked, but would hide what we're actually doing from any memory tools. Also, if you tried to change this allocator so something like a BumpPtrAllocator or MallocAllocator, the compiler would crash horribly all the time. Part of llvm.org/PR26808. llvm-svn: 262500
* DAGCombiner: Make sure an integer is being truncatedMatt Arsenault2016-03-021-1/+1
| | | | llvm-svn: 262446
* DAGCombiner: Turn truncate of a bitcasted vector to an extractMatt Arsenault2016-03-011-0/+16
| | | | | | | | | | | On AMDGPU where operations i64 operations are often bitcasted to v2i32 and back, this pattern shows up regularly where it breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. llvm-svn: 262397
* Revert "[mips] Promote the result of SETCC nodes to GPR width."Vasileios Kalintiris2016-03-014-16/+6
| | | | | | | | | This reverts commit r262316. It seems that my change breaks an out-of-tree chromium buildbot, so I'm reverting this in order to investigate the situation further. llvm-svn: 262387
* [NVPTX] Use different, convergent MIs for convergent calls.Justin Lebar2016-03-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | Summary: Calls sometimes need to be convergent. This is already handled at the LLVM IR level, but it also needs to be handled at the MI level. Ideally we'd propagate convergence from instructions, down through the selection DAG, and into MIs. But this is Hard, and would affect optimizations in the SDNs -- right now only SDNs with two operands have any flags at all. Instead, here's a much simpler hack: Add new opcodes for NVPTX for convergent calls, and generate these when lowering convergent LLVM calls. Reviewers: jholewinski Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17423 llvm-svn: 262373
* DAGCombiner: Turn extract of bitcasted integer into truncateMatt Arsenault2016-03-011-0/+8
| | | | | | | This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358
* [mips] Promote the result of SETCC nodes to GPR width.Vasileios Kalintiris2016-03-014-6/+16
| | | | | | | | | | | | | | | | | | | | Summary: This patch modifies the existing comparison, branch, conditional-move and select patterns, and adds new ones where needed. Also, the updated SLT{u,i,iu} set of instructions generate a GPR width result. The majority of the code changes in the Mips back-end fix the wrong assumption that the result of SETCC nodes always produce an i32 value. The changes in the common code path account for the fact that in 64-bit MIPS targets, i1 is promoted to i32 instead of i64. Reviewers: dsanders Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D10970 llvm-svn: 262316
* LegalizeDAG: Use correct ptr type when expanding unaligned load/storeMatt Arsenault2016-03-011-14/+21
| | | | | | | This fixes regressions exposed in existing AMDGPU tests in a future commit when all loads are custom lowered. llvm-svn: 262299
* DAGCombiner: Don't unnecessarily swap operands in ReassociateOpsMatt Arsenault2016-02-271-2/+2
| | | | | | | | | | | | | | | | | | In the case where op = add, y = base_ptr, and x = offset, this transform: (op y, (op x, c1)) -> (op (op x, y), c1) breaks the canonical form of add by putting the base pointer in the second operand and the offset in the first. This fix is important for the R600 target, because for some address spaces the base pointer and the offset are stored in separate register classes. The old pattern caused the ISel code for matching addressing modes to put the base pointer and offset in the wrong register classes, which required no-trivial code transformations to fix. llvm-svn: 262148
* DAGCombiner: Relax sqrt NaN folding checkMatt Arsenault2016-02-271-7/+7
| | | | | | This is OK for +0 since compares to +/-0 give the same result. llvm-svn: 262125
* Fix a bug in isVectorReductionOp() in SelectionDAGBuilder.cpp that may cause ↵Cong Hou2016-02-261-4/+4
| | | | | | assertion failure on AArch64. llvm-svn: 262091
* Detecte vector reduction operations just before instruction selection.Cong Hou2016-02-241-0/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | (This is the second attemp to commit this patch, after fixing pr26652 & pr26653). This patch detects vector reductions before instruction selection. Vector reductions are vectorized reduction operations, and for such operations we have freedom to reorganize the elements of the result as long as the reduction of them stay unchanged. This will enable some reduction pattern recognition during instruction combine such as SAD/dot-product on X86. A flag is added to SDNodeFlags to mark those vector reduction nodes to be checked during instruction combine. To detect those vector reductions, we search def-use chains starting from the given instruction, and check if all uses fall into two categories: 1. Reduction with another vector. 2. Reduction on all elements. in which 2 is detected by recognizing the pattern that the loop vectorizer generates to reduce all elements in the vector outside of the loop, which includes several ShuffleVector and one ExtractElement instructions. Differential revision: http://reviews.llvm.org/D15250 llvm-svn: 261804
* NFC. Move isDereferenceable to Loads.h/cppArtur Pilipenko2016-02-241-0/+1
| | | | | | | | | | This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated. Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D16180 llvm-svn: 261736
* SelectionDAG: Use correct addrspace when lowering memcpyMatt Arsenault2016-02-221-9/+16
| | | | | | | | | | | This was causing assertions later from using the wrong pointer size with LDS operations. getOptimalMemOpType should also have address space arguments later. This avoids assertions in existing tests exposed by a future commit. llvm-svn: 261580
* ADT: Remove == and != comparisons between ilist iterators and pointersDuncan P. N. Exon Smith2016-02-211-3/+4
| | | | | | | | | | | | | | I missed == and != when I removed implicit conversions between iterators and pointers in r252380 since they were defined outside ilist_iterator. Since they depend on getNodePtrUnchecked(), they indirectly rely on UB. This commit removes all uses of these operators. (I'll delete the operators themselves in a separate commit so that it can be easily reverted if necessary.) There should be NFC here. llvm-svn: 261498
* [DAGCombiner] Use getBitcast helper when possible. NFCI.Simon Pilgrim2016-02-201-7/+3
| | | | llvm-svn: 261437
* [StatepointLowering] Minor non-semantic cleanupsSanjoy Das2016-02-191-23/+18
| | | | | | Use auto, bring file up to coding standards etc. llvm-svn: 261358
* [StatepointLowering] Update StatepointMaxSlotsRequired correctlySanjoy Das2016-02-191-3/+4
| | | | | | | | | | Now that we don't always add an element to AllocatedStackSlots if we don't find a pre-existing unallocated stack slot, bumping StatepointMaxSlotsRequired to `NumSlots + 1` is not correct. Instead bump the statistic near the push_back, to Builder.FuncInfo.StatepointStackSlots.size(). llvm-svn: 261348
* [StatepointLowering] Fix a mistake in rL261336Sanjoy Das2016-02-191-4/+5
| | | | | | | | The check on MFI->getObjectSize() has to be on the FrameIndex, not on the index of the FrameIndex in AllocatedStackSlots. Weirdly, the tests I added in rL261336 didn't catch this. llvm-svn: 261347
OpenPOWER on IntegriCloud