path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit message | Author | Age | Files | Lines
...
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. [Nirav Dave, 2016-10-13, 1 file, -10/+0]

  Retrying after upstream changes.

  Simplify the consecutive-merge-store candidate search. Now that address
  aliasing is much less conservative, push through a simplified
  store-merging search that only checks for parallel stores through the
  chain subgraph. This is cleaner, as it separates the handling of
  non-interfering loads/stores from the store-merging logic.

  When merging stores, we search up the chain through a single load, and
  find all possible stores by looking down from the chain root through a
  load and a TokenFactor to all stores visited. This improves the quality
  of the output SelectionDAG and of the output CodeGen generally (with
  some exceptions).

  Additional minor changes:
  1. Finishes removing the unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling
     SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is
     arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required changes, since memory operations are now
  reorderable; some tests relying on the old order were changed to use
  volatile memory operations.

  Noteworthy tests:
  - CodeGen/AArch64/argument-blocks.ll: it's not entirely clear what the
    test_varargs_stackalign test is supposed to be asserting, but the new
    code looks right.
  - CodeGen/AArch64/arm64-memset-inline.ll,
    CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll: the
    backend now generates *worse* code due to store merging succeeding,
    as we don't do a 16-byte constant-zero store efficiently.
  - CodeGen/AArch64/merge-store.ll: improved, but there still seems to be
    an extraneous vector insert from an element to itself?
  - CodeGen/PowerPC/ppc64-align-long-double.ll: worse code emitted in
    this case, due to the improved store->load forwarding.
  - CodeGen/X86/dag-merge-fast-accesses.ll,
    CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll,
    CodeGen/Mips/load-store-left-right.ll: restored correct merging of
    non-aligned stores.
  - CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll: improved;
    correctly merges buffer_store_dword calls.
  - CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll: improved; sidesteps
    loading a stored value and merges two stores.
  - CodeGen/X86/pr18023.ll: this test has been removed, as it was
    asserting incorrect behavior. Non-volatile stores *CAN* be moved past
    volatile loads, and now are.
  - CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll: it's
    basically impossible to tell what these tests are actually testing,
    but the code looks better with the memory operations recognized as
    non-aliasing.
  - CodeGen/X86/win32-eh.ll: both loads of the security cookie are now
    merged.
  - CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll: this test
    appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd,
  RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD,
  t.p.northover, spatel
  Differential Revision: https://reviews.llvm.org/D14834
  llvm-svn: 284151
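The parallel-store search the message describes is easier to see in code. The following is a minimal sketch only — the function name is illustrative, and the real DAGCombiner candidate search also checks base pointers, offsets, memory types, and aliasing, all of which this omits:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Walk up from a store through its chain (stepping over at most one
// load) to a common chain root, then walk back down through TokenFactor
// nodes collecting sibling stores. Stores reached this way have no
// ordering between them, so they are merge candidates. Note the starting
// store itself is collected too, since it hangs off the same root.
static void collectParallelStores(StoreSDNode *St,
                                  SmallVectorImpl<StoreSDNode *> &Candidates) {
  SDValue Root = St->getChain();
  if (auto *Ld = dyn_cast<LoadSDNode>(Root))
    Root = Ld->getChain(); // search up through a single load

  for (SDNode *User : Root.getNode()->uses()) {
    if (auto *OtherSt = dyn_cast<StoreSDNode>(User))
      Candidates.push_back(OtherSt); // parallel store off the same chain
    else if (User->getOpcode() == ISD::TokenFactor)
      for (SDNode *TFUser : User->uses())
        if (auto *TFSt = dyn_cast<StoreSDNode>(TFUser))
          Candidates.push_back(TFSt); // store behind a TokenFactor
  }
}
```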
* Target: Remove unused entities. [Peter Collingbourne, 2016-10-09, 1 file, -1/+0]

  llvm-svn: 283690
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." [Nirav Dave, 2016-09-28, 1 file, -0/+10]

  This reverts commit r282600 due to test failures with MCJIT.

  llvm-svn: 282604
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. [Nirav Dave, 2016-09-28, 1 file, -10/+0]

  Simplify the consecutive-merge-store candidate search. Now that address
  aliasing is much less conservative, push through a simplified
  store-merging search that only checks for parallel stores through the
  chain subgraph. This is cleaner, as it separates the handling of
  non-interfering loads/stores from the store-merging logic.

  When merging stores, we search up the chain through a single load, and
  find all possible stores by looking down from the chain root through a
  load and a TokenFactor to all stores visited. This improves the quality
  of the output SelectionDAG and of the output CodeGen generally (with
  some exceptions).

  Additional minor changes:
  1. Finishes removing the unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling
     SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is
     arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required changes, since memory operations are now
  reorderable; some tests relying on the old order were changed to use
  volatile memory operations.

  Noteworthy tests:
  - CodeGen/AArch64/argument-blocks.ll: it's not entirely clear what the
    test_varargs_stackalign test is supposed to be asserting, but the new
    code looks right.
  - CodeGen/AArch64/arm64-memset-inline.ll,
    CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll: the
    backend now generates *worse* code due to store merging succeeding,
    as we don't do a 16-byte constant-zero store efficiently.
  - CodeGen/AArch64/merge-store.ll: improved, but there still seems to be
    an extraneous vector insert from an element to itself?
  - CodeGen/PowerPC/ppc64-align-long-double.ll: worse code emitted in
    this case, due to the improved store->load forwarding.
  - CodeGen/X86/dag-merge-fast-accesses.ll,
    CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll,
    CodeGen/Mips/load-store-left-right.ll: restored correct merging of
    non-aligned stores.
  - CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll: improved;
    correctly merges buffer_store_dword calls.
  - CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll: improved; sidesteps
    loading a stored value and merges two stores.
  - CodeGen/X86/pr18023.ll: this test has been removed, as it was
    asserting incorrect behavior. Non-volatile stores *CAN* be moved past
    volatile loads, and now are.
  - CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll: it's
    basically impossible to tell what these tests are actually testing,
    but the code looks better with the memory operations recognized as
    non-aliasing.
  - CodeGen/X86/win32-eh.ll: both loads of the security cookie are now
    merged.
  - CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll: this test
    appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd,
  RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover,
  spatel
  Differential Revision: https://reviews.llvm.org/D14834
  llvm-svn: 282600
* AMDGPU: Push bitcasts through build_vector [Matt Arsenault, 2016-09-17, 1 file, -0/+27]

  This reduces the number of copies and reg_sequences when using fp
  constant vectors. This significantly reduces the code size in
  local-stack-alloc-bug.ll.

  llvm-svn: 281822
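As a rough illustration of the combine (the function name is hypothetical; the in-tree code in AMDGPUISelLowering.cpp handles more cases), the idea is to rewrite a bitcast of a build_vector as a build_vector of bitcast elements, so fp constants can be materialized directly in the destination type:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Fold (bitcast (build_vector x, y)) -> (build_vector (bitcast x),
// (bitcast y)) when element counts and sizes line up.
static SDValue pushBitcastThroughBuildVector(SDValue Cast, SelectionDAG &DAG) {
  if (Cast.getOpcode() != ISD::BITCAST)
    return SDValue();

  SDValue BV = Cast.getOperand(0);
  EVT DstVT = Cast.getValueType();
  if (BV.getOpcode() != ISD::BUILD_VECTOR || !DstVT.isVector() ||
      DstVT.getVectorNumElements() != BV.getNumOperands() ||
      DstVT.getScalarSizeInBits() != BV.getValueType().getScalarSizeInBits())
    return SDValue();

  SDLoc DL(Cast);
  EVT DstEltVT = DstVT.getVectorElementType();
  SmallVector<SDValue, 8> Elts;
  for (const SDValue &Elt : BV->ops())
    Elts.push_back(DAG.getNode(ISD::BITCAST, DL, DstEltVT, Elt));
  return DAG.getBuildVector(DstVT, DL, Elts);
}
```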
* AMDGPU/SI: Fix kernel argument ABI for HSA [Tom Stellard, 2016-09-16, 1 file, -1/+2]

  Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA
  kernel ABI.

  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
  Differential Revision: https://reviews.llvm.org/D24621
  llvm-svn: 281789
* AMDGPU: Refactor kernel argument lowering [Tom Stellard, 2016-09-16, 1 file, -34/+97]

  Summary: The main challenge in lowering kernel arguments for AMDGPU is
  determining the memory type of the argument. The generic calling
  convention code assumes that only legal register types can be stored in
  memory, but this is not the case for AMDGPU.

  This consolidates all the logic AMDGPU uses for deducing memory types
  into a single function. This will make it much easier to support
  different ABIs in the future.

  Reviewers: arsenm
  Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl
  Differential Revision: https://reviews.llvm.org/D24614
  llvm-svn: 281781
* AMDGPU: Improve splitting 64-bit bit ops by constants [Matt Arsenault, 2016-09-14, 1 file, -35/+11]

  This addresses a TODO to handle operations besides and. This also
  starts eliminating no-op operations with a constant that can emerge
  later.

  llvm-svn: 281488
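For the and case, the transformation looks roughly like the sketch below. It is a sketch under assumptions — the helper name is invented, 64-bit values are split via a v2i32 bitcast as is common in this file, and the real code also covers the other bit ops:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Split (i64 and x, C) into two 32-bit halves, folding away halves that
// become no-ops (x & 0xffffffff) or constants (x & 0).
static SDValue splitAndByConstant(SDValue Op, SelectionDAG &DAG) {
  auto *RHS = dyn_cast<ConstantSDNode>(Op.getOperand(1));
  if (!RHS || Op.getValueType() != MVT::i64)
    return SDValue();

  SDLoc DL(Op);
  uint64_t C = RHS->getZExtValue();

  // Split the 64-bit LHS into 32-bit halves through a v2i32 bitcast.
  SDValue Vec = DAG.getNode(ISD::BITCAST, DL, MVT::v2i32, Op.getOperand(0));
  SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Vec,
                           DAG.getConstant(0, DL, MVT::i32));
  SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Vec,
                           DAG.getConstant(1, DL, MVT::i32));

  auto Piece = [&](SDValue Half, uint32_t Mask) -> SDValue {
    if (Mask == ~0u)
      return Half;                             // x & 0xffffffff is a no-op
    if (Mask == 0)
      return DAG.getConstant(0, DL, MVT::i32); // x & 0 folds to 0
    return DAG.getNode(ISD::AND, DL, MVT::i32, Half,
                       DAG.getConstant(Mask, DL, MVT::i32));
  };

  SDValue Halves[] = {Piece(Lo, uint32_t(C)), Piece(Hi, uint32_t(C >> 32))};
  return DAG.getNode(ISD::BITCAST, DL, MVT::i64,
                     DAG.getBuildVector(MVT::v2i32, DL, Halves));
}
```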
* AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA [Tom Stellard, 2016-09-09, 1 file, -1/+2]

  Reviewers: arsenm
  Subscribers: arsenm, wdng, nhaehnle, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24405
  llvm-svn: 281080
* AMDGPU: Fix introducing stack access on unaligned v16i8 [Matt Arsenault, 2016-08-31, 1 file, -1/+8]

  llvm-svn: 280298
* AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned [Tom Stellard, 2016-08-31, 1 file, -1/+1]

  Summary: This fixes some OpenCV tests that were broken by libclc commit
  r276443.

  Reviewers: arsenm, jvesely
  Subscribers: arsenm, wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24051
  llvm-svn: 280274
* AMDGPU/R600: Remove MergeVectorStores from legalization [Jan Vesely, 2016-08-29, 1 file, -59/+0]

  This is handled by DAGCombiner in a more generic way.

  Differential Revision: https://reviews.llvm.org/D23970
  llvm-svn: 280019
* AMDGPU: Select mulhi 24-bit instructions [Matt Arsenault, 2016-08-27, 1 file, -6/+121]

  llvm-svn: 279902
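The entry above has no description, but the underlying identity is simple. A scalar model, under the assumption (stated here, not in the log) that the 24-bit multiply-high instruction returns the product's bits [47:32]:

```cpp
#include <cstdint>

// For x, y < 2^24 the full product fits in 48 bits, so the high 32 bits
// of the 32x32 product -- what a generic mulhi computes -- coincide with
// what a 24-bit mul_hi instruction produces directly.
uint32_t mulhi_u24(uint32_t x, uint32_t y) {
  return uint32_t((uint64_t(x) * y) >> 32);
}
```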
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimation [Nikolai Bozhenov, 2016-08-04, 1 file, -2/+0]

  On modern Intel processors, hardware SQRT is in many cases faster than
  RSQRT followed by Newton-Raphson refinement. The patch introduces a
  simple heuristic to choose between the hardware SQRT instruction and
  Newton-Raphson software estimation.

  The patch treats scalars and vectors differently: for scalars the
  compiler should optimize for latency, while for vectors it should
  optimize for throughput. This is based on the assumption that
  throughput-bound code is likely to be vectorized.

  Basically, the patch disables scalar NR for big cores and disables NR
  completely for Skylake. Firstly, scalar SQRT has shorter latency than
  the NR code on big cores. Secondly, vector SQRT has been greatly
  improved in Skylake and has better throughput compared to NR.

  Differential Revision: https://reviews.llvm.org/D21379
  llvm-svn: 277725
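For reference, the Newton-Raphson refinement being weighed against hardware SQRT is one multiply-heavy step on a reciprocal-square-root estimate. A scalar model (the actual lowering is seeded by the hardware rsqrt estimate; the seed below just simulates one):

```cpp
#include <cmath>
#include <cstdio>

// One Newton-Raphson step refining an estimate e of 1/sqrt(a):
//   e' = e * (1.5 - 0.5 * a * e * e)
// sqrt(a) is then recovered as a * e'. A hardware sqrt does all of this
// in one instruction, which is why the heuristic prefers it for
// latency-bound (scalar) code.
float nr_sqrt(float a, float e) {
  float refined = e * (1.5f - 0.5f * a * e * e);
  return a * refined;
}

int main() {
  float a = 2.0f;
  float e = 1.0f / std::sqrt(a) + 0.01f; // stand-in for a coarse estimate
  std::printf("refined: %f  exact: %f\n", nr_sqrt(a, e), std::sqrt(a));
  return 0;
}
```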
* AMDGPU: Add intrinsics for compare with the full wavefront result [Wei Ding, 2016-07-28, 1 file, -0/+1]

  Differential Revision: http://reviews.llvm.org/D22482
  llvm-svn: 276998
* AMDGPU: Turn dead checks into asserts [Matt Arsenault, 2016-07-28, 1 file, -9/+5]

  llvm-svn: 276946
* AMDGPU: Make AMDGPUMachineFunction fields private [Matt Arsenault, 2016-07-26, 1 file, -19/+3]

  ABIArgOffset is a problem because properly setting the KernArgSize
  requires that the reserved area before the real kernel arguments be
  correctly aligned, which requires fixing clover.

  llvm-svn: 276766
* AMDGPU: Add fp legacy instruction intrinsics [Matt Arsenault, 2016-07-26, 1 file, -0/+2]

  This could use some additional optimization work to use mad/mac legacy.

  llvm-svn: 276764
* AMDGPU: Delete dead code [Matt Arsenault, 2016-07-25, 1 file, -8/+0]

  llvm-svn: 276675
* AMDGPU: Delete dead code [Matt Arsenault, 2016-07-23, 1 file, -93/+0]

  This has been dead since r269479.

  llvm-svn: 276518
* AMDGPU: Only use legal inline immediates with kill pseudo [Matt Arsenault, 2016-07-19, 1 file, -0/+1]

  Only whether the value is negative or positive matters, so use a
  constant that doesn't require an instruction to materialize. These
  should really just emit the write to exec directly, but for now stick
  with the kill pseudo-terminator.

  llvm-svn: 275988
* AMDGPU: Fix missing switch case warning [Matt Arsenault, 2016-07-18, 1 file, -0/+1]

  llvm-svn: 275873
* AMDGPU: Remove brev intrinsic [Matt Arsenault, 2016-07-15, 1 file, -3/+0]

  llvm-svn: 275620
* AMDGPU: Remove AMDGPU.ldexp [Matt Arsenault, 2016-07-15, 1 file, -4/+0]

  llvm-svn: 275618
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. [Justin Lebar, 2016-07-15, 1 file, -42/+24]

  Summary: Instead, we take a single flags arg (a bitset). Also add a
  default 0 alignment, and change the order of arguments so the alignment
  comes before the flags.

  This greatly simplifies many callsites, and fixes a bug in
  AMDGPUISelLowering, wherein the order of the args to getLoad was
  inverted. It also greatly simplifies the process of adding another flag
  to getLoad.

  Reviewers: chandlerc, tstellarAMD
  Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai,
  llvm-commits
  Differential Revision: http://reviews.llvm.org/D22249
  llvm-svn: 275592
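A sketch of the change at a call site — the operands and wrapper function are illustrative, and the signatures shown are only roughly the pre- and post-patch API of that era:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

SDValue emitInvariantLoad(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                          SDValue Chain, SDValue Ptr,
                          MachinePointerInfo PtrInfo, unsigned Align) {
  // Before: a run of bools, easy to pass in the wrong order -- the
  // AMDGPUISelLowering bug this patch fixed was exactly that:
  //   DAG.getLoad(VT, DL, Chain, Ptr, PtrInfo,
  //               /*isVolatile=*/false, /*isNonTemporal=*/false,
  //               /*isInvariant=*/true, Align);
  // After: alignment (defaulting to 0) comes first, then one flags
  // bitset covering volatile/non-temporal/invariant.
  return DAG.getLoad(VT, DL, Chain, Ptr, PtrInfo, Align,
                     MachineMemOperand::MOInvariant);
}
```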
* AMDGPU: Remove dead code [Matt Arsenault, 2016-07-14, 1 file, -9/+0]

  llvm-svn: 275369
* AMDGPU: Remove last AMDIL intrinsics [Matt Arsenault, 2016-07-13, 1 file, -5/+1]

  llvm-svn: 275309
* AMDGPU: Expand unaligned accesses early [Matt Arsenault, 2016-07-01, 1 file, -20/+47]

  Due to visit order problems, in the case of an unaligned copy the
  legalized DAG fails to eliminate extra instructions introduced by the
  expansion of both unaligned parts.

  llvm-svn: 274397
* AMDGPU: Improve load/store of illegal types. [Matt Arsenault, 2016-07-01, 1 file, -37/+97]

  There was a combine before to handle the simple copy case. Split this
  into handling loads and stores separately.

  We might want to change how this handles some of the vector extloads,
  since this can result in large code size increases.

  llvm-svn: 274394
* AMDGPU: Cleanup subtarget handling. [Matt Arsenault, 2016-06-24, 1 file, -1/+1]

  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This
  removes most of the static_casting of the basic codegen classes
  everywhere, and tries to restrict the features visible on the wrong
  target.

  llvm-svn: 273652
* [AMDGPU] Remove exit-on-error in test (PR27761) [Diana Picus, 2016-06-23, 1 file, -1/+2]

  The exit-on-error flag was necessary in order to avoid an assertion
  when handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize. We can
  avoid the assertion by creating some dummy nodes. This enables us to
  remove the exit-on-error flag on the first 2 run lines (SI), but on the
  third run line (R600) we would run into another assertion when trying
  to reserve indirect registers. This patch also replaces that assertion
  with an early exit from the function.

  Fixes PR27761.

  Differential Revision: http://reviews.llvm.org/D20852
  llvm-svn: 273550
* AMDGPU: Fix verifier errors in SILowerControlFlow [Matt Arsenault, 2016-06-22, 1 file, -2/+3]

  The main sin this was committing was using terminator instructions in
  the middle of the block, and then not updating the block successors /
  predecessors. Split the blocks up to avoid this and introduce new
  pseudo instructions for branches taken with exec masking.

  Also use a pseudo instead of emitting s_endpgm and erasing it in the
  special case of a non-void return.

  llvm-svn: 273467
* AMDGPU: Fix kernel argument alignment impacting stack size [Matt Arsenault, 2016-06-18, 1 file, -7/+9]

  Don't use AllocateStack, because kernel arguments have nothing to do
  with the stack. The ensureMaxAlignment call was still changing the
  stack alignment.

  llvm-svn: 273080
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables [Tom Stellard, 2016-06-14, 1 file, -0/+1]

  Summary: We now use a standard fixup type for applying the pc-relative
  address of constant address space variables, and we have the
  GlobalAddress lowering code add the required 4 byte offset to the
  global address rather than doing it as part of the fixup.

  This refactoring will make it easier to use the same code for global
  address space variables and also simplifies the code.

  Re-commit this after fixing a bug where we were trying to use a
  reference to a Triple object that had already been destroyed.

  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272705
* Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables" [Tom Stellard, 2016-06-14, 1 file, -1/+0]

  This reverts commit r272675.

  llvm-svn: 272677
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables [Tom Stellard, 2016-06-14, 1 file, -0/+1]

  Summary: We now use a standard fixup type for applying the pc-relative
  address of constant address space variables, and we have the
  GlobalAddress lowering code add the required 4 byte offset to the
  global address rather than doing it as part of the fixup.

  This refactoring will make it easier to use the same code for global
  address space variables and also simplifies the code.

  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272675
* Pass DebugLoc and SDLoc by const ref. [Benjamin Kramer, 2016-06-12, 1 file, -22/+17]

  This used to be free; copying and moving DebugLocs became expensive
  after the metadata rewrite. Passing by reference eliminates a ton of
  track/untrack operations. No functionality change intended.

  llvm-svn: 272512
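The shape of the change, with a hypothetical function name (declarations only, to show the signature difference):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Before: each call copies the SDLoc, and since the metadata rewrite
// every DebugLoc copy does reference-tracking work.
SDValue lowerExampleOld(SDValue Op, SDLoc DL, SelectionDAG &DAG);

// After: pass by const reference -- no copies, no functional change.
SDValue lowerExampleNew(SDValue Op, const SDLoc &DL, SelectionDAG &DAG);
```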
* AMDGPU: Temporary fix for broken store combine [Matt Arsenault, 2016-06-02, 1 file, -0/+2]

  llvm-svn: 271567
* AMDGPU: Fix inconsistent lowering of select of vectors [Matt Arsenault, 2016-05-25, 1 file, -1/+9]

  f32 vectors would use a sequence of BFI instructions instead of
  unrolled cmp + select. This was better in the case of a VALU select
  with SGPR inputs, but we don't have a way of dealing with that in the
  DAG.

  llvm-svn: 270731
* AMDGPU: Cleanup lowering actions [Matt Arsenault, 2016-05-21, 1 file, -121/+169]

  These are kind of a mess and hard to follow, particularly for loads and
  stores. Fix various redundant, unnecessary and dead settings.

  llvm-svn: 270307
* AMDGPU: Fix high bits after division optimization [Matt Arsenault, 2016-05-21, 1 file, -17/+36]

  This is essentially doing a 24-bit signed division with FP. We need to
  truncate to the N-bit result.

  llvm-svn: 270305
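A scalar model of the technique and the fix, under stated assumptions: this mirrors the well-known 24-bit division trick (fp divide plus a one-step remainder correction) rather than the exact hardware reciprocal sequence, and it assumes arithmetic right shifts on negative values:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// 24-bit signed division done in f32: both operands fit exactly in the
// f32 mantissa, so a truncated fp quotient plus a one-ulp remainder
// correction yields the exact truncated integer quotient.
int32_t sdiv24(int32_t ia, int32_t ib) {
  int32_t jq = ((ia ^ ib) >> 30) | 1;            // sign of the quotient, +/-1
  float fa = float(ia), fb = float(ib);
  float fq = std::trunc(fa / fb);                // truncated fp quotient
  float fr = std::fabs(std::fma(-fq, fb, fa));   // |remainder| estimate
  int32_t iq = int32_t(fq);
  if (fr >= std::fabs(fb))                       // magnitude one too small
    iq += jq;
  // The fix the commit describes: the bits above the N-bit result width
  // are not yet defined, so sign-extend from the result width
  // (sign_extend_inreg); shown here for N = 24.
  return int32_t(uint32_t(iq) << 8) >> 8;
}

int main() {
  std::printf("%d %d\n", sdiv24(-7000000, 123), -7000000 / 123); // both -56910
  return 0;
}
```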
* AMDGPU: Remove pointless conversions [Matt Arsenault, 2016-05-19, 1 file, -30/+10]

  llvm-svn: 270139
* AMDGPU: Fix assert when erroring on a call [Matt Arsenault, 2016-05-18, 1 file, -1/+5]

  For some reason an assert is now hit when a valid chain is not
  returned, so return the entry chain.

  llvm-svn: 269948
* AMDGPU: Unify LowerGlobalAddress [Jan Vesely, 2016-05-13, 1 file, -0/+5]

  Reviewers: tstellard
  Subscribers: arsenm
  Differential Revision: http://reviews.llvm.org/D19794
  llvm-svn: 269481
* AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cpp [Tom Stellard, 2016-05-02, 1 file, -39/+0]

  Reviewers: arsenm
  Subscribers: jvesely, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19736
  llvm-svn: 268267
* [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. [Craig Topper, 2016-04-28, 1 file, -8/+2]

  This is what the majority of the targets want and removes a bunch of
  code. Set it to Legal explicitly in the few cases where that's the
  desired behavior.

  llvm-svn: 267853
* [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. [Ahmed Bougacha, 2016-04-26, 1 file, -20/+14]

  Differential Revision: http://reviews.llvm.org/D17176
  llvm-svn: 267606
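These helpers replace hand-built BUILD_VECTOR nodes. A sketch of a call site before and after (the wrapper function is illustrative):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

SDValue splatExample(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                     SDValue Val, unsigned NumElts) {
  // Before: spell out the opcode and replicate the operand by hand.
  SmallVector<SDValue, 8> Ops(NumElts, Val);
  SDValue Old = DAG.getNode(ISD::BUILD_VECTOR, DL, VT, Ops);
  (void)Old;

  // After: the intent is in the helper's name.
  return DAG.getSplatBuildVector(VT, DL, Val);
}
```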
* AMDGPU: Add DAG to debug dump [Matt Arsenault, 2016-04-25, 1 file, -2/+2]

  Also reorder case to match enum order.

  llvm-svn: 267449
* AMDGPU: Re-visit nodes in performAndCombine [Matt Arsenault, 2016-04-22, 1 file, -0/+5]

  This fixes test regressions when the i64 load/store actions are changed
  to Promote.

  llvm-svn: 267240
* AMDGPU: Remove custom load/store scalarization [Matt Arsenault, 2016-04-14, 1 file, -78/+4]

  llvm-svn: 266385