summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregsMatt Arsenault2019-10-108-21/+25
| | | | | | | | | SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
* AMDGPU: Don't fold copies to physregsMatt Arsenault2019-10-091-5/+9
| | | | | | | | | | | In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257
* AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointerMatt Arsenault2019-10-091-4/+14
| | | | | | | | | | This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
* AMDGPU: Relax register classes usedMatt Arsenault2019-10-091-2/+2
| | | | llvm-svn: 374254
* AMDGPU: Fix typosMatt Arsenault2019-10-091-2/+2
| | | | llvm-svn: 374253
* GlobalISel: Implement fewerElementsVector for G_BUILD_VECTORMatt Arsenault2019-10-091-1/+10
| | | | | | Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252
* [AMDGPU] Fixed dpp combine of VOP1Stanislav Mekhanoshin2019-10-091-0/+8
| | | | | | | | | If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241
* [System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLIVitaly Buka2019-10-091-2/+10
| | | | | | To fix "infinite recursion" warning. llvm-svn: 374222
* [AMDGPU] Use math constants defined in MathExtras (NFC)Evandro Menezes2019-10-093-45/+19
| | | | | | | | Use the the new math constants in `MathExtras.h`. Differential revision: https://reviews.llvm.org/D68285 llvm-svn: 374208
* AMDGPU: Fix i16 arithmetic pattern redundancyMatt Arsenault2019-10-081-78/+23
| | | | | | | | | | | | | | There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
* AMDGPU: Add offsets to MMO when lowering buffer intrinsicsTom Stellard2019-10-082-11/+73
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087
* [AMDGPU] Disable unused gfx10 dpp instructionsStanislav Mekhanoshin2019-10-082-0/+8
| | | | | | | | | | | Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083
* AMDGPU: Propagate undef flag during pre-RA exec mask optimizationsNicolai Haehnle2019-10-081-6/+7
| | | | | | | | | | | | | | Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041
* AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sourcesMatt Arsenault2019-10-071-3/+6
| | | | llvm-svn: 373989
* AMDGPU/GlobalISel: Handle more G_INSERT casesMatt Arsenault2019-10-073-57/+55
| | | | | | | | | Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
* GlobalISel: Partially implement lower for G_INSERTMatt Arsenault2019-10-071-7/+3
| | | | llvm-svn: 373946
* AMDGPU/GlobalISel: Fix selection of 16-bit shiftsMatt Arsenault2019-10-071-3/+6
| | | | llvm-svn: 373945
* AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32Matt Arsenault2019-10-071-1/+1
| | | | llvm-svn: 373944
* AMDGPU/GlobalISel: Use S_MOV_B64 for inline constantsMatt Arsenault2019-10-071-20/+27
| | | | | | | This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
* AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sourcesMatt Arsenault2019-10-071-18/+29
| | | | | | Continue making a mess of merge/unmerge legality. llvm-svn: 373942
* AMDGPU/GlobalISel: Select more G_INSERT casesMatt Arsenault2019-10-071-20/+78
| | | | | | | | | | At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something to emit something like the inverse of composeSubRegIndices do determine the subreg index to use. llvm-svn: 373938
* GlobalISel: Add target pre-isel instructionsMatt Arsenault2019-10-075-2/+15
| | | | | | | | | | | | | | Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
* Second attempt to add iterator_range::empty()Jordan Rose2019-10-071-11/+11
| | | | | | | | | | | | Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935
* AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsetsMatt Arsenault2019-10-061-2/+5
| | | | llvm-svn: 373842
* AMDGPU/GlobalISel: RegBankSelect mul24 intrinsicsMatt Arsenault2019-10-061-0/+2
| | | | llvm-svn: 373841
* AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsicsMatt Arsenault2019-10-061-0/+35
| | | | llvm-svn: 373840
* AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESSMatt Arsenault2019-10-061-0/+3
| | | | llvm-svn: 373839
* GlobalISel: Partially implement lower for G_EXTRACTMatt Arsenault2019-10-061-1/+13
| | | | | | Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838
* AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsicsMatt Arsenault2019-10-061-6/+5
| | | | | | This wasn't updated for the immarg handling change. llvm-svn: 373837
* [NFC] Add { } to silence compiler warning [-Wmissing-braces].Huihui Zhang2019-10-041-1/+1
| | | | | | | | | ../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} llvm-svn: 373784
* [AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa ↵Dmitry Preobrazhensky2019-10-041-1/+2
| | | | | | | | | | | | opcodes See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68349 llvm-svn: 373745
* [AMDGPU][MC][GFX10] Enabled decoding of 'null' operandDmitry Preobrazhensky2019-10-041-0/+1
| | | | | | | | | | See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68348 llvm-svn: 373740
* [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructionsDmitry Preobrazhensky2019-10-041-1/+1
| | | | | | | | | | See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68347 llvm-svn: 373736
* AMDGPU/GlobalISel: Fix using wrong addrspace for apertureMatt Arsenault2019-10-041-1/+3
| | | | | | | This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716
* AMDGPU/GlobalISel: Select G_PTRTOINTMatt Arsenault2019-10-041-0/+1
| | | | llvm-svn: 373715
* AMDGPU/GlobalISel: Support wave32 waterfall loopsMatt Arsenault2019-10-041-22/+30
| | | | llvm-svn: 373714
* [AMDGPU][SILoadStoreOptimizer] NFC: Refactor codePiotr Sobczak2019-10-041-120/+80
| | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes. Introducing InstSubclass will keep them separate and help extending InstClassEnum with other instruction types (e.g. MIMG) in the future. This patch also makes getSubRegIdxs() more concise. Reviewers: nhaehnle, arsenm, tstellar Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68384 llvm-svn: 373699
* [NFC] Fix unused variable in release buildsJordan Rupprecht2019-10-031-1/+2
| | | | llvm-svn: 373646
* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELTMatt Arsenault2019-10-031-5/+77
| | | | llvm-svn: 373639
* AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelectMatt Arsenault2019-10-032-167/+271
| | | | | | | | Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638
* AMDGPU/GlobalISel: Allow VGPR to index SGPR registerMatt Arsenault2019-10-031-4/+6
| | | | | | | | We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637
* AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 andMatt Arsenault2019-10-031-2/+3
| | | | | | This would try to do FewerElements to v9s8 llvm-svn: 373635
* AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructionsTom Stellard2019-10-031-82/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instructions with one list per unique address. In the optimization phase instead of scanning through the basic block for mergeable instructions, we now iterate over the lists generated by the pre-pass. The decision to re-optimize a block is now made per list, so if we fail to merge any instructions with the same address, then we do not attempt to optimize them in future passes over the block. This will help to reduce the time this pass spends re-optimizing instructions. In one pathological test case, this change reduces the time spent in the SILoadStoreOptimizer from 0.2s to 0.03s. This restructuring will also make it possible to implement further solutions in this pass, because we can now add less expensive checks to the pre-pass and filter instructions out early which will avoid the need to do the expensive scanning during the optimization pass. For example, checking for adjacent offsets is an inexpensive test we can move to the pre-pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65961 llvm-svn: 373630
* [AArch64] Static (de)allocation of SVE stack objects.Sander de Smalen2019-10-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects. The focus of this patch is purely to allow the stack frame to allocate/deallocate space for scalable SVE objects. More dynamic allocation (at compile-time, i.e. determining placement of SVE objects on the stack), or resolving frame-index references that include scalable-sized offsets, are left for subsequent patches. SVE objects are allocated in the stack frame as a separate region below the callee-save area, and above the alignment gap. This is done so that the SVE objects can be accessed directly from the FP at (runtime) VL-based offsets to benefit from using the VL-scaled addressing modes. The layout looks as follows: +-------------+ | stack arg | +-------------+ | Callee Saves| | X29, X30 | (if available) |-------------| <- FP (if available) | : | | SVE area | | : | +-------------+ |/////////////| alignment gap. | : | | Stack objs | | : | +-------------+ <- SP after call and frame-setup SVE and non-SVE stack objects are distinguished using different StackIDs. The offsets for objects with TargetStackID::SVEVector should be interpreted as purely scalable offsets within their respective SVE region. Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61437 llvm-svn: 373585
* AMDGPU/GlobalISel: Don't re-get subtargetMatt Arsenault2019-10-031-6/+3
| | | | | | It's already available in the class. llvm-svn: 373568
* AMDGPU/GlobalISel: Expand G_BITCAST legalityMatt Arsenault2019-10-031-4/+1
| | | | llvm-svn: 373567
* [AMDGPU] Fix illegal agpr use by VALUStanislav Mekhanoshin2019-10-021-1/+10
| | | | | | | | | | | | | | | | | | | When SIFixSGPRCopies attempts to fix an illegal copy from vector to scalar register it calls moveToVALU(). A copy from an agpr to sgpr becomes a copy from agpr to agpr, which may result in the illegal register class at a use of this copy. Solution is to copy it always into a vgpr. This may result in a subsequent copy into an agpr if that is what really needed, however should not happen too often and likely will be folded later. The opposite situation may not happen because an sgpr is always illegal where agpr is legal, so such user instructions may not exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544
* [AMDGPU] Extend buffer intrinsics with swizzlingPiotr Sobczak2019-10-0214-155/+308
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
* [AMDGPU] Make printf lowering faster when there are no printfsJay Foad2019-10-021-16/+14
| | | | | | | | | | | | | | | | | Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead. Reviewers: rampitec, kzhuravl, alex-t, arsenm Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68145 llvm-svn: 373433
* AMDGPU/GlobalISel: Use getIntrinsicID helperMatt Arsenault2019-10-023-7/+7
| | | | llvm-svn: 373417
OpenPOWER on IntegriCloud