path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU] Ensure trig range reduction only used for subtargets that require it | David Stuttard | 2018-09-14 | 4 | -9/+28
    Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of the expanded range on GFX9+. Also updated the tests to check correct behaviour.
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51933
    Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
    llvm-svn: 342222
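The range-reduction step described in the commit above can be modelled in a few lines. This is only a sketch, assuming the hardware sine takes its input in units of full turns (radians divided by 2π); `hw_sin` and `lower_sin` are hypothetical names for illustration and are not LLVM functions.

```python
import math


def fract(x: float) -> float:
    """Fractional part x - floor(x); the result is always in [0, 1)."""
    return x - math.floor(x)


def hw_sin(turns: float) -> float:
    """Model of a sine unit whose input is measured in full turns (assumption)."""
    return math.sin(2.0 * math.pi * turns)


def lower_sin(x: float, needs_range_reduction: bool) -> float:
    """Lower sin(x): scale radians to turns, optionally reduce with fract."""
    turns = x * (0.5 / math.pi)
    if needs_range_reduction:
        # Subtargets with a narrow input range fold the argument into [0, 1).
        turns = fract(turns)
    return hw_sin(turns)
```

Because sine is periodic in whole turns, both paths give the same value wherever the wider range applies, so skipping `fract` only saves an instruction.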
* [AMDGPU] Removed unused method | Tim Renouf | 2018-09-13 | 1 | -22/+0
    Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7.
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D52022
    Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
    llvm-svn: 342189
* AMDGPU: Fix not preserving alignment in call setups | Matt Arsenault | 2018-09-13 | 1 | -1/+7
    If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much.
    llvm-svn: 342133
* [AMDGPU] Load divergence predicate refactoring | Alexander Timofeev | 2018-09-13 | 2 | -8/+26
    Differential revision: https://reviews.llvm.org/D51931
    Reviewers: rampitec
    llvm-svn: 342120
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed. | Alexander Timofeev | 2018-09-13 | 1 | -0/+1
    Differential revision: https://reviews.llvm.org/D51975
    Reviewers: rampitec
    llvm-svn: 342115
* AMDGPU: Print all kernel descriptor directives (including the ones with default values) | Konstantin Zhuravlyov | 2018-09-12 | 1 | -101/+88
    Change by Tony Tye
    Differential Revision: https://reviews.llvm.org/D51954
    llvm-svn: 342077
* AMDGPU: Re-apply r341982 after fixing the layering issue | Konstantin Zhuravlyov | 2018-09-12 | 11 | -391/+364
    Move isa version determination into TargetParser.
    Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900).
    llvm-svn: 342069
* Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser." | Ilya Biryukov | 2018-09-12 | 11 | -274/+391
    This reverts commit r341982.
    The change introduced a layering violation. Reverting to unbreak our integrate.
    llvm-svn: 342023
* AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination | Konstantin Zhuravlyov | 2018-09-11 | 11 | -391/+274
    into TargetParser.
    Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900).
    Differential Revision: https://reviews.llvm.org/D51890
    llvm-svn: 341982
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed | Alexander Timofeev | 2018-09-11 | 4 | -21/+62
    Differential revision: https://reviews.llvm.org/D51734
    Reviewers: rampitec
    llvm-svn: 341928
* AMDGPU: Remove leftovers from configurable address spaces | Matt Arsenault | 2018-09-11 | 2 | -34/+12
    llvm-svn: 341895
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32. | Alexander Timofeev | 2018-09-10 | 1 | -5/+33
    Differential revision: https://reviews.llvm.org/D51586
    Reviewer: rampitec
    llvm-svn: 341843
* AMDGPU: Remove function pointer type hack | Matt Arsenault | 2018-09-10 | 1 | -7/+4
    Now the pointer size should always be correct and we don't need to improperly inspect the pointee type.
    llvm-svn: 341806
* AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit | Matt Arsenault | 2018-09-10 | 1 | -2/+1
    This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg.
    llvm-svn: 341803
* DAG: Handle odd vector sizes in calling conv splitting | Matt Arsenault | 2018-09-10 | 1 | -8/+5
    This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces.
    Fixes not splitting v3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets.
    llvm-svn: 341801
* [AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting | Carl Ritson | 2018-09-10 | 1 | -2/+9
    Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted.
    Reviewers: nhaehnle, arsenm
    Reviewed By: arsenm
    Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51726
    Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b
    llvm-svn: 341798
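The failure mode in the commit above can be reproduced with a toy model of a bounded history buffer. This is a rough sketch, not the GCNHazardRecognizer code: the buffer size, the `nops_needed` helper, and the instruction names are all hypothetical, and the real fix may differ in detail.

```python
from collections import deque

LOOKAHEAD = 5  # hypothetical buffer size, chosen only for the demonstration


def nops_needed(stream, hazard, required, skip_non_instructions):
    """Count NOPs needed after `stream` so that `required` wait states
    separate the next instruction from the most recent `hazard` entry.

    `stream` is a list of (name, is_real) pairs; entries with
    is_real=False (such as implicit defs) take no wait state.
    """
    history = deque(maxlen=LOOKAHEAD)  # newest entry on the left
    for name, is_real in stream:
        if skip_non_instructions and not is_real:
            continue  # do not let a non-instruction evict a real entry
        history.appendleft((name, is_real))

    wait_states = 0
    for name, is_real in history:
        if name == hazard:
            return max(0, required - wait_states)
        if is_real:
            wait_states += 1
    return 0  # the hazard is no longer visible in the buffer
```

With one hazardous instruction followed by eight implicit defs, the unfiltered buffer loses the hazard and reports that no NOPs are needed, while the filtered buffer still sees it.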
* AMDGPU: Use GOT PSV since it has an address space now | Matt Arsenault | 2018-09-10 | 1 | -2/+2
    llvm-svn: 341768
* AMDGPU: Don't abort on unknown addrspace argument | Matt Arsenault | 2018-09-10 | 1 | -8/+10
    llvm-svn: 341767
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset. | Alexander Timofeev | 2018-09-07 | 1 | -3/+11
    Differential revision: https://reviews.llvm.org/D51610
    Reviewer: rampitec
    llvm-svn: 341636
* Revert r341413 | Scott Linder | 2018-09-06 | 3 | -232/+67
    Causes a regression in expensive checks.
    llvm-svn: 341589
* AMDGPU: Remove old hack for function addresses | Matt Arsenault | 2018-09-06 | 1 | -13/+0
    llvm-svn: 341567
* [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions | Scott Linder | 2018-09-04 | 3 | -67/+232
    Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions.
    Differential Revision: https://reviews.llvm.org/D50982
    llvm-svn: 341413
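A waterfall loop, as named in the commit above, can be pictured with a small lane-level simulation. This is an illustrative sketch only: `waterfall` is a hypothetical helper, lanes are modelled as list entries, and the actual lowering operates on machine instructions and the EXEC mask.

```python
def waterfall(rsrc_per_lane, exec_mask, op):
    """Run `op` once per distinct resource value among the active lanes.

    Each iteration picks the value held by the first remaining lane
    (the analogue of a read-first-lane), runs the operation for every
    lane holding that same value, then retires those lanes.
    """
    results = [None] * len(rsrc_per_lane)
    remaining = list(exec_mask)
    iterations = 0
    while any(remaining):
        first = remaining.index(True)
        uniform = rsrc_per_lane[first]  # now safe to treat as a scalar
        for lane, active in enumerate(remaining):
            if active and rsrc_per_lane[lane] == uniform:
                results[lane] = op(uniform, lane)
                remaining[lane] = False
        iterations += 1
    return results, iterations
```

The loop runs once per distinct value, so a uniform operand costs one iteration and a fully divergent one costs one iteration per active lane.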
* AMDGPU: Fix DAG divergence not reporting flat loads | Matt Arsenault | 2018-09-04 | 1 | -4/+4
    Match behavior in DAG of r340343
    llvm-svn: 341393
* Remove unnecessary semicolon to silence -Wpedantic warning. NFCI. | Simon Pilgrim | 2018-09-03 | 1 | -1/+1
    llvm-svn: 341303
* AMDGPU/GlobalISel: Define instruction mapping for G_SELECT | Tom Stellard | 2018-09-01 | 1 | -0/+54
    Reviewers: arsenm
    Reviewed By: arsenm
    Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
    Differential Revision: https://reviews.llvm.org/D49737
    llvm-svn: 341271
* [AMDGPU] Split v32i32 loads | Stanislav Mekhanoshin | 2018-08-31 | 1 | -3/+5
    Differential Revision: https://reviews.llvm.org/D51555
    llvm-svn: 341266
* AMDGPU: Restrict extract_vector_elt combine to loads | Matt Arsenault | 2018-08-31 | 1 | -1/+2
    The intention is to enable the extract_vector_elt load combine, and doing this for other operations interferes with more useful optimizations on vectors.
    Handle any type of load since in principle we should do the same combine for the various load intrinsics.
    llvm-svn: 341219
* AMDGPU: Stop forcing internalize at -O0 | Matt Arsenault | 2018-08-31 | 1 | -11/+0
    This doesn't really matter if clang is always emitting the visibility as hidden by default.
    llvm-svn: 341168
* AMDGPU: Remove remnants of old address space mapping | Matt Arsenault | 2018-08-31 | 41 | -374/+244
    llvm-svn: 341165
* [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis | Nicolai Haehnle | 2018-08-30 | 8 | -28/+27
    Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
    The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes.
    Patch by: Simon Moll
    Reviewed By: nhaehnle
    Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50434
    Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
    llvm-svn: 341071
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1. | Alexander Timofeev | 2018-08-30 | 1 | -2/+25
    Reviewers: rampitec
    Differential revision: https://reviews.llvm.org/D51316
    llvm-svn: 341068
* AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes | Marek Olsak | 2018-08-29 | 1 | -1/+5
    Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic.
    Reviewers: arsenm, nhaehnle
    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51203
    llvm-svn: 340959
* [AMDGPU] Match udot4 pattern. | Farhana Aleen | 2018-08-29 | 1 | -0/+39
    Summary: D.u32 = S0.u8[0] * S1.u8[0] + S0.u8[1] * S1.u8[1] + S0.u8[2] * S1.u8[2] + S0.u8[3] * S1.u8[3] + S2.u32
    Author: FarhanaAleen
    Reviewed By: arsenm
    Subscribers: llvm-commits, AMDGPU
    Differential Revision: https://reviews.llvm.org/D50921
    llvm-svn: 340936
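The formula in the commit above can be written as a small reference model. This is a sketch for checking values by hand, not the pattern-matching code; the function name `udot4` mirrors the commit title, and wrapping the result to 32 bits is an assumption made here.

```python
def udot4(s0: int, s1: int, s2: int) -> int:
    """D.u32 = sum over i of S0.u8[i] * S1.u8[i], plus S2.u32.

    s0 and s1 are 32-bit words holding four unsigned bytes each,
    with byte 0 in the least significant position.
    """
    acc = s2
    for i in range(4):
        a = (s0 >> (8 * i)) & 0xFF
        b = (s1 >> (8 * i)) & 0xFF
        acc += a * b
    return acc & 0xFFFFFFFF  # wrap to 32 bits (assumption: no clamping)
```

For example, bytes (1, 2, 3, 4) against (5, 6, 7, 8) with a zero accumulator give 5 + 12 + 21 + 32.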
* AMDGPU: Fix getInstSizeInBytes | Nicolai Haehnle | 2018-08-29 | 4 | -34/+45
    Summary: Add some optional code to validate getInstSizeInBytes for emitted instructions.
    This flushed out some issues which are fixed by this patch:
    - Streamline getInstSizeInBytes
    - Properly define the VI readlane/writelane instruction as VOP3
    - Fix the inline constant determination. Specifically, this change fixes an issue where a 32-bit value of 0xffffffff was recorded as unsigned. This is equal to -1 when restricting to a 32-bit comparison, and an inline constant can be used.
    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50629
    Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
    llvm-svn: 340903
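The inline constant issue in the commit above comes down to sign interpretation. The sketch below assumes the integer inline constant range is -16 to 64 inclusive; the helper names are hypothetical and do not correspond to LLVM functions.

```python
def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_inline_int_constant_32(value: int) -> bool:
    """True if a 32-bit operand can be encoded as an integer inline constant.

    Assumes the inline range is -16..64, applied after sign-extending
    the 32-bit pattern.
    """
    return -16 <= sign_extend_32(value) <= 64
```

Treating 0xffffffff as the unsigned number 4294967295 fails the range check, whereas the 32-bit signed view is -1 and passes, which is the case the commit describes.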
* [AMDGPU] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off | Fangrui Song | 2018-08-28 | 1 | -2/+1
    llvm-svn: 340868
* AMDGPU: Don't delete instructions if S_ENDPGM has implicit uses | Matt Arsenault | 2018-08-28 | 1 | -1/+8
    This can leave behind the uses with the defs removed. Since this should only really happen in tests, it's not worth the effort of trying to handle this.
    llvm-svn: 340866
* AMDGPU: Force shrinking of add/sub even if the carry is used | Matt Arsenault | 2018-08-28 | 1 | -5/+8
    The original motivating example uses a 64-bit add, so the carry is used. Insert a copy from VCC. This may allow shrinking of the used carry instruction. At worst, we are replacing a mov to materialize the constant with a copy of vcc.
    llvm-svn: 340862
* AMDGPU: Shrink insts to fold immediates | Matt Arsenault | 2018-08-28 | 4 | -54/+138
    This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practically avoidable.
    llvm-svn: 340859
* AMDGPU: Move canShrink into TII | Matt Arsenault | 2018-08-28 | 3 | -56/+57
    llvm-svn: 340855
* [AMDGPU] Add support for a16 modifier for gfx9 | Ryan Taylor | 2018-08-28 | 9 | -45/+86
    Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9.
    Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50575
    llvm-svn: 340831
* [AMDGPU] Add support for multi-dword s.buffer.load intrinsic | Tim Renouf | 2018-08-25 | 7 | -25/+158
    Summary: Patch by Marek Olsak and David Stuttard, both of AMD.
    This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations.
    Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation.
    This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load.
    V2: Addressed minor review comments.
    V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue.
    V4: Added glc test.
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D51098
    Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
    llvm-svn: 340684
* AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space | Samuel Pitoiset | 2018-08-22 | 3 | -32/+36
    32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5.
    Fixes "LLVM ERROR: Pointer address space out of range".
    v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
    v4: - fix compilation issues
        - fix out of bounds access
    v3: use static_assert()
    v2: add a very simple test for 32-bit addr space
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
    llvm-svn: 340417
* AMDGPU: fix existing alias rules for constant and global | Samuel Pitoiset | 2018-08-22 | 1 | -5/+5
    Constant and global may alias, also one rules table wasn't ordered correctly.
    Pinpointed by Matt.
    v2: add a test with swapped parameters
    llvm-svn: 340416
* AMDGPU: Fix not respecting byval alignment in call frame setup | Matt Arsenault | 2018-08-22 | 4 | -20/+15
    This was hackily adding in the 4-bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee.
    llvm-svn: 340396
* Update MemorySSA in BasicBlockUtils. | Alina Sbirlea | 2018-08-21 | 1 | -1/+2
    Summary: Extend BasicBlocksUtils to update MemorySSA.
    Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits
    Differential Revision: https://reviews.llvm.org/D45300
    llvm-svn: 340365
* [AMDGPU] Consider loads from flat addrspace to be potentially divergent | Scott Linder | 2018-08-21 | 1 | -4/+6
    In general we can't assume flat loads are uniform, and cases where we can prove they are should be handled through infer-address-spaces.
    Differential Revision: https://reviews.llvm.org/D50991
    llvm-svn: 340343
* [AMDGPU] Support idot2 pattern. | Farhana Aleen | 2018-08-21 | 2 | -0/+23
    Summary: Transform add(mul((i32)S0.x, (i32)S1.x), add(mul((i32)S0.y, (i32)S1.y), (i32)S3)) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3)
    Author: FarhanaAleen
    Reviewed By: arsenm
    Subscribers: llvm-commits, AMDGPU
    Differential Revision: https://reviews.llvm.org/D50024
    llvm-svn: 340295
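The value computed by the matched pattern in the commit above can be checked with a short reference model. This is a sketch under assumptions: `dot2` is a hypothetical helper covering both the signed and unsigned forms, element `x` is taken to be the low 16 bits and `y` the high 16 bits, and the result wraps to 32 bits.

```python
def _lane16(word: int, index: int, signed: bool) -> int:
    """Extract 16-bit element `index` of a packed v2i16, extended to int."""
    value = (word >> (16 * index)) & 0xFFFF
    if signed and value & 0x8000:
        value -= 1 << 16
    return value


def _wrap32(value: int, signed: bool) -> int:
    """Wrap to 32 bits, read back as signed or unsigned."""
    value &= 0xFFFFFFFF
    if signed and value & 0x80000000:
        value -= 1 << 32
    return value


def dot2(s0: int, s1: int, s3: int, signed: bool) -> int:
    """S0.x * S1.x + S0.y * S1.y + S3, matching the add/mul tree."""
    total = s3
    for i in range(2):
        total += _lane16(s0, i, signed) * _lane16(s1, i, signed)
    return _wrap32(total, signed)
```

The same bit patterns give different results in the two forms, for example when an element is 0xFFFF, which is why the signed and unsigned instructions are matched separately.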
* [AMDGPU] Allow int types for MUBUF vdata | Tim Renouf | 2018-08-21 | 1 | -0/+20
    Summary: Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics only allowed float types for the data to be loaded or stored, which sometimes meant the frontend needed to generate a bitcast. In this, the new intrinsics copied the old buffer intrinsics.
    This commit extends the new intrinsics to allow int types as well.
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50315
    Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
    llvm-svn: 340270
* [AMDGPU] New buffer intrinsics | Tim Renouf | 2018-08-21 | 7 | -179/+502
    Summary: This commit adds new intrinsics
        llvm.amdgcn.raw.buffer.load
        llvm.amdgcn.raw.buffer.load.format
        llvm.amdgcn.raw.buffer.load.format.d16
        llvm.amdgcn.struct.buffer.load
        llvm.amdgcn.struct.buffer.load.format
        llvm.amdgcn.struct.buffer.load.format.d16
        llvm.amdgcn.raw.buffer.store
        llvm.amdgcn.raw.buffer.store.format
        llvm.amdgcn.raw.buffer.store.format.d16
        llvm.amdgcn.struct.buffer.store
        llvm.amdgcn.struct.buffer.store.format
        llvm.amdgcn.struct.buffer.store.format.d16
        llvm.amdgcn.raw.buffer.atomic.*
        llvm.amdgcn.struct.buffer.atomic.*
    with the following changes from the llvm.amdgcn.buffer.* intrinsics:
    * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set;
    * there is a combined cachepolicy arg (glc+slc)
    * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field.
    The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand.
    The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50306
    Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
    llvm-svn: 340269
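The operand conventions listed in the commit above can be sketched as plain integer arithmetic. The details here are assumptions for illustration: glc is taken as bit 0 and slc as bit 1 of the cachepolicy value, the instruction's immediate offset is taken to be a 12-bit unsigned field, and the helper names are hypothetical. The real lowering may split offsets differently.

```python
MAX_IMM_OFFSET = 4095  # assumed 12-bit unsigned immoffset field


def pack_cachepolicy(glc: bool, slc: bool) -> int:
    """Combine glc and slc into one cachepolicy value (bit 0, bit 1)."""
    return int(glc) | (int(slc) << 1)


def split_offset(constant_offset: int) -> tuple:
    """Split a known non-negative offset into (voffset part, immoffset part).

    The low bits that fit the immediate field go to immoffset and the
    remainder goes to voffset.
    """
    imm = constant_offset & MAX_IMM_OFFSET
    return constant_offset - imm, imm


def idxen_for(kind: str) -> int:
    """raw intrinsics set idxen=0; struct intrinsics always set idxen=1."""
    return {"raw": 0, "struct": 1}[kind]
```

Both parts of a split offset are still included in bounds checking and swizzling; only the separate soffset operand is excluded.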
* [AMDGPU] New tbuffer intrinsics | Tim Renouf | 2018-08-21 | 7 | -104/+309
    Summary: This commit adds new intrinsics
        llvm.amdgcn.raw.tbuffer.load
        llvm.amdgcn.struct.tbuffer.load
        llvm.amdgcn.raw.tbuffer.store
        llvm.amdgcn.struct.tbuffer.store
    with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:
    * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set;
    * there is a combined format arg (dfmt+nfmt)
    * there is a combined cachepolicy arg (glc+slc)
    * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field.
    The AMDISD::TBUFFER_* SD nodes always have an index operand, all three offset operands, combined format operand, combined cachepolicy operand, and an extra idxen operand.
    The tbuffer pseudo- and real instructions now also have a combined format operand.
    The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store intrinsics continue to work.
    V2: Separate raw and struct intrinsics.
    V3: Moved extract_glc and extract_slc defs to a more sensible place.
    V4: Rebased on D49995.
    V5: Only two separate offset args instead of three.
    V6: Pseudo- and real instructions have joint format operand.
    V7: Restored optionality of dfmt and nfmt in assembler.
    V8: Addressed minor review comments.
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D49026
    Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
    llvm-svn: 340268
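The combined format arg (dfmt+nfmt) from the commit above can be illustrated as a simple bit-field. This sketch assumes dfmt occupies bits 3..0 and nfmt bits 6..4 of the combined value; the helper names are hypothetical.

```python
def pack_format(dfmt: int, nfmt: int) -> int:
    """Combine data format (4 bits) and numeric format (3 bits)."""
    if not 0 <= dfmt <= 0xF:
        raise ValueError("dfmt must fit in 4 bits")
    if not 0 <= nfmt <= 0x7:
        raise ValueError("nfmt must fit in 3 bits")
    return dfmt | (nfmt << 4)


def unpack_format(fmt: int) -> tuple:
    """Recover (dfmt, nfmt) from a combined format value."""
    return fmt & 0xF, (fmt >> 4) & 0x7
```

Carrying one operand instead of two keeps the intrinsic, the SD node, and the instruction definitions in step, while the assembler can still accept dfmt and nfmt separately.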