path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader" | Tim Renouf | 2018-03-28 | 1 | -4/+2
  This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that.
  Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
  llvm-svn: 328695
* AMDGPU: Really implement getFrameRegister | Matt Arsenault | 2018-03-27 | 1 | -1/+2
  Currently this seems to only really be used for debug info.
  llvm-svn: 328677
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader | Tim Renouf | 2018-03-27 | 1 | -2/+4
  Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec.
  Reviewers: kzhuravl, nhaehnle, timcorringham
  Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
  Differential Revision: https://reviews.llvm.org/D44468
  Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
  llvm-svn: 328673
* AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills | Matt Arsenault | 2018-03-27 | 1 | -4/+3
  Previously, this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present.
  llvm-svn: 328659
* AMDGPU: Set natural stack alignment in DataLayout | Matt Arsenault | 2018-03-27 | 1 | -2/+2
  Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack.
  llvm-svn: 328656
* AMDGPU: Fix crash when MachinePointerInfo invalid | Matt Arsenault | 2018-03-27 | 1 | -1/+1
  The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this.
  llvm-svn: 328652
* AMDGPU: Fix FP restore from being reordered with stack ops | Matt Arsenault | 2018-03-27 | 1 | -1/+6
  In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Previously, it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to prevent this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better.
  llvm-svn: 328650
* [AMDGPU] Improve disassembler error handling | Tim Corringham | 2018-03-26 | 1 | -1/+4
  Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always disassemble code that contains embedded data correctly, this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44685
  llvm-svn: 328553
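The recovery strategy described in the disassembler commit above can be sketched roughly as follows. This is an illustration only, not the LLVM implementation: it assumes every instruction is a single little-endian 32-bit word (real AMDGPU encodings can be longer), and `TOY_OPCODES` and `disassemble` are hypothetical names introduced for the example.

```python
# Hypothetical one-entry opcode table, used only for illustration.
TOY_OPCODES = {0xBF810000: "s_endpgm"}


def disassemble(data, decode):
    """Return one line of text per decoded unit.

    `decode` maps a 32-bit word to a mnemonic, or returns None when the
    word is not recognised.
    """
    lines = []
    pos = 0
    # Consume whole 32-bit words while at least 4 bytes remain.
    while len(data) - pos >= 4:
        word = int.from_bytes(data[pos:pos + 4], "little")
        text = decode(word)
        if text is None:
            # Unrecognised opcode: emit it as data and still advance 4 bytes,
            # instead of retrying one byte later.
            text = f".long 0x{word:08x}"
        lines.append(text)
        pos += 4
    # The section is not truncated: leftover bytes become .byte directives.
    for byte in data[pos:]:
        lines.append(f".byte 0x{byte:02x}")
    return lines
```

Advancing by a full word after a failed decode keeps later words aligned, which is what avoids the cascade of bogus output the commit message mentions.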
* AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes | Nicolai Haehnle | 2018-03-26 | 6 | -67/+56
  Differential revision: https://reviews.llvm.org/D44820
  Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
  llvm-svn: 328496
* [AMDGPU] Change std::sort to llvm::sort in response to r327219 | Mandeep Singh Grang | 2018-03-24 | 1 | -1/+1
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Reviewers: tstellar, RKSimon, arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D44856
  llvm-svn: 328429
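The idea behind shuffling before sorting can be illustrated with a small sketch. It is not the llvm::sort wrapper itself; `shuffled_sort` is a hypothetical helper, and Python's stable sort stands in for a sort whose order for equal keys is unspecified.

```python
import random


def shuffled_sort(items, key, seed=None):
    """Shuffle a copy of `items`, then sort it by `key`.

    Elements with equal keys may come out in a different relative order from
    run to run, so code that silently depends on that order is exposed.
    """
    work = list(items)
    random.Random(seed).shuffle(work)
    work.sort(key=key)
    return work
```

When the key is a total order, for example the whole tuple rather than only its first field, the result is the same for every seed. That is the property a caller needs for deterministic output.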
* Fix layering by moving ValueTypes.h from CodeGen to IR | David Blaikie | 2018-03-23 | 4 | -5/+5
  ValueTypes.h is implemented in IR already.
  llvm-svn: 328397
* Fix layering of MachineValueType.h by moving it from CodeGen to Support | David Blaikie | 2018-03-23 | 7 | -7/+7
  This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings)
  llvm-svn: 328395
* Move TargetLoweringObjectFile from CodeGen to Target to fix layering | David Blaikie | 2018-03-23 | 2 | -2/+2
  It's implemented in Target & included from other Target headers, so the header should be in Target.
  llvm-svn: 328392
* [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU | Tony Tye | 2018-03-23 | 1 | -8/+4
  - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
  - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.
  Differential Revision: https://reviews.llvm.org/D43736
  llvm-svn: 328349
* Fix a couple of layering violations in Transforms | David Blaikie | 2018-03-21 | 2 | -2/+2
  Remove #include of Transforms/Scalar.h from Transforms/Utils to fix layering. Transforms depends on Transforms/Utils, not the other way around. So remove the header and the "createStripGCRelocatesPass" function declaration (& definition) that is unused and motivated this dependency.
  Move Transforms/Utils/Local.h into Analysis because it's used by Analysis/MemoryBuiltins.cpp.
  llvm-svn: 328165
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" | Nirav Dave | 2018-03-19 | 1 | -3/+2
  Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo.
  llvm-svn: 327898
* TableGen: Check the dynamic type of !cast<Rec>(string) | Nicolai Haehnle | 2018-03-19 | 1 | -1/+1
  Summary: The docs already claim that this happens, but so far it hasn't. As a consequence, existing TableGen files get this wrong a lot, but luckily the fixes are all reasonably straightforward. To make this work with all the existing forms of self-references (since the true type of a record is only built up over time), the lookup of self-references in !cast is delayed until the final resolving step.
  Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164
  Reviewers: arsenm, craig.topper, tra, MartinO
  Subscribers: wdng, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44475
  llvm-svn: 327849
* AMDGPU/GlobalISel: RegBankSelect for basic int ops | Matt Arsenault | 2018-03-19 | 2 | -0/+4
  llvm-svn: 327843
* AMDGPU: Don't leave dead illegal VGPR->SGPR copies | Matt Arsenault | 2018-03-19 | 1 | -0/+7
  Normally DCE kills these, but at -O0 these get left behind, leaving suspicious-looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues.
  llvm-svn: 327842
* Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" | Nirav Dave | 2018-03-17 | 1 | -2/+3
  as it times out building test-suite on PPC.
  llvm-svn: 327778
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" | Nirav Dave | 2018-03-17 | 1 | -3/+2
  Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal.
  llvm-svn: 327777
* AMDGPU/GlobalISel: Cleanup constant legality | Matt Arsenault | 2018-03-17 | 1 | -8/+5
  llvm-svn: 327774
* AMDGPU/GlobalISel: Basic G_GEP legality | Matt Arsenault | 2018-03-17 | 1 | -4/+18
  llvm-svn: 327773
* AMDGPU/GlobalISel: Basic legality for load/store | Matt Arsenault | 2018-03-17 | 1 | -14/+39
  llvm-svn: 327772
* [AMDGPU] Supported ds_write_b128 generation. | Farhana Aleen | 2018-03-16 | 4 | -6/+16
  Summary: This is a follow-on patch of https://reviews.llvm.org/D44210
  Author: FarhanaAleen
  Reviewed By: msearles
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D44319
  llvm-svn: 327726
* [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes | Dmitry Preobrazhensky | 2018-03-16 | 4 | -6/+23
  See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751
  Differential Revision: https://reviews.llvm.org/D44529
  Reviewers: artem.tamazov, arsenm
  llvm-svn: 327723
* [AMDGPU][MC] Corrected default values for unused SDWA operands | Dmitry Preobrazhensky | 2018-03-16 | 2 | -10/+10
  See bug 36355: https://bugs.llvm.org/show_bug.cgi?id=36355
  Differential Revision: https://reviews.llvm.org/D44481
  Reviewers: artem.tamazov, arsenm
  llvm-svn: 327720
* [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. | Mark Searles | 2018-03-14 | 1 | -13/+29
  mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so its score bracket is needed.
  Differential Revision: https://reviews.llvm.org/D44434
  llvm-svn: 327583
* [AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction | Dmitry Preobrazhensky | 2018-03-12 | 2 | -4/+33
  See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558
  Differential Revision: https://reviews.llvm.org/D43950
  Reviewers: artem.tamazov, arsenm
  llvm-svn: 327299
* [AMDGPU] Fix lowering enqueue kernel when kernel has no name | Yaxun Liu | 2018-03-12 | 1 | -8/+16
  Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.
  Differential Revision: https://reviews.llvm.org/D44322
  llvm-svn: 327291
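The naming scheme in the commit above (a bare base name first, then numbered suffixes starting from 1) can be sketched as a small counter. `make_kernel_namer` is a hypothetical helper for illustration; the actual pass may track uniqueness differently.

```python
def make_kernel_namer(base="__amdgpu_enqueued_kernel"):
    """Return a function that hands out unique names for unnamed kernels."""
    count = 0

    def next_name():
        nonlocal count
        # The first unnamed kernel gets the bare base name; later ones get
        # ".1", ".2", and so on.
        name = base if count == 0 else f"{base}.{count}"
        count += 1
        return name

    return next_name
```

Calling the returned function three times yields `__amdgpu_enqueued_kernel`, `__amdgpu_enqueued_kernel.1`, and `__amdgpu_enqueued_kernel.2`.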
* [AMDGPU][MC] Corrected GATHER4 opcodes | Dmitry Preobrazhensky | 2018-03-12 | 3 | -82/+119
  See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252
  Differential Revision: https://reviews.llvm.org/D43874
  Reviewers: artem.tamazov, arsenm
  llvm-svn: 327278
* AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT | Matt Arsenault | 2018-03-12 | 2 | -0/+70
  llvm-svn: 327269
* AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES | Matt Arsenault | 2018-03-12 | 1 | -0/+12
  llvm-svn: 327268
* AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal | Matt Arsenault | 2018-03-12 | 1 | -0/+27
  llvm-svn: 327267
* Test commit - change comment slightly. | Michael Bedy | 2018-03-11 | 1 | -2/+2
  llvm-svn: 327234
* AMDGPU: Fix crash when constant folding with physreg operand | Matt Arsenault | 2018-03-10 | 1 | -1/+2
  llvm-svn: 327209
* Revert: r327172 "Correct load-op-store cycle detection analysis" | Nirav Dave | 2018-03-10 | 1 | -2/+3
  r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
  r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"
  Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC
  llvm-svn: 327197
* [DAG] Enforce stricter NodeId invariant during Instruction selection | Nirav Dave | 2018-03-09 | 1 | -3/+2
  Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle when detecting an invalid selection.
  This patch fixes this by marking all unselected successors of a selected node with a negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning.
  In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property.
  This preemptively fixes PR36312 before triggering commit r324359 relands
  Reviewers: craig.topper, bogner, jyknight
  Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya
  Differential Revision: https://reviews.llvm.org/D43198
  llvm-svn: 327170
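The negated-id marking described in the NodeId commit above can be sketched as a reversible encoding. The exact encoding is an assumption made for this example: `-(id + 1)` is used so that id 0 also has a distinct marked form. The function names are hypothetical and do not mirror the SelectionDAG API.

```python
def mark_invalidated(node_id):
    """Mark a node id as no longer trustworthy for topological pruning."""
    if node_id < 0:
        return node_id  # already marked, leave unchanged
    return -(node_id + 1)


def original_id(node_id):
    """Recover the id a node had before it was marked."""
    return node_id if node_id >= 0 else -(node_id + 1)


def may_prune_on(node_id):
    """Only non-negative ids are safe to use for pruning a cycle search."""
    return node_id >= 0
```

Because the encoding is its own inverse on marked values, the original ordering information is still available after a node has been flagged.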
* [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. | Farhana Aleen | 2018-03-09 | 6 | -10/+33
  Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widens the vector length so that the vectorizer generates 128 bit loads for local address-space, which get translated to ds_read_b128. Since the performance benefit is not clear, the compiler generates ds_read_b128 under -amdgpu-ds128.
  Author: FarhanaAleen
  Reviewed By: rampitec, arsenm
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D44210
  llvm-svn: 327153
* [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9 | Stanislav Mekhanoshin | 2018-03-09 | 1 | -13/+12
  GFX9 should select opsel version.
  Differential Revision: https://reviews.llvm.org/D44279
  llvm-svn: 327106
* AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo | Matt Arsenault | 2018-03-08 | 4 | -5/+9
  These are the parameters x86 already uses.
  llvm-svn: 327020
* [AMDGPU] Increased vector length for global/constant loads. | Farhana Aleen | 2018-03-07 | 3 | -6/+44
  Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; LoadStoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D44179
  llvm-svn: 326910
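The "16/8 elements of dwords/quadwords" figure in the commit above follows from simple arithmetic: 16 dwords of 32 bits each make a 512-bit access, which holds 8 quadwords of 64 bits. A minimal sketch, with `max_vector_elements` as a hypothetical helper name:

```python
DWORD_BITS = 32
QUADWORD_BITS = 64

# 16 consecutive dwords, as stated in the commit message.
MAX_SCALAR_LOAD_BITS = 16 * DWORD_BITS


def max_vector_elements(element_bits, vector_bits=MAX_SCALAR_LOAD_BITS):
    """Number of whole elements of `element_bits` that fit in one access."""
    return vector_bits // element_bits
```

With these values, `max_vector_elements(DWORD_BITS)` is 16 and `max_vector_elements(QUADWORD_BITS)` is 8.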
* Revert "[AMDGPU] Widened vector length for global/constant address space." | Farhana Aleen | 2018-03-07 | 3 | -44/+6
  This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.
  llvm-svn: 326907
* [AMDGPU] Widened vector length for global/constant address space. | Farhana Aleen | 2018-03-07 | 3 | -6/+44
  llvm-svn: 326904
* [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC | Craig Topper | 2018-03-06 | 1 | -1/+1
  The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalizeVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalizeVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.
  llvm-svn: 326832
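The naming mismatch described in the rename commit above can be made concrete with a small model of the four combine stages. The enum mirrors the stage names used in the commit message; the two predicates are hypothetical Python stand-ins that show the difference between what the old name suggested and what the code checked.

```python
from enum import IntEnum


class CombineLevel(IntEnum):
    BeforeLegalizeTypes = 0
    AfterLegalizeTypes = 1
    AfterLegalizeVectorOps = 2  # third stage, does not always run
    AfterLegalizeDAG = 3        # fourth and last stage


def is_after_legalize_dag(level):
    """What the code actually checks: only the last stage."""
    return level == CombineLevel.AfterLegalizeDAG


def is_at_or_after_legalize_vector_ops(level):
    """What the old name suggested: true from the third stage onward."""
    return level >= CombineLevel.AfterLegalizeVectorOps
```

The two predicates disagree exactly at `AfterLegalizeVectorOps`, which is why the old name was misleading.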
* [AMDGPU] Add default ISA version targets | Stanislav Mekhanoshin | 2018-03-06 | 1 | -0/+6
  If -mattr is used to modify feature set bits in an llvm-mc call, getIsaVersion can fail to identify the specific ISA due to a test mismatch. Add default fallback tests which will always correctly report at least the major version.
  Differential Revision: https://reviews.llvm.org/D44163
  llvm-svn: 326825
* [AMDGPU] Fix lowering OpenCL enqueue_kernel | Yaxun Liu | 2018-03-06 | 1 | -27/+25
  One addrspacecast disappeared in clang-emitted IR for block invoke function due to adoption of the new addr space mapping.
  Differential Revision: https://reviews.llvm.org/D43785
  llvm-svn: 326806
* AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT | Matt Arsenault | 2018-03-05 | 1 | -0/+9
  llvm-svn: 326715
* AMDGPU/GlobalISel: Make some G_EXTRACTs legal | Matt Arsenault | 2018-03-05 | 1 | -0/+12
  As far as I can tell legalization of weird sizes for the output type isn't implemented.
  llvm-svn: 326714
* AMDGPU: Fix build warning about override | Matt Arsenault | 2018-03-05 | 1 | -3/+3
  llvm-svn: 326713