summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Fixed some instructions latenciesStanislav Mekhanoshin2018-03-304-11/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D45073 llvm-svn: 328874
* [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.Michael Bedy2018-03-301-0/+88
| | | | | | | | | | | | | | | | | | | | | | Summary: The phase attempts to transform operations that extract a portion of a value into an SDWA src operand in cases where that value is used only once. It was not prepared for this use to be the preserved portion of a value for dst:UNUSED_PRESERVE, resulting in a crash or assert. This change either rejects the illegal SDWA attempt, or in the case where dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded extract instruction. Reviewers: arsenm, #amdgpu Reviewed By: arsenm, #amdgpu Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44364 llvm-svn: 328856
* AMDGPU: Support realigning stackMatt Arsenault2018-03-291-0/+125
| | | | | | | | | | | | | | | | | | While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831
* AMDGPU: Increase default stack alignmentMatt Arsenault2018-03-298-18/+18
| | | | | | | 8 and 16-byte values are common, so increase the default alignment to avoid realigning the stack in most functions. llvm-svn: 328821
* AMDGPU: Fix selection error on constant loads with < 4 byte alignmentMatt Arsenault2018-03-292-0/+24
| | | | llvm-svn: 328818
* Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"Tim Renouf2018-03-281-29/+0
| | | | | | | | | | | This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shaderTim Renouf2018-03-271-0/+29
| | | | | | | | | | | | | | | | | | Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673
* [CodeGen] Fixed unreachable with -print-machineinstrs and custom pseudo ↵Tim Renouf2018-03-271-0/+17
| | | | | | | | | | | | | | | | | | source value Summary: Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing" broke -print-machineinstrs for us on AMDGPU, because we have custom pseudo source values, and MIR serialization does not implement that. This commit at least restores the functionality of -print-machineinstrs, even if it does not properly implement the missing MIR serialization functionality. Differential Revision: https://reviews.llvm.org/D44871 Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66 llvm-svn: 328668
* AMDGPU: Fix not preserving CSR VGPR if used for SGPR spillsMatt Arsenault2018-03-271-0/+31
| | | | | | | | Before this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present. llvm-svn: 328659
* AMDGPU: Fix crash when MachinePointerInfo invalidMatt Arsenault2018-03-271-0/+82
| | | | | | | | The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652
* AMDGPU: Fix register name format in testsMatt Arsenault2018-03-272-4/+4
| | | | | | | These were changed to match the asm output name a long time ago, although I think the old tablegenerated names still work. llvm-svn: 328651
* AMDGPU: Fix FP restore from being reordered with stack opsMatt Arsenault2018-03-271-1/+1
| | | | | | | | | | | | | | | | | In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to preven this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650
* [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPUTony Tye2018-03-232-7/+7
| | | | | | | | Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44697 llvm-svn: 328351
* [AMDGPU] Remove use of OpenCL triple environment and replace with function ↵Tony Tye2018-03-232-20/+137
| | | | | | | | | | | attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349
* [InstSimplify] fp_binop X, NaN --> NaNSanjay Patel2018-03-211-12/+9
| | | | | | | | We propagate the existing NaN value when possible. Differential Revision: https://reviews.llvm.org/D44521 llvm-svn: 328140
* [AMDGPU] change test to avoid NaN mathSanjay Patel2018-03-191-1/+1
| | | | llvm-svn: 327891
* [AMDGPU] adjust tests to be nan-freeSanjay Patel2018-03-193-51/+60
| | | | | | | | As suggested in D44521 - bitcast to integer for the math, so we preserve the intent of these tests when NaN math gets folded away. llvm-svn: 327890
* AMDGPU/GlobalISel: RegBankSelect for basic int opsMatt Arsenault2018-03-193-0/+201
| | | | llvm-svn: 327843
* AMDGPU: Don't leave dead illegal VGPR->SGPR copiesMatt Arsenault2018-03-192-4/+39
| | | | | | | | | Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842
* AMDGPU/GlobalISel: Cleanup constant legalityMatt Arsenault2018-03-171-56/+22
| | | | llvm-svn: 327774
* AMDGPU/GlobalISel: Basic G_GEP legalityMatt Arsenault2018-03-171-0/+92
| | | | llvm-svn: 327773
* AMDGPU/GlobalISel: Basic legality for load/storeMatt Arsenault2018-03-172-0/+253
| | | | llvm-svn: 327772
* [AMDGPU] Supported ds_write_b128 generation.Farhana Aleen2018-03-166-17/+43
| | | | | | | | | | | | | | Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726
* [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP ↵Dmitry Preobrazhensky2018-03-161-43/+43
| | | | | | | | | | | opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723
* [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case ↵Mark Searles2018-03-141-0/+26
| | | | | | | | of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583
* [CodeGen] Use MIR syntax for MachineMemOperand printingFrancis Visoiu Mistrih2018-03-141-1/+1
| | | | | | | | | | Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580
* [AMDGPU] Fix lowering enqueue kernel when kernel has no nameYaxun Liu2018-03-121-9/+47
| | | | | | | | | | Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1. Differential Revision: https://reviews.llvm.org/D44322 llvm-svn: 327291
* [AMDGPU][MC] Corrected GATHER4 opcodesDmitry Preobrazhensky2018-03-122-56/+0
| | | | | | | | | See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252 Differential Revision: https://reviews.llvm.org/D43874 Reviewers: artem.tamazov, arsenm llvm-svn: 327278
* AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELTMatt Arsenault2018-03-124-0/+392
| | | | llvm-svn: 327269
* AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUESMatt Arsenault2018-03-122-1/+45
| | | | llvm-svn: 327268
* AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legalMatt Arsenault2018-03-122-0/+147
| | | | llvm-svn: 327267
* [AMDGPU] fix tests to be independent of FP undefSanjay Patel2018-03-103-43/+42
| | | | llvm-svn: 327211
* AMDGPU: Fix crash when constant folding with physreg operandMatt Arsenault2018-03-101-0/+27
| | | | llvm-svn: 327209
* [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local ↵Farhana Aleen2018-03-096-0/+105
| | | | | | | | | | | | | | | | | | | | address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153
* [AMDGPU] fix test to be independent of FP undefSanjay Patel2018-03-091-2/+2
| | | | llvm-svn: 327147
* [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9Stanislav Mekhanoshin2018-03-091-58/+59
| | | | | | | | GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D44279 llvm-svn: 327106
* [AMDGPU] fix test to survive more FP undef constant foldingSanjay Patel2018-03-081-5/+6
| | | | llvm-svn: 327066
* [AMDGPU] fix test to survive the most basic undef constant foldingSanjay Patel2018-03-081-1/+1
| | | | | | | This will likely need to be changed again for anything more than: fmul undef, undef -> undef llvm-svn: 327034
* [AMDGPU] Increased vector length for global/constant loads.Farhana Aleen2018-03-073-1/+71
| | | | | | | | | | | | | | | Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910
* Revert "[AMDGPU] Widened vector length for global/constant address space."Farhana Aleen2018-03-073-71/+1
| | | | | | This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907
* [AMDGPU] Widened vector length for global/constant address space.Farhana Aleen2018-03-073-1/+71
| | | | llvm-svn: 326904
* [AMDGPU] Fix lowering OpenCL enqueue_kernelYaxun Liu2018-03-061-49/+44
| | | | | | | | | | One addrspacecast disappeared in clang emitted IR for block invoke function due to adoption of the new addr space mapping. Differential Revision: https://reviews.llvm.org/D43785 llvm-svn: 326806
* AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACTMatt Arsenault2018-03-051-0/+31
| | | | llvm-svn: 326715
* AMDGPU/GlobalISel: Make some G_EXTRACTs legalMatt Arsenault2018-03-051-0/+105
| | | | | | | As far as I can tell legalization of weird sizes for the output type isn't implemented. llvm-svn: 326714
* Pass Divergence Analysis data to Selection DAG to drive divergenceAlexander Timofeev2018-03-052-16/+59
| | | | | | | | dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703
* AMDGPU/GlobalISel: InstrMapping for G_ZEXTMatt Arsenault2018-03-021-0/+31
| | | | llvm-svn: 326589
* AMDGPU/GlobalISel: InstrMapping for G_TRUNCMatt Arsenault2018-03-021-0/+31
| | | | llvm-svn: 326588
* AMDGPU/GlobalISel: Define InstrMappings for G_FCMPMatt Arsenault2018-03-021-0/+69
| | | | | | Patch by Tom Stellard llvm-svn: 326587
* AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnumMatt Arsenault2018-03-021-0/+66
| | | | | | Patch by Tom Stellard llvm-svn: 326586
* AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnumMatt Arsenault2018-03-021-0/+66
| | | | | | Patch by Tom Stellard llvm-svn: 326567
OpenPOWER on IntegriCloud