path: root/llvm/test/CodeGen/AMDGPU
Commit message, Author, Age, Files, Lines
...
* GlobalISel: Allow bitcount ops to have different result type
  Matt Arsenault, 2019-01-31, 5 files, -0/+440
  For AMDGPU the result is always 32-bit for 64-bit inputs.
  llvm-svn: 352717
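  A minimal sketch of why a 32-bit result is always enough, written as plain C++ rather than GlobalISel code (the function names are hypothetical): the number of set bits or leading zero bits in a 64-bit value is at most 64, which easily fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Number of set bits in a 64-bit value: always in [0, 64].
uint32_t popcount64(uint64_t x) {
  uint32_t count = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}

// Number of leading zero bits; returns 64 for x == 0, so the result is
// also always in [0, 64] and fits in a 32-bit type.
uint32_t ctlz64(uint64_t x) {
  uint32_t n = 0;
  for (uint64_t mask = uint64_t{1} << 63; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}
```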
* Add a 'dynamic' parameter to the objectsize intrinsic
  Erik Pilkington, 2019-01-30, 1 file, -3/+3
  This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged.
  rdar://32212419
  Differential revision: https://reviews.llvm.org/D56761
  llvm-svn: 352664
* GlobalISel: Implement fewerElementsVector for select
  Matt Arsenault, 2019-01-30, 1 file, -0/+209
  llvm-svn: 352601
* AMDGPU/GlobalISel: Fix clamping shifts with 16-bit insts
  Matt Arsenault, 2019-01-30, 3 files, -0/+126
  llvm-svn: 352599
* GlobalISel: Support narrowScalar for uneven loads
  Matt Arsenault, 2019-01-30, 1 file, -1/+120
  llvm-svn: 352594
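  The commit gives no details. As a hedged sketch of what an "uneven" split can mean, assuming narrowScalar emits as many parts of the narrow type as fit plus one smaller leftover part, a 96-bit value narrowed to 64 bits becomes one 64-bit part and one 32-bit part. The helper name splitUneven is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: split a scalar of totalBits into full parts of
// narrowBits, plus one smaller leftover part when the sizes do not divide
// evenly. Requires narrowBits > 0.
std::vector<unsigned> splitUneven(unsigned totalBits, unsigned narrowBits) {
  std::vector<unsigned> parts(totalBits / narrowBits, narrowBits);
  if (unsigned leftover = totalBits % narrowBits)
    parts.push_back(leftover);
  return parts;
}
```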
* GlobalISel: Handle some odd splits in fewerElementsVector
  Matt Arsenault, 2019-01-30, 1 file, -3/+75
  Also add some quick hacks to AMDGPU legality for the tests.
  llvm-svn: 352591
* GlobalISel: Handle more cases for widenScalar for G_STORE
  Matt Arsenault, 2019-01-30, 1 file, -0/+99
  llvm-svn: 352585
* GlobalISel: Verify pointer casts
  Matt Arsenault, 2019-01-29, 1 file, -8/+8
  Not sure whether the old AArch64 tests should just be deleted.
  llvm-svn: 352562
* GlobalISel: Partially implement widenScalar for MERGE_VALUES
  Matt Arsenault, 2019-01-29, 1 file, -0/+156
  llvm-svn: 352560
* GlobalISel: Fix narrowScalar for load/store with different mem size
  Matt Arsenault, 2019-01-29, 2 files, -0/+132
  This was ignoring the memory size and producing multiple loads/stores if the operand size was different from the memory size. I assume this is the intent of not having an explicit G_ANYEXTLOAD (although I think that would probably be better).
  llvm-svn: 352523
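  A hypothetical C++ sketch of the bug described above (the helpers are illustrative, not LLVM code): when a 64-bit register is loaded from 32-bit memory and narrowed to 32 bits, counting pieces from the operand size yields two memory operations, while counting from the memory size yields the single access that is actually needed.

```cpp
#include <cassert>

// Ceiling division for positive sizes in bits.
unsigned ceilDiv(unsigned a, unsigned b) { return (a + b - 1) / b; }

// Buggy behaviour described in the commit message: the number of narrow
// memory operations is derived from the operand (register) size.
unsigned narrowMemOpsFromRegSize(unsigned regBits, unsigned /*memBits*/,
                                 unsigned narrowBits) {
  return ceilDiv(regBits, narrowBits);
}

// Fixed behaviour: only the bits actually in memory are accessed; any extra
// register bits come from an extension (loads) or truncation (stores),
// not from another memory operation.
unsigned narrowMemOpsFromMemSize(unsigned /*regBits*/, unsigned memBits,
                                 unsigned narrowBits) {
  return ceilDiv(memBits, narrowBits);
}
```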
* [AMDGPU] Fix a weird WWM intrinsic issue.
  Neil Henning, 2019-01-29, 1 file, -0/+47
  I found a really strange WWM issue through a very convoluted shader. It essentially boils down to a bug in SIInstrInfo: canReadVGPR did not recognize that WWM behaves like a copy and can have a VGPR as its source.
  Differential Revision: https://reviews.llvm.org/D56002
  llvm-svn: 352500
* AMDGPU: Add DS append/consume intrinsics
  Matt Arsenault, 2019-01-28, 2 files, -0/+250
  Unlike other DS instructions, these pass the pointer in m0, so they need to care whether the address is uniform. This assumes the address is dynamically uniform and just uses readfirstlane to get a copy into an SGPR. I don't know whether these have the same 16-bit add problem for the addressing-mode offset on SI, but I've assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions.
  llvm-svn: 352422
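  To show what the dynamic-uniformity assumption buys, here is a simplified C++ model of readfirstlane, assuming a 64-lane wave with an exec mask (the type alias and function name are illustrative, not an LLVM or hardware API). If every active lane holds the same address, copying the first active lane's value into an SGPR is exact; if the lanes differ, the other values are silently dropped.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 32-bit VGPR viewed as 64 per-lane values.
using Vgpr = std::array<uint32_t, 64>;

// Returns the value held by the lowest-numbered active lane in `exec`.
// Precondition: exec != 0 (at least one lane is active).
uint32_t readFirstLane(const Vgpr& v, uint64_t exec) {
  unsigned lane = 0;
  while (((exec >> lane) & 1) == 0)
    ++lane;
  return v[lane];
}
```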
* [AMDGPU] Add intrinsics for 16 bit interpolation
  Tim Corringham, 2019-01-28, 1 file, -0/+187
  Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and a related LIT test. The p1 intrinsic generates code appropriate for both 16- and 32-bank LDS.
  Reviewers: #amdgpu, dstuttard, arsenm, tpr
  Reviewed By: #amdgpu, arsenm
  Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46754
  llvm-svn: 352357
* GlobalISel: Don't reduce elements for atomic load/store
  Matt Arsenault, 2019-01-27, 1 file, -0/+46
  This is invalid for the same reason as in the narrowScalar handling for load.
  llvm-svn: 352334
* GlobalISel: Verify load/store has a pointer input
  Matt Arsenault, 2019-01-27, 13 files, -38/+38
  I expected this to be verified automatically, but it seems nothing checks that the type index was declared as a "ptype".
  llvm-svn: 352319
* GlobalISel: Implement narrowScalar for mul
  Matt Arsenault, 2019-01-27, 1 file, -0/+26
  llvm-svn: 352300
* GlobalISel: fewerElementsVector for intrinsic_trunc/intrinsic_round
  Matt Arsenault, 2019-01-27, 2 files, -8/+92
  llvm-svn: 352298
* AMDGPU/GlobalISel: Legalize more bit ops
  Matt Arsenault, 2019-01-26, 3 files, -24/+567
  llvm-svn: 352295
* AMDGPU/GlobalISel: Widen small uaddo/usubo
  Matt Arsenault, 2019-01-26, 2 files, -0/+194
  llvm-svn: 352294
* [MBP] Don't move bottom block before header if it can't reduce taken branches
  Guozhi Wei, 2019-01-25, 1 file, -9/+9
  If the bottom block BB has only one successor, OldTop, it is in most cases profitable to move BB before OldTop, except in the following case:

        -->OldTop<-
        |    .    |
        |    .    |
        |    .    |
        ---Pred   |
             |    |
            BB-----

  Moving BB before OldTop can't reduce the number of taken branches, so this patch detects this case and prevents the move.
  Differential Revision: https://reviews.llvm.org/D57067
  llvm-svn: 352236
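  The reasoning can be checked with a toy cost model, a sketch under stated assumptions rather than MachineBlockPlacement's actual implementation: each edge has a frequency, an edge costs a taken branch unless its target immediately follows its source in the layout, and the dotted chain from OldTop to Pred is collapsed into a single OldTop->Pred edge. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A CFG edge with an (assumed) execution frequency.
struct Edge {
  std::string from, to;
  unsigned freq;
};

// Sums the frequencies of edges that do not fall through, i.e. whose target
// is not laid out immediately after their source.
unsigned takenBranches(const std::vector<std::string>& layout,
                       const std::vector<Edge>& edges) {
  unsigned taken = 0;
  for (const Edge& e : edges) {
    bool fallsThrough = false;
    for (std::size_t i = 0; i + 1 < layout.size(); ++i)
      if (layout[i] == e.from && layout[i + 1] == e.to)
        fallsThrough = true;
    if (!fallsThrough)
      taken += e.freq;
  }
  return taken;
}
```

  With edges OldTop->Pred (10), Pred->OldTop (7), Pred->BB (3) and BB->OldTop (3), both the layout OldTop, Pred, BB and the layout BB, OldTop, Pred cost 10 taken branches, so moving BB gains nothing in this model.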
* AMDGPU/GlobalISel: Scalarize add/sub
  Matt Arsenault, 2019-01-25, 2 files, -2/+64
  llvm-svn: 352167
* GlobalISel: fewerElementsVector for more cast types
  Matt Arsenault, 2019-01-25, 4 files, -0/+151
  llvm-svn: 352166
* GlobalISel: fewerElementsVector for a few more trivial ops
  Matt Arsenault, 2019-01-25, 6 files, -0/+332
  llvm-svn: 352165
* AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul
  Matt Arsenault, 2019-01-25, 5 files, -2/+238
  llvm-svn: 352162
* GlobalISel: Support fewerElementsVector for icmp/fcmp
  Matt Arsenault, 2019-01-25, 2 files, -13/+285
  Also legalize 64-bit compares for AMDGPU.
  llvm-svn: 352157
* GlobalISel: Implement fewerElementsVector for extensions
  Matt Arsenault, 2019-01-25, 5 files, -7/+483
  llvm-svn: 352155
* [GISel]: Change how CSE is enabled by default for each pass
  Aditya Nandakumar, 2019-01-24, 1 file, -1/+1
  https://reviews.llvm.org/D57178
  This adds a hook in TargetPassConfig to query whether CSE needs to be enabled. By default the hook returns false only for the O0 opt level, but a target can override it. Because CSE is now enabled by default above O0, a few tests needed to be updated to not use CSE (by adding -O0 to the run line).
  Reviewed by: arsenm
  llvm-svn: 352126
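  A hedged C++ sketch of the described default policy; the class, enum and method names are hypothetical and do not reproduce LLVM's actual TargetPassConfig declaration. The base hook enables CSE at every optimization level except O0, and a target can override it.

```cpp
#include <cassert>

enum class OptLevel { O0, O1, O2, O3 };

// Hypothetical stand-in for a pass-configuration class.
class PassConfigSketch {
public:
  explicit PassConfigSketch(OptLevel level) : level_(level) {}
  virtual ~PassConfigSketch() = default;

  // Default policy from the commit message: CSE is off only at O0.
  virtual bool isCSEEnabled() const { return level_ != OptLevel::O0; }

protected:
  OptLevel level_;
};

// A target may override the default, e.g. to keep CSE off at every level.
class NoCSETargetConfig : public PassConfigSketch {
public:
  using PassConfigSketch::PassConfigSketch;
  bool isCSEEnabled() const override { return false; }
};
```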
* RegBankSelect: Support some more complex part mappings
  Matt Arsenault, 2019-01-24, 1 file, -0/+386
  llvm-svn: 352123
* [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand
  Tim Renouf, 2019-01-23, 1 file, -0/+48
  Summary: With XNACK, an smem load whose result is coalesced with an operand (thus it overwrites its own operand) cannot appear in a clause, because some other instruction might XNACK and restart the whole clause.
  The clause breaker already realized that an smem that overwrites an operand cannot appear in a clause, and broke the clause. The problem that this commit fixes is that the SIFormMemoryClauses optimization formed a bundle with early clobber, which caused the earlier code that set up the coalesced operand to be removed as dead.
  Differential Revision: https://reviews.llvm.org/D57008
  Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016
  llvm-svn: 351950
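  A minimal sketch of the clause rule, assuming registers are modeled as contiguous index ranges (the types and function names are hypothetical, not SIFormMemoryClauses code). If the load's destination overlaps its own address operand and XNACK can restart the clause, the replayed load would read an operand that has already been overwritten, so the load must stay out of the clause.

```cpp
#include <cassert>

// Hypothetical model: registers [first, first + count).
struct RegRange {
  unsigned first, count;
};

bool overlaps(RegRange a, RegRange b) {
  return a.first < b.first + b.count && b.first < a.first + a.count;
}

// With XNACK enabled, a load that overwrites its own operand may not join a
// clause, because restarting the clause would re-execute it with a clobbered
// operand. Without XNACK there is no replay, so the overlap is harmless here.
bool canJoinClause(RegRange dst, RegRange addr, bool xnackEnabled) {
  return !(xnackEnabled && overlaps(dst, addr));
}
```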
* AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations
  Matt Arsenault, 2019-01-22, 3 files, -51/+603
  It might be a bit nicer to use the fancy .legalIf and co. predicates, but that required more boilerplate and disabled the coverage assertions.
  llvm-svn: 351886
* AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts
  Matt Arsenault, 2019-01-22, 7 files, -16/+374
  llvm-svn: 351884
* GlobalISel: Implement widen for extract_vector_elt elt type
  Matt Arsenault, 2019-01-22, 1 file, -11/+92
  llvm-svn: 351871
* GlobalISel: Implement fewerElementsVector for basic FP ops
  Matt Arsenault, 2019-01-22, 6 files, -0/+2146
  llvm-svn: 351866
* GlobalISel: Support narrowing zextload/sextload
  Matt Arsenault, 2019-01-22, 8 files, -0/+747
  llvm-svn: 351856
* GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT
  Matt Arsenault, 2019-01-22, 1 file, -25/+25
  llvm-svn: 351853
* Codegen support for atomicrmw fadd/fsub
  Matt Arsenault, 2019-01-22, 1 file, -0/+109
  llvm-svn: 351851
* AMDGPU/GlobalISel: Legalize more fp<->int conversions
  Matt Arsenault, 2019-01-22, 4 files, -7/+90
  llvm-svn: 351767
* [AMDGPU] Fixed hazard recognizer to walk predecessors
  Stanislav Mekhanoshin, 2019-01-21, 1 file, -0/+230
  Fixes two problems with GCNHazardRecognizer:
  1. It only scans up to 5 instructions emitted earlier.
  2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor, while a real predecessor block is not scanned.
  The patch provides a way to distinguish between scheduler mode and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler, because we do not really know what will be emitted later or in what order. However, when the pass works as a hazard recognizer, the schedule is already finalized and we have full access to the instructions of the whole function, so we can properly traverse predecessors and their instructions.
  Differential Revision: https://reviews.llvm.org/D56923
  llvm-svn: 351759
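  The difference between scanning emitted instructions and walking predecessors can be sketched on a toy CFG. This is a hypothetical C++ model (the block type, integer opcodes and function name are illustrative, not GCNHazardRecognizer's interface): the search looks back through the current block and then through the actual predecessor blocks, bounded by a window of instructions, and records the largest window already used per block so that loops terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Block {
  std::vector<int> insts;  // opcodes in program order
  std::vector<int> preds;  // indices of predecessor blocks
};

// Searches backwards from instruction index `pos` in block `b` for
// `hazardOp`, examining at most `window` instructions along any path.
// Predecessors come from the CFG, not from the emission order.
// `best` maps a block to the largest window it was already entered with.
bool hazardInRange(const std::vector<Block>& cfg, int b, std::size_t pos,
                   int hazardOp, int window, std::map<int, int>& best) {
  for (std::size_t i = pos; i > 0 && window > 0; --i, --window)
    if (cfg[b].insts[i - 1] == hazardOp)
      return true;
  if (window == 0)
    return false;
  for (int p : cfg[b].preds) {
    auto it = best.find(p);
    if (it != best.end() && it->second >= window)
      continue;  // p was already explored with at least this budget
    best[p] = window;
    if (hazardInRange(cfg, p, cfg[p].insts.size(), hazardOp, window, best))
      return true;
  }
  return false;
}
```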
* AMDGPU: Legalize more bitcasts
  Matt Arsenault, 2019-01-20, 1 file, -5/+169
  llvm-svn: 351700
* AMDGPU/GlobalISel: Really legalize exts from i1
  Matt Arsenault, 2019-01-20, 2 files, -20/+42
  A combine was hiding the fact that these tests were not actually testing what they should, although they were producing the expected end result.
  llvm-svn: 351698
* GlobalISel: Implement widenScalar for basic FP ops
  Matt Arsenault, 2019-01-20, 6 files, -57/+490
  llvm-svn: 351696
* AMDGPU/GlobalISel: Legalize f32->f16 fptrunc
  Matt Arsenault, 2019-01-20, 1 file, -2/+19
  llvm-svn: 351695
* AMDGPU/GlobalISel: Fix some crashs in g_unmerge_values/g_merge_values
  Matt Arsenault, 2019-01-20, 2 files, -4/+77
  This was crashing in the predicate function, which assumed the value is a vector. Copy more of what AArch64 uses. This probably needs more refinement later, but I don't exactly understand what it means in some cases, particularly since any legalization for these seems to be missing.
  llvm-svn: 351693
* AMDGPU/GlobalISel: Regbank select for fpext
  Matt Arsenault, 2019-01-20, 1 file, -0/+31
  llvm-svn: 351692
* AMDGPU/GlobalISel: Cleanup legality for extensions
  Matt Arsenault, 2019-01-20, 4 files, -2/+230
  llvm-svn: 351691
* AMDGPU/GlobalISel: Legalize more types for select
  Matt Arsenault, 2019-01-18, 2 files, -18/+174
  llvm-svn: 351599
* AMDGPU/GlobalISel: Legalize illegal g_constant
  Matt Arsenault, 2019-01-18, 2 files, -22/+96
  llvm-svn: 351596
* [AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands
  Dmitry Preobrazhensky, 2019-01-18, 1 file, -1/+1
  See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D56794
  llvm-svn: 351555
* AMDGPU: Convert tests away from llvm.SI.load.const
  Matt Arsenault, 2019-01-17, 8 files, -282/+282
  llvm-svn: 351494
* Allow FP types for atomicrmw xchg
  Matt Arsenault, 2019-01-17, 6 files, -0/+68
  llvm-svn: 351427