summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/ds_read2st64.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-4/+4
| | | | llvm-svn: 319491
* AMDGPU: Select DS insts without m0 initializationMatt Arsenault2017-11-291-69/+108
| | | | | | | | | GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270
* AMDGPU: Allow SIShrinkInstructions to work in non-SSAMatt Arsenault2017-07-101-3/+3
| | | | | | | | Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-13/+13
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [AMDGPU] Remove getBidirectionalReasonRankStanislav Mekhanoshin2017-03-111-2/+2
| | | | | | | | | | | | | | This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
* AMDGPU/SI: Canonicalize offset order for merged DS instructionsTom Stellard2016-08-261-4/+4
| | | | | | | | | | | | | | | | | | | Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to scheduler loads together without out explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870
* AMDGPU: Remove superfluous string attributes from testsMatt Arsenault2016-07-111-1/+1
| | | | | | Also fix v_mac.ll not testing right thing for fneg llvm-svn: 275129
* MachineScheduler: Fully compare top/bottom candidatesMatthias Braun2016-06-251-1/+1
| | | | | | | | | | In bidirectional scheduling this gives more stable results than just comparing the "reason" fields of the top/bottom node because the reason field may be higher depending on what other nodes are in the queue. Differential Revision: http://reviews.llvm.org/D19401 llvm-svn: 273755
* AMDGPU/SI: Enable lanemask tracking in mischedTom Stellard2016-03-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
* Update test case to appease bots after 263255.Chad Rosier2016-03-111-4/+4
| | | | | | I'll follow up with Matt to confirm this is the correct fix. llvm-svn: 263268
* AMDGPU: Remove some old intrinsic uses from testsMatt Arsenault2016-02-111-21/+15
| | | | llvm-svn: 260493
* AMDGPU: Switch barrier intrinsics to using convergentMatt Arsenault2015-12-191-4/+0
| | | | | | | | noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
* AMDGPU: Add sdst operand to VOP2b instructionsMatt Arsenault2015-08-291-2/+2
| | | | | | | | | | The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358
* AMDGPU/SI: Fix read2 merging into a super register.Matt Arsenault2015-07-141-1/+1
| | | | | | | | | | | | | | | | If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move. Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it. llvm-svn: 242174
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+272
llvm-svn: 239657
OpenPOWER on IntegriCloud