author | Nicolai Haehnle <nhaehnle@gmail.com> | 2018-11-30 22:55:38 +0000
committer | Nicolai Haehnle <nhaehnle@gmail.com> | 2018-11-30 22:55:38 +0000
commit | a7b00058e05f6862d4ef2c8f8bb287b09f7e41b1
tree | 3f571b7d7ba5368d8ca4dc8010ef04ffe0ee6eef /llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
parent | a9cc92c247ce5d0ecc3399e7af6e40a3d59bbf6c
AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance when
the load really is uniform. Instead, select the scalar load intrinsics
directly to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating-point
calculation was involved, or due to other remaining deficiencies in
SIFixSGPRCopies -- we use v_readfirstlane to copy it back into an SGPR.
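
The decision itself boils down to something like the following self-contained
C++ sketch (illustrative names only -- the real logic lives in the AMDGPU
instruction selector, not in a helper like this):

#include <cstdio>

enum class Loc { SGPR, VGPR };

struct BufferLoad {
  bool UniformAddress; // what divergence analysis would report
  Loc OffsetReg;       // where the offset operand currently lives
};

// Pick the load flavor: scalar (SMRD) only when the address is provably
// uniform; a uniform load whose offset still sits in a VGPR first copies
// it to an SGPR with v_readfirstlane.
static const char *selectBufferLoad(const BufferLoad &L) {
  if (!L.UniformAddress)
    return "MUBUF (VMEM) load";
  if (L.OffsetReg == Loc::VGPR)
    return "v_readfirstlane, then SMRD load";
  return "SMRD load";
}

int main() {
  std::printf("%s\n", selectBufferLoad({true, Loc::VGPR}));  // readfirstlane path
  std::printf("%s\n", selectBufferLoad({false, Loc::SGPR})); // stays VMEM
}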
There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 348050
Diffstat (limited to 'llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp')
-rw-r--r-- | llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | 7
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 1cce9812dd1..bbcb73dcbb5 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -908,9 +908,12 @@ bool isLegalSMRDImmOffset(const MCSubtargetInfo &ST, int64_t ByteOffset) {
 // Given Imm, split it into the values to put into the SOffset and ImmOffset
 // fields in an MUBUF instruction. Return false if it is not possible (due to a
 // hardware bug needing a workaround).
+//
+// The required alignment ensures that individual address components remain
+// aligned if they are aligned to begin with. It also ensures that additional
+// offsets within the given alignment can be added to the resulting ImmOffset.
 bool splitMUBUFOffset(uint32_t Imm, uint32_t &SOffset, uint32_t &ImmOffset,
-                      const GCNSubtarget *Subtarget) {
-  const uint32_t Align = 4;
+                      const GCNSubtarget *Subtarget, uint32_t Align) {
   const uint32_t MaxImm = alignDown(4095, Align);
 
   uint32_t Overflow = 0;
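
The invariant the new comment describes can be demonstrated with a standalone
sketch (splitOffsetSketch is a hypothetical simplification; the real
splitMUBUFOffset can also return false for the subtarget hardware bug, which
this toy omits):

#include <cassert>
#include <cstdint>
#include <cstdio>

// Mirror alignDown/alignUp from llvm/Support/MathExtras.h.
static uint32_t alignDown(uint32_t V, uint32_t A) { return V / A * A; }
static uint32_t alignUp(uint32_t V, uint32_t A) { return alignDown(V + A - 1, A); }

// Toy split: keep as much of Imm as fits in the 12-bit ImmOffset field and
// move an Align-multiple remainder into SOffset. Because MaxImm is aligned
// down from 4095, any additional offset below Align can later be folded
// into ImmOffset without overflowing the field.
static void splitOffsetSketch(uint32_t Imm, uint32_t &SOffset,
                              uint32_t &ImmOffset, uint32_t Align) {
  const uint32_t MaxImm = alignDown(4095, Align);
  SOffset = Imm <= MaxImm ? 0 : alignUp(Imm - MaxImm, Align);
  ImmOffset = Imm - SOffset;
  assert(ImmOffset <= MaxImm && SOffset % Align == 0);
}

int main() {
  uint32_t SOffset, ImmOffset;
  splitOffsetSketch(5000, SOffset, ImmOffset, /*Align=*/4);
  // With Align=4, MaxImm is 4092, so 5000 splits into SOffset=908 and
  // ImmOffset=4092; any extra offset below 4 still fits in the field.
  std::printf("SOffset=%u ImmOffset=%u\n", SOffset, ImmOffset);
}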