author | Matt Arsenault <Matthew.Arsenault@amd.com> | 2015-11-24 12:05:03 +0000 |
---|---|---|
committer | Matt Arsenault <Matthew.Arsenault@amd.com> | 2015-11-24 12:05:03 +0000 |
commit | 4d801cd357c74bb7c2a60fedf4030b9fb5b4827f (patch) | |
tree | ab89b79da213d3d358dbdaf7b6ba19ba2fe2d994 /llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | |
parent | 9d0f44bf8af57cbe992edada1a5351881b1388b2 (diff) | |
AMDGPU: Split x8 and x16 vector loads instead of scalarize
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.
llvm-svn: 253974
Diffstat (limited to 'llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp')
-rw-r--r-- | llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index af9fcbde9f1..b73172cec26 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -394,6 +394,16 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM,
   setFsqrtIsCheap(true);

+  // We want to find all load dependencies for long chains of stores to enable
+  // merging into very wide vectors. The problem is with vectors with > 4
+  // elements. MergeConsecutiveStores will attempt to merge these because x8/x16
+  // vectors are a legal type, even though we have to split the loads
+  // usually. When we can more precisely specify load legality per address
+  // space, we should be able to make FindBetterChain/MergeConsecutiveStores
+  // smarter so that they can figure out what to do in 2 iterations without all
+  // N > 4 stores on the same chain.
+  GatherAllAliasesMaxDepth = 16;
+
   // FIXME: Need to really handle these.
   MaxStoresPerMemcpy = 4096;
   MaxStoresPerMemmove = 4096;
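The comment in the hunk above says a deeper alias search is needed so that all stores on a long chain are found before merging. The sketch below is a toy model of that idea only, not LLVM's implementation: `ToyStore`, `makeLinearChain`, and `gatherChain` are hypothetical names, and the smaller cap of 6 used in the usage note is an illustrative value, not a claim about any LLVM default.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy node: a store that records the index of its chain
// predecessor, or -1 when it hangs directly off the entry token.
struct ToyStore {
  int chainPred;
};

// Build a linear chain of n stores: store i is chained to store i - 1.
std::vector<ToyStore> makeLinearChain(int n) {
  std::vector<ToyStore> nodes;
  for (int i = 0; i < n; ++i)
    nodes.push_back(ToyStore{i - 1});
  return nodes;
}

// Walk up the chain from `start`, visiting at most `maxDepth` nodes.
// This mimics a depth-limited search: once the cap is hit, earlier
// stores on the chain are never seen and cannot be considered for merging.
std::vector<int> gatherChain(const std::vector<ToyStore> &nodes, int start,
                             std::size_t maxDepth) {
  std::vector<int> visited;
  int cur = start;
  while (cur >= 0 && visited.size() < maxDepth) {
    visited.push_back(cur);
    cur = nodes[cur].chainPred;
  }
  return visited;
}
```

In this model, with a chain of 16 stores (for example, the scalar pieces of an x16 vector), `gatherChain(makeLinearChain(16), 15, 6)` stops after 6 nodes, while a cap of 16 reaches the first store on the chain.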