summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
diff options
context:
space:
mode:
authorSanjay Patel <spatel@rotateright.com>2018-11-10 20:05:31 +0000
committerSanjay Patel <spatel@rotateright.com>2018-11-10 20:05:31 +0000
commit0a515595a712347d7070eeb6bbd51ccd7791b106 (patch)
tree50701cd3d8d5b7976b5bbb056bbd7e8a5b677823 /llvm/lib/Target
parent3482801dea1dcc12ed15fd1033a9007b1439c900 (diff)
downloadbcm5719-llvm-0a515595a712347d7070eeb6bbd51ccd7791b106.tar.gz
bcm5719-llvm-0a515595a712347d7070eeb6bbd51ccd7791b106.zip
[x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
Diffstat (limited to 'llvm/lib/Target')
-rw-r--r--llvm/lib/Target/AArch64/AArch64ISelLowering.cpp4
-rw-r--r--llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp5
-rw-r--r--llvm/lib/Target/Hexagon/HexagonISelLowering.cpp4
3 files changed, 12 insertions, 1 deletions
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 44ad6ae6cd2..92223c8e897 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8103,6 +8103,10 @@ bool AArch64TargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
bool AArch64TargetLowering::shouldReduceLoadWidth(SDNode *Load,
ISD::LoadExtType ExtTy,
EVT NewVT) const {
+ // TODO: This may be worth removing. Check regression tests for diffs.
+ if (!TargetLoweringBase::shouldReduceLoadWidth(Load, ExtTy, NewVT))
+ return false;
+
// If we're reducing the load width in order to avoid having to use an extra
// instruction to do extension then it's probably a good idea.
if (ExtTy != ISD::NON_EXTLOAD)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index ad0a9e388af..f5894c9022a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -655,8 +655,11 @@ bool AMDGPUTargetLowering::ShouldShrinkFPConstant(EVT VT) const {
}
bool AMDGPUTargetLowering::shouldReduceLoadWidth(SDNode *N,
- ISD::LoadExtType,
+ ISD::LoadExtType ExtTy,
EVT NewVT) const {
+ // TODO: This may be worth removing. Check regression tests for diffs.
+ if (!TargetLoweringBase::shouldReduceLoadWidth(N, ExtTy, NewVT))
+ return false;
unsigned NewSize = NewVT.getStoreSizeInBits();
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
index 755a8539be7..dc2a4eff204 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
@@ -3082,6 +3082,10 @@ HexagonTargetLowering::findRepresentativeClass(const TargetRegisterInfo *TRI,
bool HexagonTargetLowering::shouldReduceLoadWidth(SDNode *Load,
ISD::LoadExtType ExtTy, EVT NewVT) const {
+ // TODO: This may be worth removing. Check regression tests for diffs.
+ if (!TargetLoweringBase::shouldReduceLoadWidth(Load, ExtTy, NewVT))
+ return false;
+
auto *L = cast<LoadSDNode>(Load);
std::pair<SDValue,int> BO = getBaseAndOffset(L->getBasePtr());
// Small-data object, do not shrink.
OpenPOWER on IntegriCloud