author     Sanjay Patel <spatel@rotateright.com>  2019-06-04 16:40:04 +0000
committer  Sanjay Patel <spatel@rotateright.com>  2019-06-04 16:40:04 +0000
commit     606eb2367f9f0bef2d1e0182bbb2bf4effb1711e (patch)
tree       07ad29ff737cfeb198014fa795f057a9150954e6 /llvm/lib/Target
parent     f15e3d856fddd3ecf80fdbb798be64d0c4bc6de4 (diff)
[x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling), and some cores split 256-bit stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. That is one reason I'm proposing this as a lowering rather than a combine: we would loop infinitely fighting the merge code if we tried this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
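To make the pattern concrete, here is a hedged illustration of the kind of source that reaches this lowering (this snippet is not taken from the patch or from PR37428; the function and variable names are invented for the example):

#include <immintrin.h>

// Hypothetical example: build a 256-bit value by concatenating two
// 128-bit halves, then store it with a single 256-bit store.
void store_concat(__m128 Lo, __m128 Hi, float *P) {
  __m256 V = _mm256_set_m128(Hi, Lo); // concat -> vinsertf128
  _mm256_storeu_ps(P, V);             // one 256-bit (ymm) store
}

Before this change, the backend emits the vinsertf128 plus one ymm store; with the split, it can store Lo and Hi with two xmm stores and drop the concat entirely, avoiding ymm use (and the vzeroupper/frequency concerns noted above).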
Diffstat (limited to 'llvm/lib/Target')
-rw-r--r--  llvm/lib/Target/X86/X86ISelLowering.cpp | 11
1 file changed, 11 insertions(+), 0 deletions(-)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index e493d3d7194..a15e3753820 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -1283,6 +1283,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
       setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Custom);
       setOperationAction(ISD::INSERT_SUBVECTOR, VT, Legal);
       setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
+      setOperationAction(ISD::STORE, VT, Custom);
     }
 
     if (HasInt256)
@@ -21073,7 +21074,17 @@ static SDValue LowerStore(SDValue Op, const X86Subtarget &Subtarget,
   if (St->isTruncatingStore())
     return SDValue();
 
+  // If this is a 256-bit store of concatenated ops, we are better off splitting
+  // that store into two 128-bit stores. This avoids spurious use of 256-bit ops
+  // and each half can execute independently. Some cores would split the op into
+  // halves anyway, so the concat (vinsertf128) is purely an extra op.
   MVT StoreVT = StoredVal.getSimpleValueType();
+  if (StoreVT.is256BitVector()) {
+    if (StoredVal.getOpcode() != ISD::CONCAT_VECTORS || !StoredVal.hasOneUse())
+      return SDValue();
+    return split256BitStore(St, DAG);
+  }
+
   assert(StoreVT.isVector() && StoreVT.getSizeInBits() == 64 &&
          "Unexpected VT");
   if (DAG.getTargetLoweringInfo().getTypeAction(*DAG.getContext(), StoreVT) !=
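Note that the first hunk is what routes 256-bit vector stores into this code: marking ISD::STORE as Custom makes the legalizer call X86's LowerOperation, which dispatches to LowerStore. The actual splitting is delegated to split256BitStore(), a helper already present in X86ISelLowering.cpp. Below is a minimal sketch of what such a splitter does with SelectionDAG; the calls and signatures are approximations for LLVM of this era, not the in-tree implementation:

// Hedged sketch, not the real split256BitStore(): split a 256-bit store
// into two 128-bit stores of the low and high halves.
static SDValue splitStoreSketch(StoreSDNode *Store, SelectionDAG &DAG) {
  SDLoc DL(Store);
  SDValue Val = Store->getValue();
  EVT VT = Val.getValueType();

  // Half-width vector type, e.g. v8f32 -> v4f32.
  unsigned NumElts = VT.getVectorNumElements();
  EVT HalfVT = EVT::getVectorVT(*DAG.getContext(),
                                VT.getVectorElementType(), NumElts / 2);
  unsigned HalfBytes = HalfVT.getStoreSize();

  // Extract the low and high halves of the stored value.
  SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                           DAG.getIntPtrConstant(0, DL));
  SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                           DAG.getIntPtrConstant(NumElts / 2, DL));

  // Store the low half at the original address and the high half
  // HalfBytes past it; the second store's alignment may be smaller.
  SDValue Ptr0 = Store->getBasePtr();
  SDValue Ptr1 = DAG.getMemBasePlusOffset(Ptr0, HalfBytes, DL);
  SDValue St0 = DAG.getStore(Store->getChain(), DL, Lo, Ptr0,
                             Store->getPointerInfo(), Store->getAlignment());
  SDValue St1 = DAG.getStore(Store->getChain(), DL, Hi, Ptr1,
                             Store->getPointerInfo().getWithOffset(HalfBytes),
                             MinAlign(Store->getAlignment(), HalfBytes));

  // Tie the two stores' chains together so they replace the original store.
  return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, St0, St1);
}

The hasOneUse() check in the hunk also matters: if the CONCAT_VECTORS has other users, the 256-bit value has to be materialized anyway, so splitting the store would not eliminate the vinsertf128.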