| author | Sanjay Patel <spatel@rotateright.com> | 2019-06-04 16:40:04 +0000 |
|---|---|---|
| committer | Sanjay Patel <spatel@rotateright.com> | 2019-06-04 16:40:04 +0000 |
| commit | 606eb2367f9f0bef2d1e0182bbb2bf4effb1711e (patch) | |
| tree | 07ad29ff737cfeb198014fa795f057a9150954e6 /llvm/lib/Target | |
| parent | f15e3d856fddd3ecf80fdbb798be64d0c4bc6de4 (diff) | |
[x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
use unnecessarily (to avoid vzeroupper and frequency throttling), and some cores split
256-bit stores into two 128-bit halves anyway.
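For illustration, a source-level pattern along these lines (a hypothetical example, not
taken from PR37428) hits exactly the shape this patch targets: a 256-bit store whose
value is a CONCAT_VECTORS of two 128-bit halves.

```cpp
#include <immintrin.h>

// Hypothetical example of the targeted pattern: build a 256-bit value from
// two 128-bit halves and store it. On an AVX target, the concat lowers to a
// vinsertf128 feeding one 32-byte store; splitting the store instead yields
// two independent 16-byte stores and never touches the YMM domain.
void store_concat(__m128 lo, __m128 hi, float *dst) {
  __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
  _mm256_storeu_ps(dst, v);
}
```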
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but reining that in would not solve the problem completely. It is also part of why I'm
proposing this as a lowering rather than a combine: if we try this any earlier, we will
infinite-loop fighting the merge code.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 362524
Diffstat (limited to 'llvm/lib/Target')
| -rw-r--r-- | llvm/lib/Target/X86/X86ISelLowering.cpp | 11 |
1 file changed, 11 insertions(+), 0 deletions(-)
```diff
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index e493d3d7194..a15e3753820 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -1283,6 +1283,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
       setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Custom);
       setOperationAction(ISD::INSERT_SUBVECTOR, VT, Legal);
       setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
+      setOperationAction(ISD::STORE, VT, Custom);
     }
 
     if (HasInt256)
@@ -21073,7 +21074,17 @@ static SDValue LowerStore(SDValue Op, const X86Subtarget &Subtarget,
   if (St->isTruncatingStore())
     return SDValue();
 
+  // If this is a 256-bit store of concatenated ops, we are better off splitting
+  // that store into two 128-bit stores. This avoids spurious use of 256-bit ops
+  // and each half can execute independently. Some cores would split the op into
+  // halves anyway, so the concat (vinsertf128) is purely an extra op.
   MVT StoreVT = StoredVal.getSimpleValueType();
+  if (StoreVT.is256BitVector()) {
+    if (StoredVal.getOpcode() != ISD::CONCAT_VECTORS || !StoredVal.hasOneUse())
+      return SDValue();
+    return split256BitStore(St, DAG);
+  }
+
   assert(StoreVT.isVector() && StoreVT.getSizeInBits() == 64 &&
          "Unexpected VT");
   if (DAG.getTargetLoweringInfo().getTypeAction(*DAG.getContext(), StoreVT) !=
```
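The new code calls split256BitStore(), a helper that already exists elsewhere in
X86ISelLowering.cpp and is not part of this diff. As a rough guide to what such a split
looks like in SelectionDAG terms, here is a minimal sketch built from standard DAG APIs,
assuming the usual context and includes of X86ISelLowering.cpp; the name and details are
illustrative, not the actual in-tree implementation.

```cpp
// Illustrative sketch only; the real split256BitStore() may differ.
static SDValue split256BitStoreSketch(StoreSDNode *Store, SelectionDAG &DAG) {
  SDLoc DL(Store);
  SDValue Val = Store->getValue();
  MVT VT = Val.getSimpleValueType();

  // Half-width vector type, e.g. v8f32 -> v4f32.
  MVT HalfVT = MVT::getVectorVT(VT.getVectorElementType(),
                                VT.getVectorNumElements() / 2);
  unsigned HalfBytes = HalfVT.getStoreSize();

  // Extract the low and high 128-bit halves of the stored value.
  SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                           DAG.getIntPtrConstant(0, DL));
  SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                           DAG.getIntPtrConstant(HalfVT.getVectorNumElements(),
                                                 DL));

  // Address of the high half: base pointer + 16 bytes.
  SDValue Ptr = Store->getBasePtr();
  SDValue HiPtr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr,
                              DAG.getConstant(HalfBytes, DL,
                                              Ptr.getValueType()));

  // Emit the two 128-bit stores and join their chains.
  SDValue Ch0 =
      DAG.getStore(Store->getChain(), DL, Lo, Ptr, Store->getPointerInfo(),
                   Store->getAlignment(),
                   Store->getMemOperand()->getFlags());
  SDValue Ch1 =
      DAG.getStore(Store->getChain(), DL, Hi, HiPtr,
                   Store->getPointerInfo().getWithOffset(HalfBytes),
                   MinAlign(Store->getAlignment(), HalfBytes),
                   Store->getMemOperand()->getFlags());
  return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Ch0, Ch1);
}
```

The one-use check in the hunk matters for profitability: if the CONCAT_VECTORS has other
uses, the vinsertf128 must be emitted anyway, so splitting the store would not remove it.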

