author: Sanjay Patel <spatel@rotateright.com> 2018-10-10 13:39:59 +0000
committer: Sanjay Patel <spatel@rotateright.com> 2018-10-10 13:39:59 +0000
commit: 6cca8af2270be8bc5494b44bb8856af591d0385b
tree: f875071a18f0d27152814e086f606837ff94485f /llvm/lib/Target/X86
parent: fc51490baf1d6ad5796d8cb8bb0792de13ce8fce
[x86] allow single source horizontal op matching (PR39195)

This is intended to restore horizontal codegen to what it looked like
before IR demanded elements improved in:
rL343727

As noted in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...horizontal ops can be worse for performance than a shuffle+regular
binop, so I've added a TODO. Ideally, we'd solve that in a machine
instruction pass, but a quicker solution will be adding a
'HasFastHorizontalOp' feature bit to deal with it here in the DAG.

Differential Revision: https://reviews.llvm.org/D52997

llvm-svn: 344141
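
To illustrate the trade-off the message describes, here is a minimal sketch in plain C++ (no intrinsics) of why a single-source horizontal add can be replaced by shuffles plus a regular add. The type `V4` and the functions `hadd` and `shuffleAdd` are hypothetical names for this example only. `hadd` models the usual 128-bit horizontal-add semantics for four floats, and `shuffleAdd` assumes the alternative is two shuffles of the same input followed by an element-wise add. Nothing here measures performance; it only shows the two forms compute the same lanes.

```cpp
#include <array>
#include <cstddef>

using V4 = std::array<float, 4>;

// Model of a 128-bit horizontal add on 4 floats:
// the low half sums adjacent pairs of a, the high half adjacent pairs of b.
V4 hadd(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Assumed alternative for the single-source case: gather the even and odd
// elements of a with two shuffles, then do one regular (vertical) add.
V4 shuffleAdd(const V4 &a) {
  V4 even = {a[0], a[2], a[0], a[2]};
  V4 odd = {a[1], a[3], a[1], a[3]};
  V4 result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = even[i] + odd[i];
  return result;
}
```

With both operands equal, `hadd(a, a)` and `shuffleAdd(a)` produce the same four lanes, which is why the choice between them is purely a question of which sequence is faster on a given CPU.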
Diffstat (limited to 'llvm/lib/Target/X86')
-rw-r--r-- llvm/lib/Target/X86/X86ISelLowering.cpp | 8
1 file changed, 6 insertions, 2 deletions
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 4c18c5a84c2..67f98d8ee72 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -37026,9 +37026,13 @@ static bool isHorizontalBinOp(SDValue &LHS, SDValue &RHS, bool IsCommutative) {
         continue;
 
       // The low half of the 128-bit result must choose from A.
-      // The high half of the 128-bit result must choose from B.
+      // The high half of the 128-bit result must choose from B,
+      // unless B is undef. In that case, we are always choosing from A.
+      // TODO: Using a horizontal op on a single input is likely worse for
+      // performance on many CPUs, so this should be limited here or reversed
+      // in a later pass.
       unsigned NumEltsPer64BitChunk = NumEltsPer128BitChunk / 2;
-      unsigned Src = i >= NumEltsPer64BitChunk;
+      unsigned Src = B.getNode() ? i >= NumEltsPer64BitChunk : 0;
 
       // Check that successive elements are being operated on. If not, this is
       // not a horizontal operation.
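
The source-selection rule changed by this hunk can be sketched as standalone C++, independent of LLVM's SelectionDAG types. The function name `expectedSources` and the `bIsUndef` flag are hypothetical: the flag stands in for `!B.getNode()`, and the loop is a simplification of the per-lane loop in isHorizontalBinOp. Only the ternary is meant to mirror the patched line.

```cpp
#include <vector>

// Hypothetical model, not the LLVM implementation.
// For each lane i of one 128-bit chunk, report which input the lane is
// expected to read from: 0 means A, 1 means B.
std::vector<unsigned> expectedSources(unsigned numEltsPer128BitChunk,
                                      bool bIsUndef) {
  unsigned numEltsPer64BitChunk = numEltsPer128BitChunk / 2;
  std::vector<unsigned> sources;
  sources.reserve(numEltsPer128BitChunk);
  for (unsigned i = 0; i != numEltsPer128BitChunk; ++i) {
    // Mirrors: Src = B.getNode() ? i >= NumEltsPer64BitChunk : 0;
    unsigned src = bIsUndef ? 0u : (i >= numEltsPer64BitChunk ? 1u : 0u);
    sources.push_back(src);
  }
  return sources;
}
```

For a chunk of four elements, a defined B gives the sources {0, 0, 1, 1} (the behaviour before and after the patch), while an undef B gives {0, 0, 0, 0}, which is the single-source case the patch newly accepts.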