author | Sanjay Patel <spatel@rotateright.com> | 2017-06-22 18:11:19 +0000 |
---|---|---|
committer | Sanjay Patel <spatel@rotateright.com> | 2017-06-22 18:11:19 +0000 |
commit | 41a34e411164468818aad59000f41fce9ccbe018 (patch) | |
tree | 8dd2473af7ff53859709d719b54cbdf4e8d102e2 /llvm/lib/Target/X86/X86ISelLowering.cpp | |
parent | 58ad080ef00adc7bf05605a1bb0c432de51068d4 (diff) | |
[x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.
First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx
Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):
This is the "obvious" x86 codegen (what gcc appears to produce currently):
| Num Of |         Ports pressure in cycles         |    |
|  Uops  | 0 - DV |  1  | 2 - D | 3 - D |  4  |  5  |    |
----------------------------------------------------------
|   1*   |        |     |       |       |     |     |    | xor eax, eax
|   1    |  1.0   |     |       |       |     |     | CP | test edi, edi
|   1    |        |     |       |       |     | 1.0 | CP | setnz al
|   1    |        | 1.0 |       |       |     |     | CP | neg eax
This is the adc version:
|   1*   |        |     |       |       |     |     |    | xor eax, eax
|   1    |  1.0   |     |       |       |     |     | CP | cmp edi, 0x1
|   2    |        | 1.0 |       |       |     | 1.0 | CP | adc eax, 0xffffffff
And this is sbb:
|   1    |  1.0   |     |       |       |     |     |    | neg edi
|   2    |        | 1.0 |       |       |     | 1.0 | CP | sbb eax, eax
If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.
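The three instruction sequences compared above can be checked against each other with a small flag model. This is only a sketch, not part of the commit: the function names are made up for illustration, only the carry and zero flags are modeled, and registers are assumed to be 32 bits wide. Each function follows one listing instruction by instruction and should yield 0 when the input is 0 and all-ones otherwise.

```cpp
#include <cstdint>

// Hypothetical helper names; each models one of the sequences listed above
// using 32-bit wrapping arithmetic and a hand-tracked flag.

// xor eax, eax ; test edi, edi ; setnz al ; neg eax
std::uint32_t viaSetnzNeg(std::uint32_t z) {
  std::uint32_t eax = 0;              // xor eax, eax
  bool zf = (z & z) == 0;             // test edi, edi sets ZF iff z == 0
  eax = zf ? 0u : 1u;                 // setnz al writes 1 when ZF is clear
  return 0u - eax;                    // neg eax
}

// xor eax, eax ; cmp edi, 1 ; adc eax, 0xffffffff
std::uint32_t viaCmpAdc(std::uint32_t z) {
  std::uint32_t eax = 0;              // xor eax, eax
  bool cf = z < 1u;                   // cmp edi, 1 borrows (CF = 1) iff z == 0
  return eax + 0xffffffffu + (cf ? 1u : 0u);  // adc eax, -1
}

// neg edi ; sbb eax, eax
// eaxBefore is whatever happened to be in eax; it cancels out in eax - eax.
std::uint32_t viaNegSbb(std::uint32_t z,
                        std::uint32_t eaxBefore = 0xdeadbeefu) {
  bool cf = z != 0;                   // neg sets CF iff its operand is nonzero
  return eaxBefore - eaxBefore - (cf ? 1u : 0u);  // sbb eax, eax
}
```

The default argument on viaNegSbb is there to make the "fake operands" point concrete: the prior contents of eax never reach the result, which is why the sbb form needs no zeroing xor.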
llvm-svn: 306040
Diffstat (limited to 'llvm/lib/Target/X86/X86ISelLowering.cpp')
-rw-r--r-- | llvm/lib/Target/X86/X86ISelLowering.cpp | 22 |
1 files changed, 19 insertions, 3 deletions
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 679fa81f7f8..2b51137901f 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -34897,13 +34897,29 @@ static SDValue combineAddOrSubToADCOrSBB(SDNode *N, SelectionDAG &DAG) {
       !Cmp.getOperand(0).getValueType().isInteger())
     return SDValue();
 
-  // (cmp Z, 1) sets the carry flag if Z is 0.
   SDValue Z = Cmp.getOperand(0);
+  SDVTList VTs = DAG.getVTList(N->getValueType(0), MVT::i32);
+
+  // If X is -1 or 0, then we have an opportunity to avoid constants required by
+  // the cmp transform below. 'neg' sets the carry flag when Z != 0, so create 0
+  // or -1 using 'sbb' with fake operands:
+  // 0 - (Z != 0) --> sbb %eax, %eax, (neg Z)
+  // -1 + (Z == 0) --> sbb %eax, %eax, (neg Z)
+  if (auto *ConstantX = dyn_cast<ConstantSDNode>(X)) {
+    if ((IsSub && CC == X86::COND_NE && ConstantX->isNullValue()) ||
+        (!IsSub && CC == X86::COND_E && ConstantX->isAllOnesValue())) {
+      SDValue Zero = DAG.getConstant(0, DL, VT);
+      SDValue Neg = DAG.getNode(X86ISD::SUB, DL, VTs, Zero, Z);
+      return DAG.getNode(X86ISD::SETCC_CARRY, DL, VT,
+                         DAG.getConstant(X86::COND_B, DL, MVT::i8),
+                         SDValue(Neg.getNode(), 1));
+    }
+  }
+
+  // (cmp Z, 1) sets the carry flag if Z is 0.
   SDValue NewCmp = DAG.getNode(X86ISD::CMP, DL, MVT::i32, Z,
                                DAG.getConstant(1, DL, Z.getValueType()));
 
-  SDVTList VTs = DAG.getVTList(N->getValueType(0), MVT::i32);
-
   // X - (Z != 0) --> sub X, (zext(setne Z, 0)) --> adc X, -1, (cmp Z, 1)
   // X + (Z != 0) --> add X, (zext(setne Z, 0)) --> sbb X, -1, (cmp Z, 1)
   if (CC == X86::COND_NE)
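The guard added in this hunk accepts exactly two (operation, condition, constant) combinations. The standalone sketch below restates that guard outside of SelectionDAG and checks that both accepted combinations agree with the value 'sbb' produces after 'neg Z'. It is an illustration under assumptions, not LLVM code: Cond, usesSbbOfNeg, reference and sbbOfNeg are hypothetical names, and the value type is fixed at 32 bits even though the real combine works on whatever integer type the node has.

```cpp
#include <cstdint>

// Hypothetical stand-in for X86::COND_E / X86::COND_NE.
enum class Cond { E, NE };

// Restates the new guard: 'sub' with X == 0 and NE, or 'add' with X == -1
// and E. Every other combination falls through to the cmp-based transform.
bool usesSbbOfNeg(bool isSub, Cond cc, std::int32_t x) {
  return (isSub && cc == Cond::NE && x == 0) ||
         (!isSub && cc == Cond::E && x == -1);
}

// Reference value of X +/- (Z cc 0) in 32-bit wrapping arithmetic.
std::uint32_t reference(bool isSub, Cond cc, std::int32_t x, std::uint32_t z) {
  std::uint32_t bit = (cc == Cond::NE) ? (z != 0) : (z == 0);
  std::uint32_t ux = static_cast<std::uint32_t>(x);
  return isSub ? ux - bit : ux + bit;
}

// Value left by 'sbb %eax, %eax' once 'neg Z' has set the carry flag:
// all-ones when Z is nonzero, zero otherwise.
std::uint32_t sbbOfNeg(std::uint32_t z) {
  return z != 0 ? 0xffffffffu : 0u;
}
```

A combination the guard rejects, such as 0 - (Z == 0), gives a different value from sbbOfNeg for Z == 0, which is why the condition code is checked together with the constant.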