author | Sanjay Patel <spatel@rotateright.com> | 2017-06-22 18:11:19 +0000 |
---|---|---|
committer | Sanjay Patel <spatel@rotateright.com> | 2017-06-22 18:11:19 +0000 |
commit | 41a34e411164468818aad59000f41fce9ccbe018 (patch) | |
tree | 8dd2473af7ff53859709d719b54cbdf4e8d102e2 /llvm/test | |
parent | 58ad080ef00adc7bf05605a1bb0c432de51068d4 (diff) | |
[x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.
First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx
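
As a rough illustration of the equivalence claim, the four test patterns can be modeled on fixed-width integers and compared exhaustively at a small width. This is only a sketch, not the Alive proof linked above. The function names are hypothetical, and the final add/sub in the two "as_math" forms is an assumption based on the test names and the commit title, since the diff context stops at the zext.

```python
def mask(bits):
    """All-ones value for the given width, i.e. -1 as an unsigned integer."""
    return (1 << bits) - 1

def select_0_or_neg1(x, bits):
    # select (icmp eq x, 0), 0, -1
    return 0 if x == 0 else mask(bits)

def select_0_or_neg1_as_math(x, bits):
    # zext(icmp eq x, 0) + (-1)   (assumed final operation)
    return (int(x == 0) + mask(bits)) & mask(bits)

def select_0_or_neg1_commuted(x, bits):
    # select (icmp ne x, 0), -1, 0
    return mask(bits) if x != 0 else 0

def select_0_or_neg1_commuted_as_math(x, bits):
    # 0 - zext(icmp ne x, 0)   (assumed final operation)
    return (0 - int(x != 0)) & mask(bits)

def all_forms_agree(bits):
    """Exhaustively compare the four forms over every value of the width."""
    return all(
        select_0_or_neg1(x, bits)
        == select_0_or_neg1_as_math(x, bits)
        == select_0_or_neg1_commuted(x, bits)
        == select_0_or_neg1_commuted_as_math(x, bits)
        for x in range(1 << bits)
    )
```

Calling `all_forms_agree(8)` walks all 256 i8 values. Wider types rely on the same argument, because each form depends only on whether x is zero.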
Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):
This is the "obvious" x86 codegen (what gcc appears to produce currently):
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1* | | | | | | | | xor eax, eax
| 1 | 1.0 | | | | | | CP | test edi, edi
| 1 | | | | | | 1.0 | CP | setnz al
| 1 | | 1.0 | | | | | CP | neg eax
This is the adc version:
| 1* | | | | | | | | xor eax, eax
| 1 | 1.0 | | | | | | CP | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | adc eax, 0xffffffff
And this is sbb:
| 1 | 1.0 | | | | | | | neg edi
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
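
To see why all three sequences compute the same value, here is a minimal model of the carry-flag behavior they rely on, assuming 32-bit operands. The helper names are hypothetical and only the flag and result bits needed here are modeled. The `stale_eax` parameter is an illustrative stand-in for whatever eax held before `sbb eax, eax`, which does not affect the result.

```python
BITS = 32
MASK = (1 << BITS) - 1

def neg(x):
    """neg: result is 0 - x; CF is set iff the source operand is nonzero."""
    return (-x) & MASK, int((x & MASK) != 0)

def cmp_cf(a, b):
    """CF after cmp a, b: set on unsigned borrow, i.e. a < b."""
    return int((a & MASK) < (b & MASK))

def adc(dst, src, cf):
    """adc dst, src: dst + src + CF (result only)."""
    return (dst + src + cf) & MASK

def sbb(dst, src, cf):
    """sbb dst, src: dst - src - CF (result only)."""
    return (dst - src - cf) & MASK

def via_setnz(x):
    # xor eax, eax; test edi, edi; setnz al; neg eax
    eax = int((x & MASK) != 0)
    result, _ = neg(eax)
    return result

def via_adc(x):
    # xor eax, eax; cmp edi, 1; adc eax, -1
    return adc(0, MASK, cmp_cf(x, 1))

def via_sbb(x, stale_eax=0x12345678):
    # neg edi; sbb eax, eax  (eax - eax - CF == -CF for any eax)
    _, cf = neg(x)
    return sbb(stale_eax, stale_eax, cf)
```

In this model each function returns 0 for x == 0 and 0xFFFFFFFF (that is, -1) for any other input, which is the select-of-constants result the tests check.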
If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.
llvm-svn: 306040
Diffstat (limited to 'llvm/test')
-rw-r--r-- | llvm/test/CodeGen/X86/sbb.ll | 15 |
1 files changed, 6 insertions, 9 deletions
diff --git a/llvm/test/CodeGen/X86/sbb.ll b/llvm/test/CodeGen/X86/sbb.ll
index 6ff207ce35b..062c5d26247 100644
--- a/llvm/test/CodeGen/X86/sbb.ll
+++ b/llvm/test/CodeGen/X86/sbb.ll
@@ -8,9 +8,8 @@
 define i8 @i8_select_0_or_neg1(i8 %x) {
 ; CHECK-LABEL: i8_select_0_or_neg1:
 ; CHECK: # BB#0:
-; CHECK-NEXT: cmpb $1, %dil
-; CHECK-NEXT: movb $-1, %al
-; CHECK-NEXT: adcb $0, %al
+; CHECK-NEXT: negb %dil
+; CHECK-NEXT: sbbb %al, %al
 ; CHECK-NEXT: retq
   %cmp = icmp eq i8 %x, 0
   %sel = select i1 %cmp, i8 0, i8 -1
@@ -22,9 +21,8 @@ define i8 @i8_select_0_or_neg1(i8 %x) {
 define i16 @i16_select_0_or_neg1_as_math(i16 %x) {
 ; CHECK-LABEL: i16_select_0_or_neg1_as_math:
 ; CHECK: # BB#0:
-; CHECK-NEXT: cmpw $1, %di
-; CHECK-NEXT: movw $-1, %ax
-; CHECK-NEXT: adcw $0, %ax
+; CHECK-NEXT: negw %di
+; CHECK-NEXT: sbbw %ax, %ax
 ; CHECK-NEXT: retq
   %cmp = icmp eq i16 %x, 0
   %ext = zext i1 %cmp to i16
@@ -50,9 +48,8 @@ define i32 @i32_select_0_or_neg1_commuted(i32 %x) {
 define i64 @i64_select_0_or_neg1_commuted_as_math(i64 %x) {
 ; CHECK-LABEL: i64_select_0_or_neg1_commuted_as_math:
 ; CHECK: # BB#0:
-; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: cmpq $1, %rdi
-; CHECK-NEXT: adcq $-1, %rax
+; CHECK-NEXT: negq %rdi
+; CHECK-NEXT: sbbq %rax, %rax
 ; CHECK-NEXT: retq
   %cmp = icmp ne i64 %x, 0
   %ext = zext i1 %cmp to i64