summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/combine-or.ll
diff options
context:
space:
mode:
authorAndrea Di Biagio <Andrea_DiBiagio@sn.scee.net>2014-06-25 17:41:58 +0000
committerAndrea Di Biagio <Andrea_DiBiagio@sn.scee.net>2014-06-25 17:41:58 +0000
commit07cdffc324b8fd336b04e8a7f83d7c3ce1ba7815 (patch)
treed717e6b4dc4e7d246c638db18be77822343093d6 /llvm/test/CodeGen/X86/combine-or.ll
parenta826147eef05d1d4fa0f3efdd1eeb4c71ae13b9d (diff)
downloadbcm5719-llvm-07cdffc324b8fd336b04e8a7f83d7c3ce1ba7815.tar.gz
bcm5719-llvm-07cdffc324b8fd336b04e8a7f83d7c3ce1ba7815.zip
[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should firstly check if a shuffle performs a blend, and in case, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720
Diffstat (limited to 'llvm/test/CodeGen/X86/combine-or.ll')
-rw-r--r--llvm/test/CodeGen/X86/combine-or.ll4
1 files changed, 2 insertions, 2 deletions
diff --git a/llvm/test/CodeGen/X86/combine-or.ll b/llvm/test/CodeGen/X86/combine-or.ll
index 572aded5e9a..ff807b98717 100644
--- a/llvm/test/CodeGen/X86/combine-or.ll
+++ b/llvm/test/CodeGen/X86/combine-or.ll
@@ -74,7 +74,7 @@ define <4 x i32> @test6(<4 x i32> %a, <4 x i32> %b) {
}
; CHECK-LABEL: test6
; CHECK-NOT: xorps
-; CHECK: shufps
+; CHECK: blendps $12
; CHECK-NEXT: ret
@@ -86,7 +86,7 @@ define <4 x i32> @test7(<4 x i32> %a, <4 x i32> %b) {
}
; CHECK-LABEL: test7
; CHECK-NOT: xorps
-; CHECK: shufps
+; CHECK: blendps $12
; CHECK-NEXT: ret
OpenPOWER on IntegriCloud