summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
diff options
context:
space:
mode:
authorReid Kleckner <rnk@google.com>2018-03-14 21:54:21 +0000
committerReid Kleckner <rnk@google.com>2018-03-14 21:54:21 +0000
commit3a7a2e4a0ae44b6a035b09a95babe2bc6c646323 (patch)
tree170fd9cc4a0db0c5a54d565791a1fe4bac31a365 /llvm/test/CodeGen/AArch64
parente85b06d65f695c576df1e529ee37cb15e6902401 (diff)
downloadbcm5719-llvm-3a7a2e4a0ae44b6a035b09a95babe2bc6c646323.tar.gz
bcm5719-llvm-3a7a2e4a0ae44b6a035b09a95babe2bc6c646323.zip
[FastISel] Sink local value materializations to first use
Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581
Diffstat (limited to 'llvm/test/CodeGen/AArch64')
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-abi_align.ll30
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll8
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll4
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-fast-isel-intrinsic.ll2
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-fast-isel.ll2
-rw-r--r--llvm/test/CodeGen/AArch64/arm64-patchpoint-webkit_jscc.ll12
-rw-r--r--llvm/test/CodeGen/AArch64/swifterror.ll4
7 files changed, 34 insertions, 28 deletions
diff --git a/llvm/test/CodeGen/AArch64/arm64-abi_align.ll b/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
index bfb74b598ff..25a747caa41 100644
--- a/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-abi_align.ll
@@ -290,13 +290,14 @@ entry:
; Space for s2 is allocated at sp
; FAST-LABEL: caller42
-; FAST: sub sp, sp, #112
-; Space for s1 is allocated at fp-24 = sp+72
-; Space for s2 is allocated at sp+48
+; FAST: sub sp, sp, #96
+; Space for s1 is allocated at fp-24 = sp+56
; FAST: sub x[[A:[0-9]+]], x29, #24
-; FAST: add x[[A:[0-9]+]], sp, #48
; Call memcpy with size = 24 (0x18)
; FAST: orr {{x[0-9]+}}, xzr, #0x18
+; Space for s2 is allocated at sp+32
+; FAST: add x[[A:[0-9]+]], sp, #32
+; FAST: bl _memcpy
%tmp = alloca %struct.s42, align 4
%tmp1 = alloca %struct.s42, align 4
%0 = bitcast %struct.s42* %tmp to i8*
@@ -334,13 +335,16 @@ entry:
; FAST-LABEL: caller42_stack
; Space for s1 is allocated at fp-24
-; Space for s2 is allocated at fp-48
; FAST: sub x[[A:[0-9]+]], x29, #24
-; FAST: sub x[[B:[0-9]+]], x29, #48
; Call memcpy with size = 24 (0x18)
; FAST: orr {{x[0-9]+}}, xzr, #0x18
-; FAST: str {{w[0-9]+}}, [sp]
+; FAST: bl _memcpy
+; Space for s2 is allocated at fp-48
+; FAST: sub x[[B:[0-9]+]], x29, #48
+; Call memcpy again
+; FAST: bl _memcpy
; Address of s1 is passed on stack at sp+8
+; FAST: str {{w[0-9]+}}, [sp]
; FAST: str {{x[0-9]+}}, [sp, #8]
; FAST: str {{x[0-9]+}}, [sp, #16]
%tmp = alloca %struct.s42, align 4
@@ -401,8 +405,6 @@ entry:
; FAST: add x29, sp, #64
; Space for s1 is allocated at sp+32
; Space for s2 is allocated at sp
-; FAST: add x1, sp, #32
-; FAST: mov x2, sp
; FAST: str {{x[0-9]+}}, [sp, #32]
; FAST: str {{x[0-9]+}}, [sp, #40]
; FAST: str {{x[0-9]+}}, [sp, #48]
@@ -411,6 +413,8 @@ entry:
; FAST: str {{x[0-9]+}}, [sp, #8]
; FAST: str {{x[0-9]+}}, [sp, #16]
; FAST: str {{x[0-9]+}}, [sp, #24]
+; FAST: add x1, sp, #32
+; FAST: mov x2, sp
%tmp = alloca %struct.s43, align 16
%tmp1 = alloca %struct.s43, align 16
%0 = bitcast %struct.s43* %tmp to i8*
@@ -448,8 +452,6 @@ entry:
; FAST: sub sp, sp, #112
; Space for s1 is allocated at fp-32 = sp+64
; Space for s2 is allocated at sp+32
-; FAST: sub x[[A:[0-9]+]], x29, #32
-; FAST: add x[[B:[0-9]+]], sp, #32
; FAST: stur {{x[0-9]+}}, [x29, #-32]
; FAST: stur {{x[0-9]+}}, [x29, #-24]
; FAST: stur {{x[0-9]+}}, [x29, #-16]
@@ -460,8 +462,10 @@ entry:
; FAST: str {{x[0-9]+}}, [sp, #56]
; FAST: str {{w[0-9]+}}, [sp]
; Address of s1 is passed on stack at sp+8
-; FAST: str {{x[0-9]+}}, [sp, #8]
-; FAST: str {{x[0-9]+}}, [sp, #16]
+; FAST: sub x[[A:[0-9]+]], x29, #32
+; FAST: str x[[A]], [sp, #8]
+; FAST: add x[[B:[0-9]+]], sp, #32
+; FAST: str x[[B]], [sp, #16]
%tmp = alloca %struct.s43, align 16
%tmp1 = alloca %struct.s43, align 16
%0 = bitcast %struct.s43* %tmp to i8*
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll
index 4cf23545aab..1f34cef0817 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll
@@ -80,15 +80,15 @@ define i32 @t2() {
entry:
; CHECK-LABEL: t2
; CHECK: mov [[REG1:x[0-9]+]], xzr
+; CHECK: mov x0, [[REG1]]
; CHECK: orr w1, wzr, #0xfffffff8
; CHECK: orr [[REG2:w[0-9]+]], wzr, #0x3ff
-; CHECK: orr [[REG3:w[0-9]+]], wzr, #0x2
-; CHECK: mov [[REG4:w[0-9]+]], wzr
-; CHECK: orr [[REG5:w[0-9]+]], wzr, #0x1
-; CHECK: mov x0, [[REG1]]
; CHECK: uxth w2, [[REG2]]
+; CHECK: orr [[REG3:w[0-9]+]], wzr, #0x2
; CHECK: sxtb w3, [[REG3]]
+; CHECK: mov [[REG4:w[0-9]+]], wzr
; CHECK: and w4, [[REG4]], #0x1
+; CHECK: orr [[REG5:w[0-9]+]], wzr, #0x1
; CHECK: and w5, [[REG5]], #0x1
; CHECK: bl _func2
%call = call i32 @func2(i64 zeroext 0, i32 signext -8, i16 zeroext 1023, i8 signext -254, i1 zeroext 0, i1 zeroext 1)
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll
index 00e2fab81f9..e98c7d6e855 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll
@@ -18,10 +18,10 @@ entry:
; CHECK: @Rand
; CHECK: adrp [[REG1:x[0-9]+]], _seed@GOTPAGE
; CHECK: ldr [[REG2:x[0-9]+]], {{\[}}[[REG1]], _seed@GOTPAGEOFF{{\]}}
-; CHECK: mov [[REG3:x[0-9]+]], #13849
-; CHECK: mov [[REG4:x[0-9]+]], #1309
; CHECK: ldr [[REG5:x[0-9]+]], {{\[}}[[REG2]]{{\]}}
+; CHECK: mov [[REG4:x[0-9]+]], #1309
; CHECK: mul [[REG6:x[0-9]+]], [[REG5]], [[REG4]]
+; CHECK: mov [[REG3:x[0-9]+]], #13849
; CHECK: add [[REG7:x[0-9]+]], [[REG6]], [[REG3]]
; CHECK: and [[REG8:x[0-9]+]], [[REG7]], #0xffff
; CHECK: str [[REG8]], {{\[}}[[REG1]]{{\]}}
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel-intrinsic.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel-intrinsic.ll
index e43160ab340..ce99afa4215 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel-intrinsic.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel-intrinsic.ll
@@ -8,8 +8,8 @@ define void @t1() {
; ARM64: adrp x8, _message@PAGE
; ARM64: add x0, x8, _message@PAGEOFF
; ARM64: mov w9, wzr
-; ARM64: mov x2, #80
; ARM64: uxtb w1, w9
+; ARM64: mov x2, #80
; ARM64: bl _memset
call void @llvm.memset.p0i8.i64(i8* align 16 getelementptr inbounds ([80 x i8], [80 x i8]* @message, i32 0, i32 0), i8 0, i64 80, i1 false)
ret void
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel.ll
index 39934c4399b..fc1ed96bbfa 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel.ll
@@ -95,6 +95,8 @@ declare void @llvm.trap() nounwind
define void @ands(i32* %addr) {
; CHECK-LABEL: ands:
; CHECK: tst [[COND:w[0-9]+]], #0x1
+; CHECK-NEXT: orr w{{[0-9]+}}, wzr, #0x2
+; CHECK-NEXT: orr w{{[0-9]+}}, wzr, #0x1
; CHECK-NEXT: csel [[COND]],
entry:
%cond91 = select i1 undef, i32 1, i32 2
diff --git a/llvm/test/CodeGen/AArch64/arm64-patchpoint-webkit_jscc.ll b/llvm/test/CodeGen/AArch64/arm64-patchpoint-webkit_jscc.ll
index ccd12cdf674..0b7a38b9153 100644
--- a/llvm/test/CodeGen/AArch64/arm64-patchpoint-webkit_jscc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-patchpoint-webkit_jscc.ll
@@ -51,10 +51,10 @@ entry:
; CHECK-NEXT: blr x16
; FAST-LABEL: jscall_patchpoint_codegen2:
; FAST: orr [[REG1:x[0-9]+]], xzr, #0x2
-; FAST-NEXT: orr [[REG2:w[0-9]+]], wzr, #0x4
-; FAST-NEXT: orr [[REG3:x[0-9]+]], xzr, #0x6
; FAST-NEXT: str [[REG1]], [sp]
+; FAST-NEXT: orr [[REG2:w[0-9]+]], wzr, #0x4
; FAST-NEXT: str [[REG2]], [sp, #16]
+; FAST-NEXT: orr [[REG3:x[0-9]+]], xzr, #0x6
; FAST-NEXT: str [[REG3]], [sp, #24]
; FAST: Ltmp
; FAST-NEXT: mov x16, #281470681743360
@@ -87,14 +87,14 @@ entry:
; CHECK-NEXT: blr x16
; FAST-LABEL: jscall_patchpoint_codegen3:
; FAST: orr [[REG1:x[0-9]+]], xzr, #0x2
-; FAST-NEXT: orr [[REG2:w[0-9]+]], wzr, #0x4
-; FAST-NEXT: orr [[REG3:x[0-9]+]], xzr, #0x6
-; FAST-NEXT: orr [[REG4:w[0-9]+]], wzr, #0x8
-; FAST-NEXT: mov [[REG5:x[0-9]+]], #10
; FAST-NEXT: str [[REG1]], [sp]
+; FAST-NEXT: orr [[REG2:w[0-9]+]], wzr, #0x4
; FAST-NEXT: str [[REG2]], [sp, #16]
+; FAST-NEXT: orr [[REG3:x[0-9]+]], xzr, #0x6
; FAST-NEXT: str [[REG3]], [sp, #24]
+; FAST-NEXT: orr [[REG4:w[0-9]+]], wzr, #0x8
; FAST-NEXT: str [[REG4]], [sp, #36]
+; FAST-NEXT: mov [[REG5:x[0-9]+]], #10
; FAST-NEXT: str [[REG5]], [sp, #48]
; FAST: Ltmp
; FAST-NEXT: mov x16, #281470681743360
diff --git a/llvm/test/CodeGen/AArch64/swifterror.ll b/llvm/test/CodeGen/AArch64/swifterror.ll
index 00cf7e6f503..2de271e49f9 100644
--- a/llvm/test/CodeGen/AArch64/swifterror.ll
+++ b/llvm/test/CodeGen/AArch64/swifterror.ll
@@ -189,10 +189,10 @@ define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float
; CHECK-O0:[[BB2]]:
; CHECK-O0: ldr x0, [sp, [[SLOT2]]]
; CHECK-O0: fcmp
-; CHECK-O0: str x0, [sp]
+; CHECK-O0: str x0, [sp, [[SLOT3:#[0-9]+]]
; CHECK-O0: b.le [[BB1]]
; reload from stack
-; CHECK-O0: ldr [[ID3:x[0-9]+]], [sp]
+; CHECK-O0: ldr [[ID3:x[0-9]+]], [sp, [[SLOT3]]]
; CHECK-O0: mov x21, [[ID3]]
; CHECK-O0: ret
entry:
OpenPOWER on IntegriCloud