| author | Jingyue Wu <jingyue@google.com> | 2015-07-01 20:08:06 +0000 |
|---|---|---|
| committer | Jingyue Wu <jingyue@google.com> | 2015-07-01 20:08:06 +0000 |
| commit | 77b5b385ee4f727a166eb5bd5175e396a17c1407 (patch) | |
| tree | 268672316de1cab7e9e6a6a62e11f5dee9e0b76c /llvm/test | |
| parent | d4cb1dddebe5a65672d62fe379956f7324749fa4 (diff) | |
[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary:
NVPTXPrologEpilogPass calculates the offset of each frame index. Before
that pass runs, the correct offsets of stack objects cannot be obtained,
which leads to wrong offsets if there are more than 2 frame objects. This
patch moves NVPTXPeephole after NVPTXPrologEpilogPass. Because
NVPTXPrologEpilogPass has already replaced the frame index with %VRFrame,
we check the VRFrame register instead, and try to remove VRFrame if it
has no remaining uses after the NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
llvm-svn: 241185
Diffstat (limited to 'llvm/test')
| -rw-r--r-- | llvm/test/CodeGen/NVPTX/local-stack-frame.ll | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/llvm/test/CodeGen/NVPTX/local-stack-frame.ll b/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
index fba5dd883f9..ef1b7da6ad0 100644
--- a/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
+++ b/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
@@ -59,10 +59,16 @@ define void @foo3(i32 %a) {
 ; PTX32: cvta.local.u32 %SP, %SPL;
 ; PTX32: add.u32 {{%r[0-9]+}}, %SP, 0;
+; PTX32: add.u32 {{%r[0-9]+}}, %SPL, 0;
+; PTX32: add.u32 {{%r[0-9]+}}, %SP, 4;
+; PTX32: add.u32 {{%r[0-9]+}}, %SPL, 4;
 ; PTX32: st.local.u32 [{{%r[0-9]+}}], {{%r[0-9]+}}
 ; PTX32: st.local.u32 [{{%r[0-9]+}}], {{%r[0-9]+}}
 ; PTX64: cvta.local.u64 %SP, %SPL;
 ; PTX64: add.u64 {{%rd[0-9]+}}, %SP, 0;
+; PTX64: add.u64 {{%rd[0-9]+}}, %SPL, 0;
+; PTX64: add.u64 {{%rd[0-9]+}}, %SP, 4;
+; PTX64: add.u64 {{%rd[0-9]+}}, %SPL, 4;
 ; PTX64: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}}
 ; PTX64: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}}
 define void @foo4() {
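The new CHECK lines expect offsets 0 and 4 relative to both %SP and %SPL. As a rough illustration of where such offsets come from, the sketch below lays out stack objects consecutively, respecting alignment. It is not the NVPTXPrologEpilogPass implementation: `FrameObject`, `alignTo`, and `layoutFrame` are hypothetical names, and it assumes the function under test has two 4-byte, 4-byte-aligned stack objects.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a stack object (not an LLVM type).
struct FrameObject {
  std::uint64_t size;   // size in bytes
  std::uint64_t align;  // required alignment, must be a power of two
};

// Round `value` up to the next multiple of `align`.
inline std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");
  return (value + align - 1) & ~(align - 1);
}

// Assign each object an offset from the frame base, in declaration order.
// Until a layout step like this has run, any offset a later pass reads
// for a frame index is not yet meaningful.
inline std::vector<std::uint64_t> layoutFrame(const std::vector<FrameObject>& objects) {
  std::vector<std::uint64_t> offsets;
  std::uint64_t next = 0;
  for (const FrameObject& obj : objects) {
    next = alignTo(next, obj.align);  // pad up to the object's alignment
    offsets.push_back(next);          // this object's offset from the base
    next += obj.size;                 // advance past the object
  }
  return offsets;
}
```

Under those assumptions, two 4-byte objects land at offsets 0 and 4, which matches the `%SP, 0` / `%SP, 4` and `%SPL, 0` / `%SPL, 4` pairs in the test. A peephole that runs before layout would see only the unresolved frame indices, which is the ordering problem the commit message describes.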

