path: root/llvm/test/CodeGen/BPF/32-bit-subreg-peephole.ll
Commit message | Author | Age | Files | Lines
* [BPF] Fix a bug in peephole optimization | Yonghong Song | 2019-11-20 | 1 | -6/+21
One of the current peephole optimizations is to remove SLL/SRL if the
subregister has been zero extended. This phase has two bugs and one
limitation.

First, for a physical subregister used in a pseudo insn COPY like below,
it permits incorrect optimization:

    %0:gpr32 = COPY $w0
    ...
    %4:gpr = MOV_32_64 %0:gpr32
    %5:gpr = SLL_ri %4:gpr(tied-def 0), 32
    %6:gpr = SRA_ri %5:gpr(tied-def 0), 32

The $w0 could be from the return value of a previous function call and
its upper 32-bit value might contain some non-zero values. The same
applies to function arguments.

Second, the current code may permit removing SLL/SRA like below:

    %0:gpr32 = COPY $w0
    %1:gpr32 = COPY %0:gpr32
    ...
    %4:gpr = MOV_32_64 %1:gpr32
    %5:gpr = SLL_ri %4:gpr(tied-def 0), 32
    %6:gpr = SRA_ri %5:gpr(tied-def 0), 32

The reason is that it did not follow the def-use chain to skip all
intermediate 32bit-to-32bit COPY instructions.

The current implementation is also very conservative for PHI
instructions. If any PHI insn component is another PHI or COPY insn, it
will just keep the SLL/SRA.

This patch fixes the issues as follows:
  - During def/use chain traversal, if any physical register is read,
    SLL/SRA will be preserved as these physical registers are mostly
    from function return values or current function arguments.
  - Recursively visit all COPY and PHI instructions.
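As a hedged illustration of the first bug (not from the commit; the
function name is invented), assuming the BPF calling convention defines
only the low 32 bits ($w0) of the register holding a 32-bit call result:

    /* Hypothetical sketch; get_val is an invented name. Assumption:
     * only the low 32 bits ($w0) of the return register are defined
     * for a 32-bit return type, so the upper 32 bits may be junk. */
    extern int get_val(void);

    long long widen(void)
    {
        int v = get_val();   /* lowered via a %0:gpr32 = COPY $w0    */
        return (long long)v; /* MOV_32_64 + SLL_ri 32 + SRA_ri 32;   */
                             /* dropping the shifts would read the   */
                             /* junk upper bits instead of sign      */
                             /* extending the value                  */
    }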
* bpf: fix incorrect SELECT_CC lowering | Yonghong Song | 2018-04-03 | 1 | -1/+1
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The commit,
however, has a couple of issues:

  (1) In the code, just swapping the operands is not enough; the
      ConditionalCode CC should also be swapped, otherwise incorrect
      code will be generated.
  (2) The ConditionalCode swap should be subject to getHasJmpExt(). If
      getHasJmpExt() is false, certain conditional codes are not
      supported and the swap may generate incorrect code.

The original goal of this patch is to optimize jmp operations which do
not have JmpExt turned on. If JmpExt is on, better code can be
generated. For example, the test select_ri.ll is introduced to
demonstrate the optimization. The same result can be achieved with the
-mcpu=v2 flag.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

llvm-svn: 329043
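A hedged sketch of the kind of source that exercises this lowering (the
test name select_ri.ll is from the commit; this exact C shape is an
assumption): without JmpExt, the unsigned less-than jump does not exist,
so the compare must be rewritten with swapped operands, and the
condition code must be swapped along with them.

    /* Sketch only: assumes pre-v2 BPF lacks JLT/JLE/JSLT/JSLE
     * (JmpExt), so "a < 20" must become the swapped "20 > a" (JGT),
     * i.e. both the operands *and* the CC must be swapped together. */
    int sel(unsigned a)
    {
        return a < 20 ? 1 : 2;   /* SELECT_CC with an unsigned '<' */
    }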
* bpf: Extends zero extension elimination beyond comparison instructions | Yonghong Song | 2018-03-13 | 1 | -0/+16
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases. For example:

    int *inc_p (int *p, unsigned a)
    {
        return p + a;
    }

'a' will be promoted to i64 during the addition, and the zero extension
could be eliminated as well.

For the elimination optimization, it is much better to start recognizing
the candidate sequence from the SRL instruction instead of from J*
instructions. This patch makes it a generic zero extension elimination
pass instead of one restricted to comparison.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>

llvm-svn: 327367
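A hedged sketch of another non-comparison case the generalized pass
could now cover (this example is not from the commit; it assumes the
SLL/SRL-by-32 zero-extension pattern described in the sibling commits):

    /* Sketch, not from the commit: a 32-bit value stored into a
     * 64-bit slot is zero extended, typically emitted on BPF as
     * "r <<= 32; r >>= 32". Starting recognition at the SRL lets the
     * pass drop that pair here too, not only when the value feeds a
     * comparison (J*) instruction. */
    void store_widened(unsigned long long *dst, unsigned int a)
    {
        *dst = a;   /* implicit zext i32 -> i64 */
    }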
* bpf: J*_RR should check both operands | Yonghong Song | 2018-03-13 | 1 | -0/+20
There is a mistake in the current code: we "break" out of the
optimization when the first operand of J*_RR doesn't qualify for the
elimination. This caused some elimination opportunities to be missed,
for example the one in the testcase. The code should just fall through
to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>

llvm-svn: 327366
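A hedged guess at the shape of the missed case (the actual testcase
lives in the .ll file; this C is illustrative only): the zero-extended
value appears as the second operand of a register-register jump, so
bailing out after checking the first operand skips it.

    /* Illustrative sketch, assumed shape of the missed opportunity:
     * the zero-extended 'b' is the *second* operand of the J*_RR
     * compare, so the old code "broke" out before examining it. */
    int pick(unsigned long long a, unsigned int b)
    {
        return a < b ? 1 : 2;   /* 'b' zero extended, operand 2 */
    }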
* bpf: Tighten subregister definition check | Yonghong Song | 2018-03-13 | 1 | -1/+32
The current subregister definition check stops after the MOV_32_64
instruction. This means we are treating all of the following instruction
sequences as safe to eliminate:

    MOV_32_64 rB, wA
    SLL_ri    rB, rB, 32
    SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64
could come from an implicit truncation of a 64-bit register, in which
case the high bits of the 64-bit register are not zeroed, therefore we
can't eliminate the above sequence. For example, for i32_val below, we
shouldn't do the elimination:

    long long bar ();

    int foo (int b, int c)
    {
        unsigned int i32_val = (unsigned int) bar();

        if (i32_val < 10)
            return b;
        else
            return c;
    }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>

llvm-svn: 327365
* bpf: Add more check directives in peephole testcase | Yonghong Song | 2018-03-13 | 1 | -0/+4
Improve the test accuracy by adding more check directives.

Shifts are expected to be eliminated for zero extension but not for
signed extension.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>

llvm-svn: 327364
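A hedged sketch of the distinction those directives check (not taken
from the test itself; function names are invented): zero extension of a
subregister needs no shifts, while sign extension must keep the
shift-left/arithmetic-shift-right pair.

    /* Sketch of the two cases the test distinguishes: zero extension
     * should compile without the shift pair, while sign extension
     * must keep "r <<= 32; r s>>= 32" to replicate the sign bit. */
    unsigned long long zext32(unsigned int a) { return a; } /* no shifts   */
    long long          sext32(int a)          { return a; } /* shifts stay */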
* bpf: New codegen testcases for 32-bit subregister support | Yonghong Song | 2018-02-23 | 1 | -0/+36
This patch adds some unit tests for 32-bit subregister support. We want
to make sure ALU32, subregister load/store and the new peephole
optimization are truly enabled once -mattr=+alu32 is specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>

llvm-svn: 325992
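As a hedged sketch (not from the commit) of the kind of source such
tests cover: with -mattr=+alu32, 32-bit arithmetic and memory accesses
are expected to use the 32-bit wN subregisters and 32-bit load/store
forms rather than full 64-bit registers.

    /* Sketch, assumptions only: with -mattr=+alu32 these 32-bit
     * operations should stay in wN subregisters, and the store should
     * use a 32-bit form, instead of being widened to 64-bit rN ops. */
    unsigned int add32(unsigned int a, unsigned int b)
    {
        return a + b;        /* expected: w0 = w1 + w2 style ALU32 op */
    }

    void store32(unsigned int *p, unsigned int v)
    {
        *p = v;              /* expected: 32-bit subregister store */
    }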