bcm5719-llvm/llvm/test/CodeGen/BPF, branch meklort-10.0.1

[BPF] fix incorrect type in BPFISelDAGToDAG readonly load optimization

2020-06-23T21:43:44+00:00

In BPF Instruction Selection DAGToDAG transformation phase, BPF backend had an optimization to turn load from readonly data section to direct load of the values. This phase is implemented before libbpf has readonly section support and before alu32 is supported. This phase however may generate incorrect type when alu32 is enabled. The following is an example, -bash-4.4$ cat ~/tmp2/t.c struct t { unsigned char a; unsigned char b; unsigned char c; }; extern void foo(void *); int test() { struct t v = { .b = 2, }; foo(&v); return 0; } The compiler will turn local variable "v" into a readonly section. During instruction selection phase, the compiler generates two loads from readonly section, one 2 byte load or 1 byte load, e.g., for 2 loads, t8: i32,ch = load<(dereferenceable load 2 from `i8* getelementptr inbounds (%struct.t, %struct.t* @__const.test.v, i64 0, i32 0)`, align 1), anyext from i16> t3, GlobalAddress:i64<%struct.t* @__const.test.v> 0, undef:i64 t9: ch = store<(store 2 into %ir.v1.sub1), trunc to i16> t3, t8, FrameIndex:i64<0>, undef:i64 BPF backend changed t8 to i64 = Constant<2> and eventually the generated machine IR: t10: i64 = MOV_ri TargetConstant:i64<2> t40: i32 = SLL_ri_32 t10, TargetConstant:i32<8> t41: i32 = OR_ri_32 t40, TargetConstant:i64<0> t9: ch = STH32 t41, TargetFrameIndex:i64<0>, TargetConstant:i64<0>, t3 Note that t10 in the above is not correct. The type should be i32 and instruction should be MOV_ri_32. The reason for incorrect insn selection is BPF insn selection generated an i64 constant instead of an i32 constant as specified in the original load instruction. Such incorrect insn sequence eventually caused the following fatal error when a COPY insn tries to copy a 64bit register to a 32bit subregister. Impossible reg-to-reg copy UNREACHABLE executed at ../lib/Target/BPF/BPFInstrInfo.cpp:42! This patch fixed the issue by using the load result type instead of always i64 when doing readonly load optimization. Differential Revision: https://reviews.llvm.org/D81630 (cherry picked from commit 4db1878158a3f481ff673fef2396c12b7a53d280)

[BPF] fix a bug for BTF pointee type pruning

2020-06-23T21:21:04+00:00

In BTF, pointee type pruning is used to reduce cluttering too many unused types into prog BTF. For example, struct task_struct { ... struct mm_struct *mm; ... } If bpf program does not access members of "struct mm_struct", there is no need to bring types for "struct mm_struct" to BTF. This patch fixed a bug where an incorrect pruning happened. The test case like below: struct t; typedef struct t _t; struct s1 { _t *c; }; int test1(struct s1 *arg) { ... } struct t { int a; int b; }; struct s2 { _t c; } int test2(struct s2 *arg) { ... } After processing test1(), among others, BPF backend generates BTF types for "struct s1", "_t" and a placeholder for "struct t". Note that "struct t" is not really generated. If later a direct access to "struct t" member happened, "struct t" BTF type will be generated properly. During processing test2(), when processing member type "_t c", BPF backend sees type "_t" already generated, so returned. This caused the problem that "struct t" BTF type is never generated and eventually causing incorrect type definition for "struct s2". To fix the issue, during DebugInfo type traversal, even if a typedef/const/volatile/restrict derived type has been recorded in BTF, if it is not a type pruning candidate, type traversal of its base type continues. Differential Revision: https://reviews.llvm.org/D82041 (cherry picked from commit 89648eb16d01725457f958e634d16c534b64c42c)

BPF: fix a CORE optimization bug

2020-05-07T02:53:49+00:00

For the test case in this patch like below struct t { int a; } __attribute__((preserve_access_index)); int foo(void *); int test(struct t *arg) { long param[1]; param[0] = (long)&arg->a; return foo(param); } The IR right before BPF SimplifyPatchable phase: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %2:gpr = LDD killed %1:gpr, 0 %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr STD killed %3:gpr, %stack.0.param, 0 After SimplifyPatchable phase, the incorrect IR is generated: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0" Note that CORE_MEM pseudo op is introduced to encode memory operations related to CORE. In the above, we intend to check whether we have a store like *(%3:gpr + 0) = ... and if this is the case, we could replace it with *(%0:gpr + @"llvm.t:0:0$0:0"_ = ... Unfortunately, in the above, IR for the store is *(%stack.0.param + 0) = %3:gpr and transformation should not happen. Note that we won't have problem if the actual CORE dereference (arg->a) happens. This patch fixed the problem by skip CORE optimization if the use of ADD_rr result is not the base address of the store operation. Differential Revision: https://reviews.llvm.org/D78466 (cherry picked from commit 3cb7e7bf959dcd3b8080986c62e10a75c7af43f0)

[BPF] disable ReduceLoadWidth during SelectionDag phase

2020-02-10T12:45:19+00:00

The compiler may transform the following code ctx = ctx + reloc_offset ... (*(u32 *)ctx) & 0x8000 ... to ctx = ctx + reloc_offset ... (*(u8 *)(ctx + 1)) & 0x80 ... where reloc_offset will be replaced with a constant during AsmPrinter phase. The above transformed code will be rejected the kernel verifier as it does not allow *(type *)((ctx + non_zero_offset1) + non_zero_offset2) style access pattern. It is hard at SelectionDag phase to identify whether a load is related to context or not. Sometime, interprocedure analysis may be needed. So let us simply prevent such optimization from happening. Differential Revision: https://reviews.llvm.org/D73997 (cherry picked from commit d96c1bbaa03574daf759e5e9a6c75047c5e3af64)

[BPF] fix a bug in BPFMISimplifyPatchable pass with -O0

2020-02-03T15:03:26+00:00

The recommended optimization level for BPF programs is O2 since (1). BPF is running inside the kernel and linux kernel won't work at -O0 level, and (2). Verifier is not able to handle O0 code properly, e.g., potential large stack size and a lot of spills. But we should keep -O0 at least compiling. This patch fixed a bug in BPFMISimplifyPatchable phase where with -O0, a segmentation fault will happen for a simple program like: int test(int a, int b) { return a + b; } A test case is added to capture such a case. Differential Revision: https://reviews.llvm.org/D73681 (cherry picked from commit 795bbb366266e83d2bea8dc04c19919b52ab3a2a)

[SelectionDAG] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)

2020-01-13T11:08:12+00:00

As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value. This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits. Differential Revision: https://reviews.llvm.org/D72573

[BPF] extend BTF_KIND_FUNC to cover global, static and extern funcs

2020-01-10T17:06:31+00:00

Previously extern function is added as BTF_KIND_VAR. This does not work well with existing BTF infrastructure as function expected to use BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO. This patch added extern function to BTF_KIND_FUNC. The two bits 0:1 of btf_type.info are used to indicate what kind of function it is: 0: static 1: global 2: extern Differential Revision: https://reviews.llvm.org/D71638

[BPF] Enable relocation location for load/store/shifts

2019-12-26T17:07:39+00:00

Previous btf field relocation is always at assignment like r1 = 4 which is converted from an ld_imm64 instruction. This patch did an optimization such that relocation instruction might be load/store/shift. Specically, the following insns may also have relocation, except BPF_MOV: LDB, LDH, LDW, LDD, STB, STH, STW, STD, LDB32, LDH32, LDW32, STB32, STH32, STW32, SLL, SRL, SRA To accomplish this, a few BPF target specific codegen only instructions are invented. They are generated at backend BPF SimplifyPatchable phase, which is at early llc phase when SSA form is available. The new codegen only instructions will be converted to real proper instructions at the codegen and BTF emission stage. Note that, as revealed by a few tests, this optimization might be actual generating more relocations: Scenario 1: if (...) { ... __builtin_preserve_field_info(arg->b2, 0) ... } else { ... __builtin_preserve_field_info(arg->b2, 0) ... } Compiler could do CSE to only have one relocation. But if both of the above is translated into codegen internal instructions, the compiler will not be able to do that. Scenario 2: offset = ... __builtin_preserve_field_info(arg->b2, 0) ... ... ... offset ... ... offset ... ... offset ... For whatever reason, the compiler might be temporarily do copy propagation of the righthand of "offset" assignment like ... __builtin_preserve_field_info(arg->b2, 0) ... ... __builtin_preserve_field_info(arg->b2, 0) ... and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not be able to get deduplicated. I do not expect we have big instruction count difference. It may actually reduce instruction count since now relocation is in deeper insn dependency chain. For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced instruction count from 29 to 26. Differential Revision: https://reviews.llvm.org/D71790

Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351

2019-12-25T00:27:51+00:00

Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351

2019-12-24T23:57:33+00:00