summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/BPF
Commit message (Collapse)AuthorAgeFilesLines
...
* bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in orderYonghong Song2018-07-259-8/+255
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some BPF JIT backends would want to optimize memcpy in their own architecture specific way. However, at the moment, there is no way for JIT backends to see memcpy semantics in a reliable way. This is due to LLVM BPF backend is expanding memcpy into load/store sequences and could possibly schedule them apart from each other further. So, BPF JIT backends inside kernel can't reliably recognize memcpy semantics by peephole BPF sequence. This patch introduce new intrinsic expand infrastructure to memcpy. To get stable in-order load/store sequence from memcpy, we first lower memcpy into BPF::MEMCPY node which then expanded into in-order load/store sequences in expandPostRAPseudo pass which will happen after instruction scheduling. By this way, kernel JIT backends could reliably recognize memcpy through scanning BPF sequence. This new memcpy expand infrastructure is gated by a new option: -bpf-expand-memcpy-in-order Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 337977
* [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixupPeter Smith2018-06-061-3/+8
| | | | | | | | | | | | | | | | | | On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078
* Set ADDE/ADDC/SUBE/SUBC to expand by defaultAmaury Sechet2018-06-011-4/+0
| | | | | | | | | | | | | | | Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 llvm-svn: 333748
* MC: Separate creating a generic object writer from creating a target object ↵Peter Collingbourne2018-05-213-14/+10
| | | | | | | | | | | | | writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868
* MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an ↵Peter Collingbourne2018-05-211-25/+15
| | | | | | | | | | | | | MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857
* Support: Simplify endian stream interface. NFCI.Peter Collingbourne2018-05-181-27/+18
| | | | | | | | | | | | Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757
* MC: Change the streamer ctors to take an object writer instead of a stream. ↵Peter Collingbourne2018-05-181-2/+2
| | | | | | | | | | | | | | NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749
* Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-142-29/+31
| | | | | | | | | | | | | | | | The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
* [DebugInfo] Examine all uses of isDebugValue() for debug instructions.Shiva Chen2018-05-091-2/+2
| | | | | | | | | | | | | | | | | | Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844
* Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txtNico Weber2018-04-231-1/+1
| | | | llvm-svn: 330584
* bpf: signal error instead of silent drop for certain invalid asm insnYonghong Song2018-04-111-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, an invalid asm insn, either in an asm file or in an inline asm format, might be silently dropped. This patch fixed two places where this may happen by signaling the error so user knows what goes wrong. The following is an example to demonstrate error messages: -bash-4.2$ cat t.c int test(void *ctx) { #if defined(NO_ERROR) asm volatile("r0 = *(u16 *)skb[%0]" : : "i"(2)); #elif defined(ERROR_1) asm volatile("r20 = *(u16 *)skb[%0]" : : "i"(2)); #elif defined(ERROR_2) asm volatile("r0 = *(u16 *)(r1 + ?)" : :); #endif return 0; } -bash-4.2$ cat run.sh for macro in NO_ERROR ERROR_1 ERROR_2; do echo "===== compile for macro" $macro clang -D${macro} -O2 -target bpf -emit-llvm -S t.c echo "==llc==" llc -march=bpf -filetype=obj t.ll done -bash-4.2$ ./run.sh ===== compile for macro NO_ERROR ==llc== ===== compile for macro ERROR_1 ==llc== <inline asm>:1:2: error: invalid register/token name r20 = *(u16 *)skb[2] ^ note: !srcloc = 135 ===== compile for macro ERROR_2 ==llc== <inline asm>:1:21: error: unexpected token r0 = *(u16 *)(r1 + ?) ^ note: !srcloc = 210 -bash-4.2$ Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 329849
* Sort targetgen calls in lib/Target/*/CMakeLists.Nico Weber2018-04-041-5/+6
| | | | | | | | | | | Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181
* bpf: fix incorrect SELECT_CC loweringYonghong Song2018-04-031-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues: (1). In code, just swap is not enough, ConditionalCode CC should also be swapped, otherwise incorrect code will be generated. (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and swap may generate incorrect code. The original goal for this patch is to optimize jmp operations which does not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with -mcpu=v2 flag. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 329043
* Fix a bunch of typoes. NFCFangrui Song2018-03-301-1/+1
| | | | llvm-svn: 328907
* [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to ↵Craig Topper2018-03-291-1/+1
| | | | | | | | | | | | CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806
* Remove some unused includes to fix layering.David Blaikie2018-03-291-1/+0
| | | | llvm-svn: 328745
* Fix layering by moving ValueTypes.h from CodeGen to IRDavid Blaikie2018-03-231-1/+1
| | | | | | ValueTypes.h is implemented in IR already. llvm-svn: 328397
* bpf: Enhance debug information for peephole optimization passesYonghong Song2018-03-131-1/+19
| | | | | | | | | | | Add more debug information for peephole optimization passes. These would only be enabled for debug version binary and could help analyzing why some optimization opportunities were missed. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327371
* bpf: New post-RA peephole optimization pass to eliminate bad RA codegenYonghong Song2018-03-133-8/+111
| | | | | | | | | | | | | | | | | This new pass eliminate identical move: MOV rA, rA This is particularly possible to happen when sub-register support enabled. The special type cast insn MOV_32_64 involves different register class on src (i32) and dst (i64), RA could generate useless instruction due to this. This pass also could serve as the bast for further post-RA optimization. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327370
* bpf: Don't expand BSWAP on i32, promote itYonghong Song2018-03-131-1/+1
| | | | | | | | | | | | | | | Currently, there is no ALU32 bswap support in eBPF ISA. BSWAP on i32 was set to EXPAND which would need about eight instructions for single BSWAP. It would be more efficient to promote it to i64, then doing BSWAP on i64. For eBPF programs, most of the promotion are zero extensions which are likely be elimiated later by peephole optimizations. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327369
* bpf: Support subregister definition check on PHI nodeYonghong Song2018-03-131-2/+16
| | | | | | | | | | | | | This patch relax the subregister definition check on Phi node. Previously, we just cancel the optimizatoin when the definition is Phi node while actually we could further check the definitions of incoming parameters of PHI node. This helps catch more elimination opportunities. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327368
* bpf: Extends zero extension elimination beyond comparison instructionsYonghong Song2018-03-131-93/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | The current zero extension elimination was restricted to operands of comparison. It actually could be extended to more cases. For example: int *inc_p (int *p, unsigned a) { return p + a; } 'a' will be promoted to i64 during addition, and the zero extension could be eliminated as well. For the elimination optimization, it should be much better to start recognizing the candidate sequence from the SRL instruction instead of J* instructions. This patch makes it an generic zero extension elimination pass instead of one restricted with comparison. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327367
* bpf: J*_RR should check both operandsYonghong Song2018-03-131-6/+4
| | | | | | | | | | | | | There is a mistake in current code that we "break" out the optimization when the first operand of J*_RR doesn't qualify the elimination. This caused some elimination opportunities missed, for example the one in the testcase. The code should just fall through to handle the second operand. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327366
* bpf: Tighten subregister definition checkYonghong Song2018-03-131-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current subregister definition check stops after the MOV_32_64 instruction. This means we are thinking all the following instruction sequences are safe to be eliminated: MOV_32_64 rB, wA SLL_ri rB, rB, 32 SRL_ri rB, rB, 32 However, this is *not* true. The source subregister wA of MOV_32_64 could come from a implicit truncation of 64-bit register in which case the high bits of the 64-bit register is not zeroed, therefore we can't eliminate above sequence. For example, for i32_val, we shouldn't do the elimination: long long bar (); int foo (int b, int c) { unsigned int i32_val = (unsigned int) bar(); if (i32_val < 10) return b; else return c; } Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327365
* bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSectionsYonghong Song2018-03-015-2/+16
| | | | | | | | | | | | | | | | | | | | | Commit e4507fb8c94b ("bpf: disable DwarfUsesRelocationsAcrossSections") disables MCAsmInfo DwarfUsesRelocationsAcrossSections unconditionally so that dwarf will not use cross section (between dwarf and symbol table) relocations. This new debug format enables pahole to dump structures correctly as libdwarves.so does not have BPF backend support yet. This new debug format, however, breaks bcc (https://github.com/iovisor/bcc) source debug output as llvm in-memory Dwarf support has some issues to handle it. More specifically, with DwarfUsesRelocationsAcrossSections disabled, JIT compiler does not generate .debug_abbrev and Dwarf DIE (debug info entry) processing is not happy about this. This patch introduces a new flag -mattr=dwarfris (dwarf relocation in section) to disable DwarfUsesRelocationsAcrossSections. DwarfUsesRelocationsAcrossSections is true by default. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 326505
* bpf: New optimization pass for eliminating unnecessary i32 promotionsYonghong Song2018-02-234-0/+197
| | | | | | | | | | | | | | | | | | This pass performs peephole optimizations to cleanup ugly code sequences at MachineInstruction layer. Currently, the only optimization in this pass is to eliminate type promotion sequences for zero extending 32-bit subregisters to 64-bit registers. If the compiler could prove the zero extended source come from 32-bit subregistere then it is safe to erase those promotion sequece, because the upper half of the underlying 64-bit registers were zeroed implicitly already. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325991
* bpf: New decoder namespace for 32-bit subregister load/storeYonghong Song2018-02-231-2/+43
| | | | | | | | | | | | | | | | | | | | | | | When -mattr=+alu32 passed to the disassembler, use decoder namespace for 32-bit subregister. This is to disassemble load and store instructions in preferred B format as described in previous commit: w = *(u8 *) (r + off) // BPF_LDX | BPF_B w = *(u16 *)(r + off) // BPF_LDX | BPF_H w = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = w // BPF_STX | BPF_B *(u16 *)(r + off) = w // BPF_STX | BPF_H *(u32 *)(r + off) = w // BPF_STX | BPF_W NOTE: all other instructions should still use the default decoder namespace. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325990
* bpf: Enable 32-bit subregister support for -mattr=+alu32Yonghong Song2018-02-231-0/+2
| | | | | | | | | After all those preparation patches, now we could enable 32-bit subregister support once -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325989
* bpf: Support 32-bit subregister in various InstrInfo hooksYonghong Song2018-02-231-0/+10
| | | | | | | | | This patch support 32-bit subregister in three InstrInfo hooks, i.e. copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot, Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325988
* bpf: New instruction patterns for 32-bit subregister load and storeYonghong Song2018-02-232-10/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The instruction mapping between eBPF/arm64/x86_64 are: eBPF arm64 x86_64 LD1 BPF_LDX | BPF_B ldrb movzbl LD2 BPF_LDX | BPF_H ldrh movzwl LD4 BPF_LDX | BPF_W ldr movl movzbl/movzwl/movl on x86_64 accept 32-bit sub-register, for example %eax, the same for ldrb/ldrh on arm64 which accept 32-bit "w" register. And actually these instructions only accept sub-registers. There is no point to have LD1/2/4 (unsigned) for 64-bit register, because on these arches, upper 32-bits are guaranteed to be zeroed by hardware or VM, so load into the smallest available register class is the best choice for maintaining type information. For eBPF we should adopt the same philosophy, to change current format (A): r = *(u8 *) (r + off) // BPF_LDX | BPF_B r = *(u16 *)(r + off) // BPF_LDX | BPF_H r = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = r // BPF_STX | BPF_B *(u16 *)(r + off) = r // BPF_STX | BPF_H *(u32 *)(r + off) = r // BPF_STX | BPF_W into B: w = *(u8 *) (r + off) // BPF_LDX | BPF_B w = *(u16 *)(r + off) // BPF_LDX | BPF_H w = *(u32 *)(r + off) // BPF_LDX | BPF_W *(u8 *) (r + off) = w // BPF_STX | BPF_B *(u16 *)(r + off) = w // BPF_STX | BPF_H *(u32 *)(r + off) = w // BPF_STX | BPF_W There is no change on encoding nor how should they be interpreted, everything is as it is, load the specified length, write into low bits of the register then zeroing all remaining high bits. The only change is their associated register class and how compiler view them. Format A still need to be kept, because eBPF LLVM backend doesn't support sub-registers at default, but once 32-bit subregister is enabled, it should use format B. This patch implemented this together with all those necessary extended load and truncated store patterns. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325987
* bpf: Support i32 in getScalarShiftAmountTy methodYonghong Song2018-02-232-0/+7
| | | | | | | | | | getScalarShiftAmount method should be implemented for eBPF backend to make sure shift amount could still get correct type once 32-bit subregisters support are enabled. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325986
* bpf: Support condition comparison on i32Yonghong Song2018-02-233-27/+126
| | | | | | | | | | | | | | | | | | | We need to support condition comparison on i32. All these comparisons are supposed to be combined into BPF_J* instructions which only support i64. For ISD::BR_CC we need to promote it to i64 first, then do custom lowering. For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64. For ISD::SELECT_CC, we also want to do custom lower for i32. However, after 32-bit subregister support enabled, it is possible the comparison operands are i32 while the selected value are i64, or the comparison operands are i64 while the selected value are i32. We need to define extra instruction pattern and support them in custom instruction inserter. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325985
* bpf: Handle i32 for ALU operations without ISA supportYonghong Song2018-02-231-21/+26
| | | | | | | | | | | | There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU, ADDC, ADDE etc on i32. They could be emulated by other basic BPF_ALU operations, we'd set their lowering action the same as i64. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325984
* bpf: New calling convention for 32-bit subregistersYonghong Song2018-02-233-9/+37
| | | | | | | | | | | This patch add new calling conventions to allow GPR32RegClass as valid register class for arguments and return types. New calling convention will only be choosen when -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325983
* bpf: New target attribute "alu32" for 32-bit subregister supportYonghong Song2018-02-233-0/+9
| | | | | | | | | | | | | | | | | | This new attribute aims to control the enablement of 32-bit subregister support on eBPF backend. Name the interface as "alu32" is because we in particular want to enable the generation of BPF_ALU32 instructions by enable subregister support. This attribute could be used in the following format with llc: llc -mtriple=bpf -mattr=[+|-]alu32 It is disabled at default. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325982
* bpf: Define instruction patterns for extensions and truncations between i32 ↵Yonghong Song2018-02-231-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | to i64 For transformations between i32 and i64, if it is explicit signed extension: - first cast the operand to i64 - then use SLL + SRA to finish the extension. if it is explicit zero extension: - first cast the operand to i64 - then use SLL + SRL to finish the extension. if it is explicit any extension: - just refer to 64-bit register. if it is explicit truncation: - just refer to 32-bit subregister. NOTE: Some of the zero extension sequences might be unnecessary, they will be removed by an peephole pass on MachineInstruction layer. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325981
* bpf: Tighten the immediate predication for 32-bit alu instructionsYonghong Song2018-02-231-2/+4
| | | | | | | | | | | | | | | | | These 32-bit ALU insn patterns which takes immediate as one operand were initially added to enable AsmParser support, and the AsmMatcher uses "ins" and "outs" fields to deduct the operand constraint. However, the instruction selector doesn't work the same as AsmMatcher. The selector will use the "pattern" field for which we are not setting the predication for immediate operands correctly. Without this patch, i32 would eventually means all i32 operands are valid, both imm and gpr, while these patterns should allow imm only. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325980
* bpf: Use markSuperRegs to mark reserved registersYonghong Song2018-02-231-2/+2
| | | | | | | | | markSuperRegs is the canonical helper function used to mark reserved registers. It could mark any overlapping sub-registers automatically. Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 325979
* bpf: disable DwarfUsesRelocationsAcrossSectionsYonghong Song2018-02-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pahole does not work with BPF backend properly: -bash-4.2$ cat test.c struct test_t { int a; int b; }; int test(struct test_t *s) { return s->a; } -bash-4.2$ clang -g -O2 -target bpf -c test.c -bash-4.2$ pahole test.o struct clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) { clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /* 0 4 */ clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /* 4 4 */ /* size: 8, cachelines: 1, members: 2 */ /* last cacheline: 8 bytes */ }; -bash-4.2$ The reason is that BPF backend is not yet implemented in elfutils backend https://github.com/threatstack/elfutils/tree/master/backends and pahole depends on elfutils for dwarf parsing and resolving relocation. More specifically, the unsupported relocation in .debug_info for type/member name against symbol table caused the incorrect result above. The following is the raw .rel.debug_info for the above example, Hex dump of section '.rel.debug_info': 0x00000000 06000000 00000000 0a000000 0b000000 ................ 0x00000010 0c000000 00000000 0a000000 01000000 ................ 0x00000020 12000000 00000000 0a000000 02000000 ................ 0x00000030 16000000 00000000 0a000000 0e000000 ................ 0x00000040 1a000000 00000000 0a000000 03000000 ................ ----------------- -------- -------- reloc location type symtab index Hex dump of section '.debug_info': 0x00000000 7b000000 04000000 00000801 00000000 {............... 0x00000010 0c000000 00000000 00000000 00000000 ................ 0x00000020 00000000 00001000 00000200 00000000 ................ Based on "type", the proper value will be extracted from symbol table and filled in .debug_info so later on .debug_info can be properly resolved against debug strings. There are two ways to fix this problem. One is to fix elfutils by adding BPF support which is desirable. This could take a long time and won't work with already deployed pahole. For a short term workaround, we can disable dwarf cross-section relation which specifically avoids debug_info and symbol table cross relocation. This should help any dwarf-related tool which has not implement BPF specific relocations yet. Now .rel.debug_info does not have any relocation for symbol table and .debug_info itself contains necessary relocation information by itself. Hex dump of section '.debug_info': 0x00000000 7b000000 04000000 00000801 00000000 {............... 0x00000010 0c003700 00000000 00003e00 00000000 ..7.......>..... 0x00000020 00000000 00001000 00000200 00000000 ................ location 0xc has 0, 0x12 has 0x37, 0x1a has 0x3e in place which will be used in relocation resolution. Here, the values of 0, 0x37 and 0x3e are offset in .debug_str section. Please note the difference between two above .debug_info dumps. With the fix, pahole works properly with BPF backend: -bash-4.2$ clang -O2 -g -target bpf -c test.c -bash-4.2$ pahole test.o struct test_t { int a; /* 0 4 */ int b; /* 4 4 */ /* size: 8, cachelines: 1, members: 2 */ /* last cacheline: 8 bytes */ }; Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 325735
* [BPF] Return true in enableMultipleCopyHints().Jonas Paulsson2018-02-181-0/+2
| | | | | | | | | | Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Yonghong Song llvm-svn: 325457
* bpf: fix a bug in dag2dag optimization for loads from readonly sectionYonghong Song2018-02-151-4/+4
| | | | | | | | | | | | | | | | | | | | | The reference '&' is missing in the function parameter. If there are back-to-back optimizations in terms of dag node list like below: t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64 t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64 The bug will trigger a segfault for the added test case remove_truncate_5.ll: LLVMSymbolizer: error reading file: No such file or directory #0 0x000000000241c4d9 (llc+0x241c4d9) #1 0x000000000241c56a (llc+0x241c56a) #2 0x000000000241aa50 (llc+0x241aa50) ... #22 0x0000000000fd5edf (llc+0xfd5edf) #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05) #24 0x0000000000fd3e69 (llc+0xfd3e69) ... Segmentation fault Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 325267
* bpf: Improve expanding logic in LowerSELECT_CCYonghong Song2018-02-081-0/+5
| | | | | | | | | | | | | | | LowerSELECT_CC is not generating optimal Select_Ri pattern at the moment. It is not guaranteed to place ConstantNode at RHS which would miss matching Select_Ri. A new testcase added into the existing select_ri.ll, also there is an existing case in cmp.ll which would be improved to use Select_Ri after this patch, it is adjusted accordingly. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 324560
* [SelectionDAGISel] Add a debug print before call to Select. Adjust where ↵Craig Topper2018-01-261-3/+0
| | | | | | | | | | | | blank lines are printed during isel process to make things more sensibly grouped. Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table. It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search. There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line. llvm-svn: 323551
* [BPF] Mark pseudo insn patterns as isCodeGenOnlyYonghong Song2018-01-161-2/+2
| | | | | | | | | | | | | | | | | | | | These pseudos are not supposed to be visible to user. This patch reduced the auto-generated instruction matcher. For example, the following words are removed from keyword list of LLVM BPF assembler. - MCK__35_, // '#' - MCK__COLON_, // ':' - MCK__63_, // '?' - MCK_ADJCALLSTACKDOWN, // 'ADJCALLSTACKDOWN' - MCK_ADJCALLSTACKUP, // 'ADJCALLSTACKUP' - MCK_PSEUDO, // 'PSEUDO' - MCK_Select, // 'Select' Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 322535
* [BPF] Teach DAG2DAG AND elimination about load intrinsicsYonghong Song2018-01-161-7/+31
| | | | | | | | | | | | | | | | | | As commented on the existing code: // The Reg operand should be a virtual register, which is defined // outside the current basic block. DAG combiner has done a pretty // good job in removing truncating inside a single basic block. However, when the Reg operand comes from bpf_load_[byte | half | word] intrinsics, the generic optimizer doesn't understand their results are zero extended, so these single basic block elimination opportunities were missed. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 322534
* Thread MCSubtargetInfo through Target::createMCAsmBackendAlex Bradbury2018-01-032-8/+8
| | | | | | | | | | | | | | | | | | | | | Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo. This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible: * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221) This change initially exposed PR35686, which has since been resolved in r321026. Differential Revision: https://reviews.llvm.org/D41349 llvm-svn: 321692
* bpf: add support for objdump -print-imm-hexYonghong Song2017-12-201-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for 'objdump -print-imm-hex' for imm64, operand imm and branch target. If user programs encode immediate values as hex numbers, such an option will make it easy to correlate asm insns with source code. This option also makes it easy to correlate imm values with insn encoding. There is one changed behavior in this patch. In old way, we print the 64bit imm as u64: O << (uint64_t)Op.getImm(); and the new way is: O << formatImm(Op.getImm()); The formatImm is defined in llvm/MC/MCInstPrinter.h as format_object<int64_t> formatImm(int64_t Value) So the new way to print 64bit imm is i64 type. If a 64bit value has the highest bit set, the old way will print the value as a positive value and the new way will print as a negative value. The new way is consistent with x86_64. For the code (see the test program): ... if (a == 0xABCDABCDabcdabcdULL) ... x86_64 objdump, with and without -print-imm-hex, looks like: 48 b8 cd ab cd ab cd ab cd ab movabsq $-6067004223159161907, %rax 48 b8 cd ab cd ab cd ab cd ab movabsq $-0x5432543254325433, %rax Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 321215
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-152-7/+7
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.Francis Visoiu Mistrih2017-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022
* [CodeGen] Unify MBB reference format in both MIR and debug outputFrancis Visoiu Mistrih2017-12-041-4/+4
| | | | | | | | | | | | | | | | As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
OpenPOWER on IntegriCloud