path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
* [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI | Sanjay Patel | 2018-12-06 | 1 | -1/+1
  We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold.
  llvm-svn: 348550
* [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC | Sanjay Patel | 2018-12-06 | 1 | -41/+34
  This code can still misbehave.
  llvm-svn: 348547
* [DAGCombiner] don't group bswap with casts in logic hoisting fold | Sanjay Patel | 2018-12-06 | 1 | -6/+15
  This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions.
  Earlier commits in this series: rL348501 rL348508 rL348518
  llvm-svn: 348534
* [DAGCombiner] reduce indent; NFC | Sanjay Patel | 2018-12-06 | 1 | -38/+31
  Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place.
  llvm-svn: 348523
* [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. | Andrea Di Biagio | 2018-12-06 | 1 | -4/+12
  This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

    concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A)

  This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits).

  Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64, -mattr=+avx):

    vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
    vxorps %xmm1, %xmm1, %xmm1
    vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
    vmovaps %ymm0, (%rsi)
    vzeroupper
    retq

  Now we generate this:

    vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
    vmovaps %ymm0, (%rsi)
    vzeroupper
    retq

  As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled.
  Differential Revision: https://reviews.llvm.org/D55274
  llvm-svn: 348522
* [DAGCombiner] don't hoist logic op if operands have other uses, part 2 | Sanjay Patel | 2018-12-06 | 1 | -5/+7
  The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral.
  llvm-svn: 348518
* Fix Wdocumentation warning. NFCI. | Simon Pilgrim | 2018-12-06 | 1 | -2/+0
  llvm-svn: 348517
* [DAGCombiner] don't hoist logic op if operands have other uses | Sanjay Patel | 2018-12-06 | 1 | -2/+6
  The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html
  I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first.
  llvm-svn: 348508
* [DAGCombiner] refactor function that hoists bitwise logic; NFCI | Sanjay Patel | 2018-12-06 | 1 | -56/+65
  Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html
  llvm-svn: 348501
* DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI. | Simon Pilgrim | 2018-12-06 | 1 | -3/+4
  llvm-svn: 348494
* Test commit: Removed trailing space in .txt file. | Markus Lavin | 2018-12-06 | 1 | -2/+2
  llvm-svn: 348483
* Add objc.* ARC intrinsics and codegen them to their runtime methods. | Pete Cooper | 2018-12-06 | 1 | -0/+36
  Reviewers: erik.pilkington, ahatanak
  Differential Revision: https://reviews.llvm.org/D55233
  llvm-svn: 348441
* [MachineOutliner][NFC] Move yet another std::vector out of a loop | Jessica Paquette | 2018-12-06 | 1 | -3/+4
  Once again, following the wisdom of the LLVM Programmer's Manual. I think that's enough refactoring for today. :)
  llvm-svn: 348439
* [MachineOutliner][NFC] Move std::vector out of loop | Jessica Paquette | 2018-12-06 | 1 | -1/+2
  See http://llvm.org/docs/ProgrammersManual.html#vector
  llvm-svn: 348433
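  A minimal, self-contained C++ sketch of the pattern recommended by the Programmer's Manual section linked above (illustrative only, not the outliner's code; the names are made up): hoisting the vector out of the loop reuses its allocation instead of paying for a construction and destruction on every iteration.

      #include <cstdio>
      #include <vector>

      int main() {
        std::vector<unsigned> Scratch;        // hoisted: constructed once, outside the loop
        for (unsigned I = 0; I < 1000; ++I) {
          Scratch.clear();                    // keeps the already-allocated capacity
          for (unsigned J = 0; J < I % 16; ++J)
            Scratch.push_back(J);
          std::printf("%zu\n", Scratch.size());
        }
        return 0;
      }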
* [MachineOutliner][NFC] Remove IntegerInstructionMap from InstructionMapper | Jessica Paquette | 2018-12-06 | 1 | -9/+2
  Refactoring. This map was only used when we used a string of integers to output the outlined sequence. Since it's no longer used for anything, there's no reason to keep it around.
  llvm-svn: 348432
* [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTOR opcodes. | Amara Emerson | 2018-12-05 | 2 | -0/+115
  These opcodes are intended to subsume some of the capability of G_MERGE_VALUES, as it was too powerful and thus complex to deal with throughout the GISel pipeline. G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar operands which are larger than the destination vector element type, and therefore does an implicit truncate. G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed, vectors together. These will be used in a subsequent commit. This commit just adds the initial infrastructure.
  Differential Revision: https://reviews.llvm.org/D53594
  llvm-svn: 348430
* [MachineOutliner][NFC] Remove buildCandidateList and replace with findCandidates | Jessica Paquette | 2018-12-05 | 1 | -45/+9
  More refactoring. Since the pruning logic has changed, and the candidate list is gone, everything can be sunk into findCandidates. We no longer need to keep track of the length of the longest substring, so we can drop all of that logic as well. After this, we just find all of the candidates and move to outlining.
  llvm-svn: 348428
* [MachineOutliner][NFC] Candidates don't need to be shared_ptrs anymore | Jessica Paquette | 2018-12-05 | 1 | -11/+10
  More refactoring. After the changes to the pruning logic, and removing CandidateList, there's no reason for Candidates to be shared_ptrs (or pointers at all). std::shared_ptr<Candidate> -> Candidate.
  llvm-svn: 348427
* [MachineOutliner][NFC] Remove CandidateList, since it's now unused. | Jessica Paquette | 2018-12-05 | 1 | -29/+11
  After removing the pruning logic, there's no reason to populate a list of Candidates. Remove CandidateList and update comments.
  llvm-svn: 348422
* Fix buildbot capture warning | Jessica Paquette | 2018-12-05 | 1 | -9/+6
  A bot didn't like my lambda. This ought to fix it. Example: http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/30139/steps/build%20lld/logs/stdio
  error C3493: 'AlreadyRemoved' cannot be implicitly captured because no default capture mode has been specified
  llvm-svn: 348421
* [MachineOutliner][NFC] Simplify and unify pruning/outlining logic | Jessica Paquette | 2018-12-05 | 1 | -157/+15
  Since we're now performing outlining per OutlinedFunction rather than per Candidate, we can simply outline each candidate as it shows up. Instead of having a pruning phase, we'll outline entire functions. Then we'll update the UnsignedVec we mapped to reflect the deletion. If any candidate is in a space that's marked dirty, then we'll drop it. This lets us remove the pruning logic entirely, and greatly simplifies the code.
  llvm-svn: 348420
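  A rough, self-contained C++ sketch of the scheme described above (all names here are invented for illustration; this is not the MachineOutliner code): ranges of the mapped sequence that have been outlined are marked dirty, and any later candidate overlapping a dirty index is simply dropped instead of being pruned.

      #include <cstddef>
      #include <vector>

      struct Range { std::size_t Start, End; };   // closed range into the mapped sequence

      static void markDirty(std::vector<bool> &Dirty, Range R) {
        for (std::size_t I = R.Start; I <= R.End; ++I)
          Dirty[I] = true;                        // this stretch has been outlined already
      }

      static bool overlapsDirty(const std::vector<bool> &Dirty, Range R) {
        for (std::size_t I = R.Start; I <= R.End; ++I)
          if (Dirty[I])
            return true;                          // candidate touches outlined code: drop it
        return false;
      }

      int main() {
        std::vector<bool> Dirty(100, false);
        markDirty(Dirty, {10, 20});                    // first candidate was outlined
        bool Drop = overlapsDirty(Dirty, {15, 25});    // overlapping candidate gets dropped
        return Drop ? 0 : 1;
      }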
* [MachineOutliner] Outline functions by order of benefit | Jessica Paquette | 2018-12-05 | 1 | -63/+68
  Mostly NFC, only change is the order of outlined function names. Loop over the outlined functions instead of walking the candidate list. This is a bit easier to understand. It's far more natural to create a function, then replace all of its occurrences with calls than the other way around. The functions outlined after this do not change, but their names will be decided by their benefit. E.g., OUTLINED_FUNCTION_0 will now always be the most beneficial function, rather than the first one seen. This makes it easier to enforce an ordering on the outlined functions. So, this also adds a test to make sure that the ordering works as expected.
  llvm-svn: 348414
* [GISel]: Provide standard interface to observe changes in GISel passes | Aditya Nandakumar | 2018-12-05 | 6 | -65/+102
  https://reviews.llvm.org/D54980
  This provides a standard API across GISel passes to observe and notify passes about changes (insertions/deletions/mutations) to MachineInstrs. This patch also removes the recordInsertion method in MachineIRBuilder and instead provides a method to setObserver.
  Reviewed by: vkeles.
  llvm-svn: 348406
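  A hedged sketch of the observer idea in self-contained C++ (the class and method names below are simplified stand-ins, not the actual GlobalISel API beyond the setObserver name mentioned above): passes register a single observer that is notified whenever instructions are created, erased, or changed.

      #include <iostream>
      #include <memory>
      #include <string>

      struct MachineInstr { std::string Name; };

      // One interface for "something changed": a pass implements this once instead
      // of every builder/pass tracking insertions, deletions and mutations itself.
      struct ChangeObserver {
        virtual ~ChangeObserver() = default;
        virtual void createdInstr(MachineInstr &MI) = 0;
        virtual void erasingInstr(MachineInstr &MI) = 0;
        virtual void changedInstr(MachineInstr &MI) = 0;
      };

      struct Builder {
        ChangeObserver *Observer = nullptr;
        void setObserver(ChangeObserver *O) { Observer = O; }
        std::unique_ptr<MachineInstr> buildInstr(std::string Name) {
          auto MI = std::make_unique<MachineInstr>();
          MI->Name = std::move(Name);
          if (Observer)
            Observer->createdInstr(*MI);   // notify on insertion
          return MI;
        }
      };

      struct PrintObserver : ChangeObserver {
        void createdInstr(MachineInstr &MI) override { std::cout << "created " << MI.Name << '\n'; }
        void erasingInstr(MachineInstr &MI) override { std::cout << "erasing " << MI.Name << '\n'; }
        void changedInstr(MachineInstr &MI) override { std::cout << "changed " << MI.Name << '\n'; }
      };

      int main() {
        Builder B;
        PrintObserver P;
        B.setObserver(&P);
        auto MI = B.buildInstr("G_ADD");   // prints "created G_ADD"
        return 0;
      }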
* [MachineOutliner][NFC] Don't create outlined sequence from integer mapping | Jessica Paquette | 2018-12-05 | 1 | -13/+6
  Some gardening/refactoring. It's cleaner to copy the instructions into the MachineFunction using the first candidate instead of going to the mapper. Also, by doing this we can remove the Seq member from OutlinedFunction entirely.
  llvm-svn: 348390
* [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) | Sanjay Patel | 2018-12-05 | 1 | -10/+14
  Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893
  llvm-svn: 348383
* [TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts | Simon Pilgrim | 2018-12-05 | 1 | -2/+0
  These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases.
  llvm-svn: 348361
* [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) | Simon Pilgrim | 2018-12-05 | 7 | -38/+151
  This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
  Differential Revision: https://reviews.llvm.org/D54698
  llvm-svn: 348353
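  For reference, a small stand-alone C++ sketch of what a funnel shift computes on 32-bit scalars (reference semantics only, not the legalization patch; the helper names are made up): fshl concatenates A:B, shifts left by Amt modulo 32 and keeps the high word, while fshr keeps the low word of A:B shifted right.

      #include <cassert>
      #include <cstdint>

      static uint32_t fshl32(uint32_t A, uint32_t B, uint32_t Amt) {
        Amt %= 32;                                          // shift amount is taken modulo the width
        return Amt == 0 ? A : (A << Amt) | (B >> (32 - Amt));
      }

      static uint32_t fshr32(uint32_t A, uint32_t B, uint32_t Amt) {
        Amt %= 32;
        return Amt == 0 ? B : (B >> Amt) | (A << (32 - Amt));
      }

      int main() {
        // A rotate is a funnel shift with both inputs equal.
        assert(fshl32(0x80000001u, 0x80000001u, 1) == 0x00000003u);
        assert(fshr32(0x12345678u, 0x9abcdef0u, 4) == 0x89abcdefu);
        return 0;
      }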
* Remove superfluous comments. NFCI. | Simon Pilgrim | 2018-12-05 | 1 | -44/+38
  As requested in D54698.
  llvm-svn: 348350
* [TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask | Simon Pilgrim | 2018-12-05 | 1 | -2/+3
  Fix a potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy, so that later uses of the mask saw the tweaked version. Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts, where the altered DemandedElts mask was causing mismatches.
  llvm-svn: 348348
* [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM | Craig Topper | 2018-12-05 | 1 | -1/+5
  It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM, we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration, so we set PhysRegClobber. This patch splits the loop so we have one loop that uses the previous value of PhysRegDef to update PhysRegClobber and a second loop that updates PhysRegDef.
  The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at the end of the loops. And in the branch-relaxation test I think there is now an "and vcc, exec, -1" instruction that wasn't there before.
  Differential Revision: https://reviews.llvm.org/D55102
  llvm-svn: 348330
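  A toy, self-contained model of the order dependence being fixed (the register numbers and containers are invented for illustration; this is not the MachineLICM code): when the alias list reports the same register twice, a single read-then-write pass marks a spurious clobber, while the split two-pass update does not.

      #include <bitset>
      #include <initializer_list>
      #include <iostream>

      int main() {
        std::bitset<8> PhysRegDefs, PhysRegClobbers;
        std::initializer_list<int> Aliases = {3, 3};   // register 3 reported twice

        // Buggy single pass: the second visit of reg 3 sees the def we just set.
        for (int R : Aliases) {
          if (PhysRegDefs.test(R))
            PhysRegClobbers.set(R);                    // spuriously marked as clobbered
          PhysRegDefs.set(R);
        }
        std::cout << "one pass:   clobbered? " << PhysRegClobbers.test(3) << '\n';   // 1

        PhysRegDefs.reset();
        PhysRegClobbers.reset();

        // Fixed: first pass only reads the *previous* defs, second pass records them.
        for (int R : Aliases)
          if (PhysRegDefs.test(R))
            PhysRegClobbers.set(R);
        for (int R : Aliases)
          PhysRegDefs.set(R);
        std::cout << "two passes: clobbered? " << PhysRegClobbers.test(3) << '\n';   // 0
        return 0;
      }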
* [SelectionDAG] Split very large token factors for loads into 64k chunks. | Amara Emerson | 2018-12-05 | 2 | -3/+17
  There's a 64k limit on the number of SDNode operands, and some very large functions with 64k or more loads can cause crashes due to this limit being hit when a TokenFactor with this many operands is created. To fix this, create sub-tokenfactors if we've exceeded the limit. No test case as it requires a very large function. rdar://45196621
  Differential Revision: https://reviews.llvm.org/D55073
  llvm-svn: 348324
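  A generic sketch of the chunking idea in stand-alone C++ (a summation stands in for building a TokenFactor; names and the exact limit handling are illustrative, not the SelectionDAG code): never hand more than MaxOps operands to a single node, and combine the sub-results instead.

      #include <algorithm>
      #include <cstddef>
      #include <iostream>
      #include <vector>

      // Stand-in for "make one node from this operand list".
      static long combine(const std::vector<int> &Ops) {
        long Sum = 0;
        for (int V : Ops)
          Sum += V;
        return Sum;
      }

      // Build sub-results from chunks of at most MaxOps operands, then combine
      // those (recursing in case even the list of sub-results is still too long).
      static long combineChunked(const std::vector<int> &Ops, std::size_t MaxOps) {
        if (Ops.size() <= MaxOps)
          return combine(Ops);
        std::vector<int> SubResults;
        for (std::size_t I = 0; I < Ops.size(); I += MaxOps) {
          std::size_t End = std::min(Ops.size(), I + MaxOps);
          std::vector<int> Chunk(Ops.begin() + static_cast<std::ptrdiff_t>(I),
                                 Ops.begin() + static_cast<std::ptrdiff_t>(End));
          SubResults.push_back(static_cast<int>(combine(Chunk)));
        }
        return combineChunked(SubResults, MaxOps);
      }

      int main() {
        std::vector<int> Ops(200000, 1);
        std::cout << combineChunked(Ops, 65536) << '\n';   // prints 200000
        return 0;
      }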
* [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. | Nirav Dave | 2018-12-04 | 1 | -1/+4
  llvm-svn: 348288
* MIR: Add method to stop after specific runs of passes | Matt Arsenault | 2018-12-04 | 1 | -8/+38
  Currently if you use -{start,stop}-{before,after}, it picks the first instance with the matching pass name. If you run the same pass multiple times, there's no way to distinguish them. Allow specifying a run index with ,N to specify which you mean.
  llvm-svn: 348285
* [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) | Simon Pilgrim | 2018-12-04 | 1 | -11/+30
  PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result; this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generate much nicer code with it.
  Differential Revision: https://reviews.llvm.org/D53794
  llvm-svn: 348251
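  A plain-C++ sketch of the 'strict' form (reference code assuming IEEE doubles, a 64-bit result and inputs in [0, 2^64); the function name is made up and this is not the TargetLowering code): by selecting the FP_TO_SINT input and the offset before converting, the conversion that actually executes is always in range and cannot raise the 'invalid' FP exception.

      #include <cstdint>
      #include <cstdio>

      static uint64_t fptoui64_strict(double X) {
        const double Cut = 9223372036854775808.0;           // 2^63, exactly representable
        bool Small = X < Cut;
        double Src = Small ? X : X - Cut;                   // pre-select the FP_TO_SINT input
        uint64_t Ofs = Small ? 0 : 0x8000000000000000ULL;   // pre-select the offset adjustment
        return (uint64_t)(int64_t)Src + Ofs;                // the fptosi is always in range
      }

      int main() {
        std::printf("%llu\n", (unsigned long long)fptoui64_strict(1.5));     // 1
        std::printf("%llu\n", (unsigned long long)fptoui64_strict(1.0e19));  // 10000000000000000000
        return 0;
      }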
* [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes | Simon Pilgrim | 2018-12-04 | 2 | -0/+23
  Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.
  llvm-svn: 348246
* [DAGCombiner] narrow truncated vector binops when legal | Sanjay Patel | 2018-12-03 | 1 | -7/+11
  This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes.
  Differential Revision: https://reviews.llvm.org/D55126
  llvm-svn: 348195
* [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. | Craig Topper | 2018-12-03 | 1 | -0/+20
  Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.
  The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D54836
  llvm-svn: 348158
* [SelectionDAG] fold constant with undef vector per element | Sanjay Patel | 2018-12-02 | 1 | -10/+16
  This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553).
  llvm-svn: 348090
* [DAGCombiner] guard against an oversized shift crash | Sanjay Patel | 2018-12-02 | 1 | -9/+14
  This change prevents the crash noted in the post-commit comments for rL347478: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html
  We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with rL347502. But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist.
  Differential Revision: https://reviews.llvm.org/D54954
  llvm-svn: 348089
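  The guard itself is simple; a stand-alone sketch of the idea (illustrative only, the helper name is invented): a fold that builds or evaluates a shift must first prove the amount is smaller than the bit width, because an oversized amount may not have been folded away earlier.

      #include <cstdint>
      #include <optional>

      // Refuse to fold when the shift amount is out of range; such a shift does
      // not have a value we can rely on.
      static std::optional<uint32_t> tryFoldShl(uint32_t X, uint32_t Amt) {
        constexpr unsigned BitWidth = 32;
        if (Amt >= BitWidth)
          return std::nullopt;                           // leave the node alone
        return X << Amt;
      }

      int main() {
        return tryFoldShl(1, 40).has_value() ? 1 : 0;    // oversized: no fold performed
      }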
* [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification | Simon Pilgrim | 2018-12-01 | 1 | -5/+3
  D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it.
  Differential Revision: https://reviews.llvm.org/D54761
  llvm-svn: 348073
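  A stand-alone sketch of the relaxed mapping using plain 64-bit masks (names are invented and the real code uses APInt masks; this assumes NumElts * EltBits <= 64): element I becomes demanded as soon as any demanded bit falls inside its bit range, instead of only when its whole range is covered.

      #include <cstdint>
      #include <cstdio>

      static uint64_t demandedEltsForBits(uint64_t DemandedBits, unsigned NumElts,
                                          unsigned EltBits) {
        uint64_t DemandedElts = 0;
        for (unsigned I = 0; I < NumElts; ++I) {
          // Mask of the bits belonging to element I of the bitcast vector.
          uint64_t EltMask =
              (EltBits == 64 ? ~0ULL : ((1ULL << EltBits) - 1)) << (I * EltBits);
          if (DemandedBits & EltMask)        // any demanded bit => demand the element
            DemandedElts |= 1ULL << I;
        }
        return DemandedElts;
      }

      int main() {
        // Demand only bit 4 of a 64-bit value viewed as <4 x i16>: element 0 is demanded.
        std::printf("0x%llx\n",
                    (unsigned long long)demandedEltsForBits(1ULL << 4, 4, 16));  // 0x1
        return 0;
      }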
* AMDGPU: Fix various issues around the VirtReg2Value mapping | Nicolai Haehnle | 2018-11-30 | 1 | -2/+11
  Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress:

  1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were:
     - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers).
     - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway).
  2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping.

  I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage.

  There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole.

  Reviewers: alex-t, arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54340
  llvm-svn: 348049
* [SelectionDAG] fold FP binops with 2 undef operands to undef | Sanjay Patel | 2018-11-30 | 1 | -2/+4
  llvm-svn: 348016
* Revert "[BTF] Add BTF DebugInfo"Yonghong Song2018-11-307-1114/+2
| | | | | | This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf. llvm-svn: 348004
* [BTF] Add BTF DebugInfo | Yonghong Song | 2018-11-30 | 7 | -2/+1114
  This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF.

  What is BTF?
  ============
  First, the BPF is a linux kernel virtual machine and widely used for tracing, networking and security.
  https://www.kernel.org/doc/Documentation/networking/filter.txt
  https://cilium.readthedocs.io/en/v1.2/bpf/

  BTF is the debug info format for BPF, introduced in the below linux patch
  https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332
  in the patch set mentioned in the below lwn article.
  https://lwn.net/Articles/752047/

  The BTF format is specified in the above github commit. In summary, its layout looks like:
    struct btf_header
    type subsection (a list of types)
    string subsection (a list of strings)

  With such information, the kernel and the user space is able to pretty print a particular bpf map key/value. One possible example below:
    Without BTF: key: [ 0x01, 0x01, 0x00, 0x00 ]
    With BTF:    key: struct t { a : 1; b : 1; c : 0}
  where struct is defined as struct t { char a; char b; short c; };

  How BTF is generated?
  =====================
  Currently, the BTF is generated through pahole.
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69
  and available in pahole v1.12
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4

  Basically, the bpf program needs to be compiled with -g with dwarf sections generated. The pahole is enhanced such that a .BTF section can be generated based on dwarf. This format of the .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel.
  https://github.com/torvalds/linux/commit/8a138aed4a807ceb143882fb23a423d524dcdb35

  The .BTF section layout is also specified in this patch: with file include/llvm/BinaryFormat/BTF.h.

  What use cases this patch tries to address?
  ===========================================
  Currently, only the bpf instruction stream is required to pass to the kernel. The kernel verifies it, jits it if configured to do so, attaches it to a particular kernel attachment point, and later executes when a particular event happens. This patch tries to expand BTF to support two more use cases below:
    (1). BPF supports subroutine calls. During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This would require to pass a unique identifier for each subroutine to the kernel; the subroutine name is a natural choice.
    (2). If a particular jitted instruction is hot, we want user to know which source line this jitted instruction belongs to. This would require the source information is available to various profiling tools.

  Note that in a single ELF file,
    . there may be multiple loadable bpf programs,
    . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections; the bpf loader needs to merge them together to a single consecutive insn stream before loading to the kernel.
  For example:
    section .text: subroutines funcFoo
    section _progA: calling funcFoo
    section _progB: calling funcFoo
  The bpf loader could construct two loadable bpf instruction streams and load them into the kernel:
    . _progA funcFoo
    . _progB funcFoo
  So per ELF section function offset and instruction offset will need to be adjusted before passing to the kernel, and the kernel essentially expects only one code section regardless of how many are in the ELF file.

  What do we propose and Why?
  ===========================
  To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file which is the input of the bpf loader. A different section is preferred since the loader may need to manipulate it before loading part of its data to the kernel.

  The .BTF.ext section has a similar header to the .BTF section and it contains two subsections for func_info and line_info.
    . the func_info maps the func insn byte offset to a func type in the .BTF type subsection.
    . the line_info maps the insn byte offset to a line info.
    . both func_info and line_info subsections are organized by ELF PROGBITS AX sections.

  pahole is not a good place to implement .BTF.ext as pahole is mostly for structure hole information and, more importantly, we want to pass the actual code to the kernel.
    . bpf program typically is small so storage overhead should be small.
    . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then that application quits, so holding debug info by the user space application is not practical as you may not even know who loads this bpf program.
    . having source code directly kept by the kernel would ease deployment since the original source code does not need to ship on every host and the kernel-devel package does not need to be deployed even if kernel headers are used.

  LLVM is a good place to implement.
    . The only reliable time to get the source code is during compilation time. This will result in both more accurate information and easier deployment as stated in the above.
    . Another consideration is for JIT. Projects like bcc (https://github.com/iovisor/bcc) use MCJIT to compile a C program into bpf insns and load them to the kernel. The llvm generated BTF sections will be readily available for such cases as well.

  Design and implementation of emitting .BTF/.BTF.ext sections
  =============================================================
  The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified. Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for the BPF target.

  This patch also contains tests to verify generated .BTF and .BTF.ext sections for all supported types, func_info and line_info subsections. The patch is also tested against linux kernel bpf sample tests and selftests.

  Signed-off-by: Yonghong Song <yhs@fb.com>
  Differential Revision: https://reviews.llvm.org/D53736
  llvm-svn: 347999
* [CodeGen] Prefer static frame index for STATEPOINT liveness args | Than McIntosh | 2018-11-30 | 1 | -1/+10
  Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even if the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map.
  Patch by Cherry Zhang <cherryyz@google.com>.
  Reviewers: thanm, niravd, reames
  Reviewed By: reames
  Subscribers: cherryyz, reames, anna, arphaman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53889
  llvm-svn: 347998
* TableGen/ISel: Allow PatFrag predicate code to access captured operands | Nicolai Haehnle | 2018-11-30 | 1 | -0/+12
  Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`.
  Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets.
  Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
  Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
  Subscribers: wdng, tpr, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51994
  llvm-svn: 347992
* [SelectionDAG] Support result type promotion for FLT_ROUNDS_ | Alex Bradbury | 2018-11-30 | 2 | -0/+10
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.
  Differential Revision: https://reviews.llvm.org/D53820
  llvm-svn: 347986
* [SelectionDAG] Support promotion of PREFETCH operands | Alex Bradbury | 2018-11-30 | 2 | -0/+15
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.
  Differential Revision: https://reviews.llvm.org/D53281
  llvm-svn: 347980
* [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands | Alex Bradbury | 2018-11-30 | 2 | -0/+10
  For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand.
  Differential Revision: https://reviews.llvm.org/D53279
  llvm-svn: 347978
* [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V | Alex Bradbury | 2018-11-30 | 2 | -10/+22
  DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction.
  Differential Revision: https://reviews.llvm.org/D52978
  llvm-svn: 347977
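  A simplified stand-alone mock of the hook's shape in C++ (the surrounding classes and the enum are stand-ins, not LLVM's types; only the hook name and the i32->i64 preference come from the message above):

      #include <cassert>

      enum class SimpleVT { i32, i64 };

      struct TargetLoweringBase {
        virtual ~TargetLoweringBase() = default;
        // Default: keep preferring zero-extension, as before this change.
        virtual bool isSExtCheaperThanZExt(SimpleVT /*FromTy*/, SimpleVT /*ToTy*/) const {
          return false;
        }
      };

      // RV64 can sign-extend i32 to i64 in a single instruction, so prefer sext there.
      struct RISCVLoweringMock : TargetLoweringBase {
        bool isSExtCheaperThanZExt(SimpleVT FromTy, SimpleVT ToTy) const override {
          return FromTy == SimpleVT::i32 && ToTy == SimpleVT::i64;
        }
      };

      // Caller side, modeled on the SExtOrZExtPromotedInteger idea: pick the
      // cheaper extension when promoting a setcc operand.
      static bool shouldSignExtend(const TargetLoweringBase &TLI) {
        return TLI.isSExtCheaperThanZExt(SimpleVT::i32, SimpleVT::i64);
      }

      int main() {
        RISCVLoweringMock TLI;
        assert(shouldSignExtend(TLI));
        return 0;
      }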