bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add option to specify minimum number of entries for jump tables	Evandro Menezes	2016-10-25	1	-3/+14
\| \| \| \| \| \| \| \| \| \|	Add an option to allow easier experimentation by target maintainers with the minimum number of entries to create jump tables. Also clarify the name of the other existing option governing the creation of jump tables. Differential revision: https://reviews.llvm.org/D25883 llvm-svn: 285104
*	Switch lowering: improve partitioning of jump tables	Evandro Menezes	2016-10-25	1	-14/+31
\| \| \| \| \| \| \| \| \| \| \|	When there's a tie between partitionings of jump tables, consider also cases that result in no jump tables, but in one or a few cases. The motivation is that many contemporary processors typically perform case switches fairly quickly. Differential revision: https://reviews.llvm.org/D25212 llvm-svn: 285099
*	Remove debug location from common tail when tail-merging	Robert Lougher	2016-10-25	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The branch folding pass tail merges blocks into a common-tail. However, the tail retains the debug information from one of the original inputs to the merge (chosen randomly). This is a problem for sampled-based PGO, as hits on the common-tail will be attributed to whichever block was chosen, irrespective of which path was actually taken to the common-tail. This patch fixes the issue by nulling the debug location for the common-tail. Differential Revision: https://reviews.llvm.org/D25742 llvm-svn: 285093
*	[DAGCombine] Preserve shuffles when one of the vector operands is constant	Zvi Rackover	2016-10-25	1	-34/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Do not perform combines such as: vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X)) -> build_vector(X, C0, C1, C2) Keeping the shuffle allows lowering the constant build_vector to a materialized constant vector (such as a vector-load from the constant-pool or some other idiom). Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25524 llvm-svn: 285063
*	MachineInstrBundle: Pass iterators to getBundle(Start\|End); NFC	Matthias Braun	2016-10-25	1	-2/+2
\| \| \| \| \| \| \| \|	This is a function to go backwards in a block to find the first instruction in a bundle, so iterator is a more natural choice for parameter/return rather than a reference to a MachineInstruction. llvm-svn: 285051
*	CodeGen/Passes: Pass MachineFunction as functor arg; NFC	Matthias Braun	2016-10-24	2	-9/+10
\| \| \| \| \| \| \| \|	Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039
*	[SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar ↵	Simon Pilgrim	2016-10-24	1	-6/+7
\| \| \| \| \| \| \| \| \| \|	or vector splats Use isConstOrConstSplat helper. Also use APInt instead of getZExtValue directly to avoid out of range issues. llvm-svn: 285033
*	Use MachineInstr::mop_iterator instead of MIOperands; NFC	Matthias Braun	2016-10-24	1	-6/+6
\| \| \| \| \| \| \| \|	(Const)?MIOperands is equivalent to the C++ style MachineInstr::mop_iterator. Use the latter for consistency except for a few callers of MIOperands::analyzePhysReg(). llvm-svn: 285029
*	Use SDValue::getConstantOperandVal() helper. NFCI.	Simon Pilgrim	2016-10-24	1	-4/+2
\| \| \| \|	llvm-svn: 285025
*	Target: Change various section classifiers in TargetLoweringObjectFile to ↵	Peter Collingbourne	2016-10-24	1	-45/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006
*	CodeGen: Do not add a global's address space to the folding set profile.	Peter Collingbourne	2016-10-24	1	-2/+0
\| \| \| \| \| \| \|	It is already part of the type (which is part of the global, which is already being added), so there's no need to do it. llvm-svn: 285002
*	[DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant	Sanjay Patel	2016-10-23	1	-43/+32
\| \| \| \|	llvm-svn: 284953
*	Use SDValue::getConstantOperandVal() helper. NFCI.	Simon Pilgrim	2016-10-23	1	-4/+1
\| \| \| \|	llvm-svn: 284949
*	Remove LLVM_CONSTEXPR.	Justin Lebar	2016-10-23	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: With MSVC 2013 and GCC < 4.8 gone, we can use the "constexpr" keyword. Reviewers: bkramer, mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25901 llvm-svn: 284947
*	[DAG] enhance computeKnownBits to handle SHL with vector splat constant	Sanjay Patel	2016-10-21	1	-10/+9
\| \| \| \| \| \|	Also, use APInt to avoid crashing on types larger than vNi64. llvm-svn: 284874
*	[DAG] fold negation of sign-bit	Sanjay Patel	2016-10-21	1	-11/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	0 - X --> 0, if the sub is NUW 0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW 0 - X --> X, if X is 0 or the minimum signed value This is the DAG equivalent of: https://reviews.llvm.org/rL284649 plus the fold for the NUW case which already existed in InstSimplify. Note that we miss a vector fold because of a deficiency in the DAG version of computeKnownBits(). llvm-svn: 284844
*	fix variable names; NFCI	Sanjay Patel	2016-10-21	1	-2/+2
\| \| \| \| \| \| \| \|	Because we're just 'or-ing' these 2 variables later in the code, I don't think there's a logical bug here, but of course the string with "no size" is the one that should have the size suffix stripped off. llvm-svn: 284826
*	[DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds	Sanjay Patel	2016-10-21	1	-16/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in D24815, let's start the process of killing off the broken fast-math global state housed in TargetOptions and eliminate the need for function-level fast-math attributes. Here we enable two similar folds that are possible when we don't care about signed-zero: fadd nsz x, 0 --> x fsub nsz 0, x --> -x Note that although the test cases include a 'sin' function call, I'm side-stepping the FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node. Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't actually do anything today because Flags are silently dropped for any node that is not a binary operator. Differential Revision: https://reviews.llvm.org/D25297 llvm-svn: 284824
*	Using branch probability to guide critical edge splitting.	Dehao Chen	2016-10-20	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass. The performance impact on speccpu2006 on Intel sandybridge machines: spec/2006/fp/C++/444.namd 25.3 +0.26% spec/2006/fp/C++/447.dealII 45.96 -0.10% spec/2006/fp/C++/450.soplex 41.97 +1.49% spec/2006/fp/C++/453.povray 36.83 -0.96% spec/2006/fp/C/433.milc 23.81 +0.32% spec/2006/fp/C/470.lbm 41.17 +0.34% spec/2006/fp/C/482.sphinx3 48.13 +0.69% spec/2006/int/C++/471.omnetpp 22.45 +3.25% spec/2006/int/C++/473.astar 21.35 -2.06% spec/2006/int/C++/483.xalancbmk 36.02 -2.39% spec/2006/int/C/400.perlbench 33.7 -0.17% spec/2006/int/C/401.bzip2 22.9 +0.52% spec/2006/int/C/403.gcc 32.42 -0.54% spec/2006/int/C/429.mcf 39.59 +0.19% spec/2006/int/C/445.gobmk 26.98 -0.00% spec/2006/int/C/456.hmmer 24.52 -0.18% spec/2006/int/C/458.sjeng 28.26 +0.02% spec/2006/int/C/462.libquantum 55.44 +3.74% spec/2006/int/C/464.h264ref 46.67 -0.39% geometric mean +0.20% Manually checked 473 and 471 to verify the diff is in the noise range. Reviewers: rengolin, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24818 llvm-svn: 284757
*	Fix *_EXTEND_VECTOR_INREG legalization	Pirama Arumuga Nainar	2016-10-20	1	-3/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While promoting _EXTEND_VECTOR_INREG nodes whose inputs are already promoted, perform the appropriate sign extension for the promoted node before doing the _EXTEND_VECTOR_INREG operation. If not, the undefined high-order bits of the promoted operand may (a) be garbage inc ase of zext) or (b) contribute the wrong sign-bit (in case of sext) Updated the promote-vec3.ll test after this change. The diff shows explicit zeroing in case of zext and intermediate sign extension in case of sext. Reviewers: RKSimon Subscribers: llvm-commits, srhines Differential Revision: https://reviews.llvm.org/D25790 llvm-svn: 284752
*	[Target] remove TargetRecip class; 2nd try	Sanjay Patel	2016-10-20	2	-28/+223
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746
*	Retire llvm::alignOf in favor of C++11 alignof.	Benjamin Kramer	2016-10-20	1	-1/+0
\| \| \| \| \| \|	No functionality change intended. llvm-svn: 284733
*	[DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> ↵	Simon Pilgrim	2016-10-20	1	-8/+8
\| \| \| \| \| \| \| \|	(and x, cst2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284717
*	DebugInfo: preparation to implement DW_AT_alignment	Victor Leschuk	2016-10-20	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	- Add alignment attribute to DIVariable family - Modify bitcode format to match new DIVariable representation - Update tests to match these changes (also add bitcode upgrade test) - Expect that frontend passes non-zero align value only when it is not default (was forcibly aligned by alignas()/_Alignas()/__atribute__(aligned()) Differential Revision: https://reviews.llvm.org/D25073 llvm-svn: 284678
*	[GlobalMerge] Handle non-landingpad EH pads	Reid Kleckner	2016-10-19	1	-14/+10
\| \| \| \| \| \| \| \| \|	This code crashed on funclet-style EH instructions such as catchpad, catchswitch, and cleanuppad. Just treat all EH pad instructions equivalently and avoid merging the globals they reference through any use. llvm-svn: 284633
*	Merged nested ifs. NFCI.	Simon Pilgrim	2016-10-19	1	-7/+6
\| \| \| \|	llvm-svn: 284616
*	[DAGCombiner] Add general constant vector support to (shl (add x, c1), c2) ↵	Simon Pilgrim	2016-10-19	1	-4/+5
\| \| \| \| \| \| \| \|	-> (add (shl x, c2), c1 << c2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284613
*	[WinEH] Allow catchpads to reuse the same catch object	Reid Kleckner	2016-10-19	1	-4/+7
\| \| \| \| \| \|	This code used a regular when it should have used a multimap. llvm-svn: 284612
*	[DAG] optimize negation of bool	Sanjay Patel	2016-10-19	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG. With the mask, this should be no worse than 2 shifts. The mask can be eliminated in some cases, so that should be better than 2 shifts. This change exposed some missing folds related to negation: https://reviews.llvm.org/rL284239 https://reviews.llvm.org/rL284395 There may be others, so please let me know if you see any regressions. Differential Revision: https://reviews.llvm.org/D25485 llvm-svn: 284611
*	[DAGCombiner] Add general constant vector support to (shl (sra x, c1), c1) ↵	Simon Pilgrim	2016-10-19	1	-7/+6
\| \| \| \| \| \| \| \|	-> (and x, (shl -1, c1)) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284608
*	[DAGCombiner] Add general constant vector support to (shl (mul x, c1), c2) ↵	Simon Pilgrim	2016-10-19	1	-5/+6
\| \| \| \| \| \| \| \|	-> (mul x, c1 << c2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284607
*	Revert r284604. A.K.A. "TMP"	Tim Northover	2016-10-19	1	-33/+0
\| \| \| \| \| \|	Committed by mistake. llvm-svn: 284606
*	TMP	Tim Northover	2016-10-19	1	-0/+33
\| \| \| \|	llvm-svn: 284604
*	GlobalISel: support translating volatile loads and stores.	Tim Northover	2016-10-19	1	-14/+17
\| \| \| \|	llvm-svn: 284603
*	[DAGCombiner] Just call isConstOrConstSplat directly. NFCI.	Simon Pilgrim	2016-10-19	1	-8/+4
\| \| \| \| \| \|	This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach. llvm-svn: 284578
*	[DAGCombine] Generalize distributeTruncateThroughAnd to work with any ↵	Simon Pilgrim	2016-10-19	1	-13/+9
\| \| \| \| \| \|	non-opaque constant or constant vector llvm-svn: 284574
*	Revert r284545 again as the regression in ppc still exists. There is bug in ↵	Dehao Chen	2016-10-19	1	-18/+0
\| \| \| \| \| \| \| \|	MBPI exposed by th patch. Also update the section.ll to fix non-x86 failure. llvm-svn: 284563
*	Using branch probability to guide critical edge splitting.	Dehao Chen	2016-10-18	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass. The performance impact on speccpu2006 on Intel sandybridge machines: spec/2006/fp/C++/444.namd 25.3 +0.26% spec/2006/fp/C++/447.dealII 45.96 -0.10% spec/2006/fp/C++/450.soplex 41.97 +1.49% spec/2006/fp/C++/453.povray 36.83 -0.96% spec/2006/fp/C/433.milc 23.81 +0.32% spec/2006/fp/C/470.lbm 41.17 +0.34% spec/2006/fp/C/482.sphinx3 48.13 +0.69% spec/2006/int/C++/471.omnetpp 22.45 +3.25% spec/2006/int/C++/473.astar 21.35 -2.06% spec/2006/int/C++/483.xalancbmk 36.02 -2.39% spec/2006/int/C/400.perlbench 33.7 -0.17% spec/2006/int/C/401.bzip2 22.9 +0.52% spec/2006/int/C/403.gcc 32.42 -0.54% spec/2006/int/C/429.mcf 39.59 +0.19% spec/2006/int/C/445.gobmk 26.98 -0.00% spec/2006/int/C/456.hmmer 24.52 -0.18% spec/2006/int/C/458.sjeng 28.26 +0.02% spec/2006/int/C/462.libquantum 55.44 +3.74% spec/2006/int/C/464.h264ref 46.67 -0.39% geometric mean +0.20% Manually checked 473 and 471 to verify the diff is in the noise range. Reviewers: rengolin, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24818 llvm-svn: 284545
*	revert r284541.	Dehao Chen	2016-10-18	1	-17/+0
\| \| \| \|	llvm-svn: 284544
*	Using branch probability to guide critical edge splitting.	Dehao Chen	2016-10-18	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass. The performance impact on speccpu2006 on Intel sandybridge machines: spec/2006/fp/C++/444.namd 25.3 +0.26% spec/2006/fp/C++/447.dealII 45.96 -0.10% spec/2006/fp/C++/450.soplex 41.97 +1.49% spec/2006/fp/C++/453.povray 36.83 -0.96% spec/2006/fp/C/433.milc 23.81 +0.32% spec/2006/fp/C/470.lbm 41.17 +0.34% spec/2006/fp/C/482.sphinx3 48.13 +0.69% spec/2006/int/C++/471.omnetpp 22.45 +3.25% spec/2006/int/C++/473.astar 21.35 -2.06% spec/2006/int/C++/483.xalancbmk 36.02 -2.39% spec/2006/int/C/400.perlbench 33.7 -0.17% spec/2006/int/C/401.bzip2 22.9 +0.52% spec/2006/int/C/403.gcc 32.42 -0.54% spec/2006/int/C/429.mcf 39.59 +0.19% spec/2006/int/C/445.gobmk 26.98 -0.00% spec/2006/int/C/456.hmmer 24.52 -0.18% spec/2006/int/C/458.sjeng 28.26 +0.02% spec/2006/int/C/462.libquantum 55.44 +3.74% spec/2006/int/C/464.h264ref 46.67 -0.39% geometric mean +0.20% Manually checked 473 and 471 to verify the diff is in the noise range. Reviewers: rengolin, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24818 llvm-svn: 284541
*	Use profile info to set function section prefix to group hot/cold functions.	Dehao Chen	2016-10-18	2	-4/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation: 1. add a new metadata for function section prefix 2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function 3. output the section prefix set by CGP Reviewers: davidxl, eraman Subscribers: vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D24989 llvm-svn: 284533
*	GlobalISel: translate the @llvm.objectsize intrinsic.	Tim Northover	2016-10-18	1	-0/+7
\| \| \| \|	llvm-svn: 284527
*	GlobalISel: translate memcpy intrinsics.	Tim Northover	2016-10-18	1	-0/+22
\| \| \| \|	llvm-svn: 284525
*	[InterleavedAccessPass] Remove global variable.	Benjamin Kramer	2016-10-18	1	-6/+9
\| \| \| \| \| \| \|	This is a threading hazard and rightfully complained about by tsan. No functionality change. llvm-svn: 284515
*	revert r284495: [Target] remove TargetRecip class	Sanjay Patel	2016-10-18	2	-217/+26
\| \| \| \| \| \|	There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513
*	[Target] remove TargetRecip class; move reciprocal estimate isel ↵	Sanjay Patel	2016-10-18	2	-26/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495
*	[DAGCombiner] Add splatted vector support to (udiv x, (shl pow2, y)) -> x ↵	Simon Pilgrim	2016-10-18	1	-2/+3
\| \| \| \| \| \|	>>u (log2(pow2)+y) llvm-svn: 284491
*	DebugInfo: change alignment type from uint64_t to uint32_t to save space.	Victor Leschuk	2016-10-18	1	-2/+3
\| \| \| \| \| \| \| \| \|	In futher patches we shall have alignment field added to DIVariable family and switching from uint64_t to uint32_t will save 4 bytes per variable. Differential Revision: https://reviews.llvm.org/D25620 llvm-svn: 284482
*	Strip trailing whitespace (NFCI)	Simon Pilgrim	2016-10-18	1	-1/+1
\| \| \| \|	llvm-svn: 284478
*	[XRay] Support for for tail calls for ARM no-Thumb	Dean Michael Berris	2016-10-18	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456