This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has two benefits: less (and faster) code, and one fewer estimate instruction
that may lose precision.
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is two fewer flops for a single-precision sqrtf
or vector sqrtf, and four fewer flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.
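As a minimal illustration of the reformulation (a hedged sketch, not LLVM's actual
codegen; rsqrt_estimate is a hypothetical stand-in for a hardware estimate
instruction such as PPC's frsqrtes):
#include <cmath>
// Hypothetical stand-in for a hardware reciprocal-square-root estimate.
static float rsqrt_estimate(float x) { return 1.0f / std::sqrt(x); }
// Before: y = 1 / (1 / sqrt(x)) -- a second (reciprocal) estimate costs
// extra instructions and is another chance to lose precision.
float sqrt_before(float x) { return 1.0f / rsqrt_estimate(x); }
// After: y = x * (1 / sqrt(x)) -- one estimate and one multiply.
float sqrt_after(float x) { return x * rsqrt_estimate(x); }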
Differential Revision: http://reviews.llvm.org/D5682
llvm-svn: 219445
remove cached or unnecessary TargetMachines.
llvm-svn: 219387
the TargetMachine to a TargetSubtargetInfo since everything
we wanted is off of that.
llvm-svn: 219382
replace them with calls off of the MachineFunction.
llvm-svn: 219381
the now unused TargetMachine variable.
llvm-svn: 219379
than off the target machine.
llvm-svn: 219378
via caching TargetLowering and using the MachineFunction.
llvm-svn: 219375
the DAG combiner.
llvm-svn: 219367
the MachineFunction where it's already cached.
llvm-svn: 219366
unused.
llvm-svn: 219347
a cached TLI instance.
llvm-svn: 219342
SelectionDAG in SelectionDAGBuilder rather than going through
the TargetMachine for lookup.
llvm-svn: 219292
MachineFunction rather than a lookup on the TargetMachine
to avoid unnecessary lookups.
llvm-svn: 219291
calls to getTargetLowering() with the cached variable.
llvm-svn: 219284
propagate. Also use the TargetSubtargetInfo and the MachineFunction,
and move the TargetRegisterInfo query closer to its uses.
llvm-svn: 219273
thing we do inside selection dag. This code needs to be
migrated to queries on the function rather than global
data, but this organizes things before we start grabbing
the subtarget.
llvm-svn: 219271
inside init rather than have it passed in as an argument.
llvm-svn: 219270
during init rather than at construction time.
llvm-svn: 219262
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:
Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
then the high bits must conform to getBooleanContents.
So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.
Patch by deadal nix!
llvm-svn: 219141
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:
float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);
Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:
addis 3, 2, .LCPI4_2@toc@ha
lfs 4, .LCPI4_2@toc@l(3)
addis 3, 2, .LCPI4_1@toc@ha
lfs 0, .LCPI4_1@toc@l(3)
fcmpu 0, 1, 4
beq 0, .LBB4_2
# BB#1:
frsqrtes 4, 1
addis 3, 2, .LCPI4_0@toc@ha
lfs 5, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 5, 1
fmuls 6, 4, 4
fmadds 1, 13, 6, 5
fmuls 1, 4, 1
fres 4, 1 <--- reciprocal of reciprocal square root
fnmsubs 1, 1, 4, 0
fmadds 4, 4, 1, 4
.LBB4_2:
fmuls 1, 4, 2
fres 2, 1
fnmsubs 0, 1, 2, 0
fmadds 0, 2, 0, 2
fmuls 1, 3, 0
blr
After the patch, this simplifies to:
frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr
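The scalar math behind this, as a hedged sketch (not the matching code; r stands
for an already-refined reciprocal-square-root estimate):
#include <cmath>
// Before: a true sqrt, then the reciprocal of its cube.
float mag_before(float dt, float x) {
  float d = std::sqrt(x);
  return dt / (d * d * d);
}
// After: 1 / sqrt(x)^3 == (1/sqrt(x))^3, so the rsqrt estimate r feeds
// the whole computation; no true sqrt and no divide are needed.
float mag_after(float dt, float r) {
  return dt * (r * r * r);
}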
Differential Revision: http://reviews.llvm.org/D5628
llvm-svn: 219139
that are unused.
This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.
Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.
llvm-svn: 219086
NFC.
llvm-svn: 219061
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again, hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it creates a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
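A hedged sketch of the listener (SelectionDAG::DAGUpdateListener is existing LLVM
API; MatchState and replaceNode are hypothetical stand-ins for the matcher's
RecordedNodes/MatchScopes bookkeeping):
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;
struct MatchState;                                          // hypothetical
void replaceNode(MatchState &MS, SDNode *From, SDNode *To); // hypothetical
// Registers itself with the DAG in the constructor and unregisters in the
// destructor, so it is live only for the duration of the match.
class MatchStateUpdater : public SelectionDAG::DAGUpdateListener {
  MatchState &MS;
public:
  MatchStateUpdater(SelectionDAG &DAG, MatchState &MS)
      : SelectionDAG::DAGUpdateListener(DAG), MS(MS) {}
  // Called when N is CSE'd into E: repoint any recorded references to N.
  void NodeDeleted(SDNode *N, SDNode *E) override { replaceNode(MS, N, E); }
};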
Fixes rdar://problem/18206171
llvm-svn: 219009
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
"Move the complex address expression out of DIVariable and into an extra"
llvm-svn: 218782
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778
refinement of an estimate. NFC.
llvm-svn: 218700
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. E.g., x86 will prefer a different NR estimate
implementation. ARM will want to use its 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
llvm-svn: 218698
to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows an immediate early exit from method
'visitBUILD_VECTOR' if the node type is known to be illegal.
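A hedged sketch of the reordering (isTypeLegal is existing TargetLowering API;
tryToConvertToShuffle is a hypothetical placeholder for the per-operand analysis):
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Target/TargetLowering.h"
using namespace llvm;
SDValue tryToConvertToShuffle(SDNode *N); // hypothetical: the real work
SDValue visitBuildVectorSketch(SDNode *N, const TargetLowering &TLI) {
  EVT VT = N->getValueType(0);
  // Early exit: an illegal result type can never become a shuffle, so
  // don't analyze the operands at all.
  if (!TLI.isTypeLegal(VT))
    return SDValue();
  return tryToConvertToShuffle(N);
}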
No functional change intended.
llvm-svn: 218677
If a store is followed by a store of the same value to the same location, then the repeated store is dead (a no-op) and can be removed.
This problem is found in spec2006-197.parser.
For example,
stur w10, [x11, #-4]
stur w10, [x11, #-4]
Then one of the two stur instructions can be removed.
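At the C level the pattern looks like this (a hypothetical example, not the
testcase from the patch):
// Compiles to two identical stur instructions; the repeated store changes
// nothing and can be deleted.
void redundant_store(int *p, int v) {
  p[-1] = v;   // stur w10, [x11, #-4]
  p[-1] = v;   // dead/no-op: same value, same location
}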
Patch by David Xu!
llvm-svn: 218569
target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
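A minimal sketch of the underlying math (not the DAGCombiner code; e0 stands for
the raw hardware estimate of 1/sqrt(x)):
// One Newton-Raphson refinement step for the reciprocal square root:
// for f(e) = 1/e^2 - x, the update is e1 = e0 * (1.5 - 0.5 * x * e0 * e0).
float refine_rsqrt(float x, float e0) {
  return e0 * (1.5f - 0.5f * x * e0 * e0);
}
// z = y / sqrt(x) becomes z = y * rsqrte(x): the divide disappears.
float div_by_sqrt(float y, float x, float e0) {
  return y * refine_rsqrt(x, e0);
}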
Differential Revision: http://reviews.llvm.org/D5484
llvm-svn: 218553
since we are accessing the TargetMachine that we're a member
function of.
llvm-svn: 218489
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.
This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.
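A hedged sketch of the shape of the code in question (simplified; the point is
that the guard only exists when NDEBUG is defined):
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/MC/MCInstrDesc.h"
#include "llvm/Target/TargetLowering.h"
using namespace llvm;
void emitPostISelHookSketch(MachineInstr *MI, const MCInstrDesc &II,
                            SDNode *Node, const TargetLowering *TLI) {
  // In asserts builds (NDEBUG undefined) the preprocessor strips the
  // guard, so the hook runs even when hasPostISelHook() is false.
#ifdef NDEBUG
  if (II.hasPostISelHook())
#endif
    TLI->AdjustInstrPostInstrSelection(MI, Node);
}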
llvm-svn: 218458
FunctionLoweringInfo.
llvm-svn: 218364
This is purely a plumbing patch. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.
Next steps:
The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.
Differential Revision: http://reviews.llvm.org/D5425
llvm-svn: 218219
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction; furthermore, forming the FMA while leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
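A hedged sketch of the relaxed check (enableAggressiveFMAFusion is the hook added
here; the surrounding combine is simplified):
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Target/TargetLowering.h"
using namespace llvm;
// The FMUL may feed an FMA despite having other uses if the target says
// aggressive fusion pays off (same latency, no extra resource pressure).
static bool canFormFMA(SDNode *Mul, EVT VT, const TargetLowering &TLI) {
  return Mul->hasOneUse() || TLI.enableAggressiveFMAFusion(VT);
}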
Patch by Olivier Sallenave, thanks!
llvm-svn: 218120
With this optimization, we will not always insert a zext for values crossing
basic blocks, but will instead insert a sext if the users of a value crossing
a basic block prefer a signed predicate.
llvm-svn: 218101
Without a vector to hold the created ops, these
functions don't have any use.
llvm-svn: 217831
Make the optimizeCmpPredicate function available to all targets.
llvm-svn: 217822