bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	18	-195/+269
	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
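To make the storage argument in the message above concrete, the following toy model counts uniqued metadata nodes under both layouts. It is only a sketch: `Uniquer`, `old_layout`, and `new_layout` are hypothetical names, and plain tuples stand in for metadata nodes; none of this is LLVM's actual API.

```python
class Uniquer:
    """Hypothetical stand-in for metadata uniquing: identical tuples share one node."""

    def __init__(self):
        self.nodes = set()

    def get(self, *fields):
        self.nodes.add(fields)
        return fields


def old_layout(num_vars, num_pieces):
    # Old scheme: the address expression is embedded in the variable node,
    # so every piece needs its own copy of the variable.
    u = Uniquer()
    for v in range(num_vars):
        for p in range(num_pieces):
            u.get("variable", f"var{v}", ("piece", 4 * p, 4))
    return len(u.nodes)


def new_layout(num_vars, num_pieces):
    # New scheme: one variable node per variable; expression nodes are
    # separate and can be shared by every variable that uses the same piece.
    u = Uniquer()
    for v in range(num_vars):
        for p in range(num_pieces):
            u.get("variable", f"var{v}")
            u.get("expression", "piece", 4 * p, 4)
    return len(u.nodes)
```

With three variables split into four pieces each, the old layout needs one node per (variable, piece) pair, while the new layout needs one node per variable plus one per distinct expression.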
*	Revert r216862 due to a performance regression	Jingyue Wu	2014-10-01	1	-9/+21
	Reported by Alexey Volkov in PR21115 llvm-svn: 218771
*	Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound	David Blaikie	2014-10-01	1	-6/+3
	This allows proper disambiguation of unbounded arrays and arrays of zero bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC instead produces an upper bound of -1 in the latter situation, but count seems tidier. This way lower_bound is provided if it's not the language default and count is provided if the count is known, otherwise it's omitted. Simple. If someone wants to look at rdar://problem/12566646 and see if this change is acceptable to that bug/fix, that might be helpful (see the empty-and-one-elem-array.ll test case which cites that radar). llvm-svn: 218726
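The rule described above (emit the lower bound only when it differs from the language default, emit the count only when it is known) can be sketched as follows. The function name and the dictionary representation are illustrative assumptions, not the code in the patch.

```python
def subrange_attributes(count=None, lower_bound=0, default_lower_bound=0):
    """Return the attributes a DW_TAG_subrange_type would carry under this rule.

    `count=None` models an array whose element count is unknown, e.g. `int x[]`.
    """
    attrs = {}
    if lower_bound != default_lower_bound:
        attrs["DW_AT_lower_bound"] = lower_bound
    if count is not None:
        attrs["DW_AT_count"] = count
    return attrs
```

Under this model `int x[]` yields no attributes at all, while `int x[0]` yields a count of zero, so the two cases stay distinguishable.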
*	Omit DW_AT_inline under -gmlt to save a little more space.	David Blaikie	2014-09-30	1	-1/+2
	llvm-svn: 218719
*	DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more ↵	David Blaikie	2014-09-30	2	-7/+5
	common spot. No functional change. Pre-emptive refactoring before I start pushing some of this subprogram creation down into DWARFCompileUnit so I can build different subprograms in the skeleton unit from the dwo unit for adding -gmlt-like data to the skeleton. llvm-svn: 218713
*	Disable the -gmlt optimization implemented in r218129 under Darwin due to ↵	David Blaikie	2014-09-30	2	-3/+3
	issues with dsymutil. r218129 omits DW_TAG_subprograms which have no inlined subroutines when emitting -gmlt data. This makes -gmlt very low cost for -O0 builds. Darwin's dsymutil reasonably considers a CU empty if it has no subprograms (which occurs with the above optimization in -O0 programs without any force_inline function calls) and drops the line table, CU, and everything in this situation, making backtraces impossible. Until dsymutil is modified to account for this, disable this optimization on Darwin to preserve the desired functionality. (see r218545, which should be reverted after this patch, for other discussion/details) Footnote: In the long term, it doesn't look like this scheme (of simplified debug info to describe inlining to enable backtracing) is tenable, it is far too size inefficient for optimized code (the DW_TAG_inlined_subprograms, even once compressed, are nearly twice as large as the line table itself (also compressed)) and we'll be considering things like Cary's two level line table proposal to encode all this information directly in the line table. llvm-svn: 218702
*	Use the target-specified iteration count to opt out of any further ↵	Sanjay Patel	2014-09-30	1	-60/+62
	refinement of an estimate. NFC. llvm-svn: 218700
*	Split the estimate() interface into separate functions for each type. NFC.	Sanjay Patel	2014-09-30	1	-2/+2
	It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. E.g., x86 will prefer a different NR estimate implementation. ARM will want to use its 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698
*	[DAG] Check in advance if a build_vector has a legal type before attempting ↵	Andrea Di Biagio	2014-09-30	1	-4/+4
	to convert it into a shuffle. Currently, the DAG Combiner only tries to convert type-legal build_vector nodes into shuffles. This patch simply moves the logic that checks if a build_vector has a legal value type up before we even start analyzing the operands. This allows an immediate early exit from method 'visitBUILD_VECTOR' if the node type is known to be illegal. No functional change intended. llvm-svn: 218677
*	Add MachineOperand::ChangeToFPImmediate and setFPImm	Matt Arsenault	2014-09-28	1	-7/+25
	llvm-svn: 218579
*	[AArch64] Redundant store instructions should be removed as dead code	James Molloy	2014-09-27	1	-0/+11
	If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! llvm-svn: 218569
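The idea in the message above can be illustrated on a toy instruction list. This is a minimal sketch and not the DAG-level implementation in the patch: instructions are modelled as hypothetical `(opcode, value_reg, base_reg, offset)` tuples, only immediately adjacent duplicates are considered, and volatile or atomic stores are assumed to be absent.

```python
def remove_redundant_stores(instrs):
    """Drop a store that immediately repeats the previous store.

    Two adjacent stores of the same register to the same base and offset
    write the same bytes twice, so the second one has no effect.
    """
    out = []
    for ins in instrs:
        is_store = ins[0] == "stur"
        if is_store and out and out[-1] == ins:
            continue  # identical to the store just emitted: redundant
        out.append(ins)
    return out
```

Applied to the two `stur w10, [x11, #-4]` instructions from the example, only one store remains.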
*	Refactor reciprocal and reciprocal square root estimate into ↵	Sanjay Patel	2014-09-26	1	-28/+142
	target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553
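The "estimate plus refinement steps" scheme mentioned above is commonly realised with Newton-Raphson iteration. The sketch below shows the arithmetic only. `crude_estimate` is a hypothetical stand-in for a hardware estimate instruction (it returns the exact value with a deliberate relative error of 2^-8), and the function names are assumptions rather than anything from the patch.

```python
import math


def crude_estimate(exact, rel_error=2.0 ** -8):
    """Stand-in for a hardware estimate: the exact value, slightly off."""
    return exact * (1.0 - rel_error)


def refine_recip(x, est, steps):
    """Newton-Raphson refinement of est ~= 1/x."""
    for _ in range(steps):
        est = est * (2.0 - x * est)
    return est


def refine_rsqrt(x, est, steps):
    """Newton-Raphson refinement of est ~= 1/sqrt(x)."""
    for _ in range(steps):
        est = est * (1.5 - 0.5 * x * est * est)
    return est


def div_by_sqrt(y, x, steps=2):
    # z = y / sqrt(x) rewritten as z = y * rsqrt_estimate(x)
    return y * refine_rsqrt(x, crude_estimate(1.0 / math.sqrt(x)), steps)


def div(y, x, steps=2):
    # z = y / x rewritten as z = y * recip_estimate(x)
    return y * refine_recip(x, crude_estimate(1.0 / x), steps)
```

Each step roughly squares the relative error, which is why a target only needs to report a small number of refinement steps for a given estimate precision.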
*	Revert patch of r218493	David Xu	2014-09-26	1	-14/+0
	llvm-svn: 218494
*	Redundant store instructions should be removed as dead code	David Xu	2014-09-26	1	-0/+14
	llvm-svn: 218493
*	Move resetTargetOptions from taking a MachineFunction to a Function	Eric Christopher	2014-09-26	1	-1/+1
	since we are accessing the TargetMachine that we're a member function of. llvm-svn: 218489
*	[MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo	Bruno Cardoso Lopes	2014-09-25	1	-6/+23
	Machine Sink uses loop depth information to select between successor BBs to sink machine instructions into, where BBs within smaller loop depths are preferable. This patch adds support for choosing between successors by using profile information from BlockFrequencyInfo instead, whenever the information is available. Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5% execution speedup on average on x86-64 darwin. <rdar://problem/18021659> llvm-svn: 218472
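A rough sketch of the selection rule described above follows. It assumes that, when profile data exists, the coldest successor (lowest block frequency) is the preferred sink target. That preference, the function name, and the dictionary inputs are assumptions made for illustration, not details stated in the message.

```python
def pick_sink_successor(successors, loop_depth, block_freq=None):
    """Choose the successor block to sink an instruction into.

    With profile data, prefer the least frequently executed block (assumed);
    without it, fall back to the block with the smallest loop depth.
    """
    if block_freq:
        return min(successors, key=lambda bb: block_freq[bb])
    return min(successors, key=lambda bb: loop_depth[bb])
```

The two criteria can disagree: a block at a shallow loop depth may still be hot, which is the case where profile information changes the decision.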
*	SelectionDAG: Remove #if NDEBUG from check for a post-isel hook	Tom Stellard	2014-09-25	1	-2/+0
	The InstrEmitter will skip the check of MI.hasPostISelHook() before calling AdjustInstrPostInstrSelection() when NDEBUG is not defined. This was added in r140228, and I'm not sure if it is intentional or not, but it is a likely source for bugs, because it means with Release+Asserts builds you can forget to set the hasPostISelHook flag on TableGen definitions and AdjustInstrPostInstrSelection() will still be called. llvm-svn: 218458
*	Lower idempotent RMWs to fence+load	Robin Morisset	2014-09-25	1	-2/+43
	Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level). This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them. Details of the optimization are discussed in D5091. Test Plan: make check-all + a new test Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5422 llvm-svn: 218455
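For readers unfamiliar with the term, an idempotent read-modify-write is one whose operand leaves the stored value unchanged, so only its load and ordering effects matter. The sketch below lists operand patterns that are idempotent by plain arithmetic. The message itself does not enumerate them, so the exact set, the function names, and the pseudo-op strings are assumptions for illustration.

```python
def is_idempotent_rmw(op, operand, bits=32):
    """True if `old <op> operand == old` for every `bits`-wide value `old`."""
    all_ones = (1 << bits) - 1
    if op in ("add", "sub", "or", "xor"):
        return operand & all_ones == 0
    if op == "and":
        return operand & all_ones == all_ones
    return False


def lower_rmw(op, operand, bits=32):
    """Return a toy lowering: fence+load when idempotent, else the original RMW."""
    if is_idempotent_rmw(op, operand, bits):
        return ["fence", "load"]
    return [f"atomicrmw {op} {operand}"]
```

Whether replacing the RMW with a fence and a load preserves the memory model is target-specific, which is why the message gates the transformation behind a target hook.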
*	Clear PreferredExtendType in each function-specific state ↵	Jiangning Liu	2014-09-24	1	-0/+1
	FunctionLoweringInfo. llvm-svn: 218364
*	[X86] Make wide loads be managed by AtomicExpand	Robin Morisset	2014-09-23	1	-0/+29
	Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend. Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern). Test Plan: make check-all (lots of tests for this functionality already exist) Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5404 llvm-svn: 218332
*	Add AtomicExpandPass::bracketInstWithFences, and use it whenever ↵	Robin Morisset	2014-09-23	1	-38/+70
	getInsertFencesForAtomic would trigger in SelectionDAGBuilder Summary: The goal is to eventually remove all the code related to getInsertFencesForAtomic in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works mostly by accident because the backends are overly conservative), and repeats the same logic that goes in emitLeading/TrailingFence. In this patch, I make AtomicExpandPass insert the fences as it knows better where to put them. Because this requires getting the fences and not just passing an IRBuilder around, I had to change the return type of emitLeading/TrailingFence. This code only triggers on ARM for now. Because it is earlier in the pipeline than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so SelectionDAGBuilder does not add barriers anymore on ARM. If this patch is accepted I plan to implement emitLeading/TrailingFence for all backends that setInsertFencesForAtomic(true), which will allow both making them less conservative and simplifying SelectionDAGBuilder once they are all using this interface. This should not cause any functional change so the existing tests are used and not modified. Test Plan: make check-all, benefits from existing tests of atomics on ARM Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5179 llvm-svn: 218329
*	[MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is	Lang Hames	2014-09-23	2	-15/+0
	gone they're no longer needed. llvm-svn: 218320
*	Use SDValue bool operator to reduce code. No functional change.	Sanjay Patel	2014-09-23	1	-9/+6
	llvm-svn: 218314
*	MC: ReadOnlyWithRel section kinds should map to rdata in COFF	David Majnemer	2014-09-22	1	-4/+4
	Don't consider ReadOnlyWithRel as a writable section in COFF, they really belong in .rdata. llvm-svn: 218268
*	Refactor reciprocal square root estimate into target-independent function; NFC.	Sanjay Patel	2014-09-21	1	-17/+62
	This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219
*	mop up: "Don’t duplicate function or class name at the beginning of the ↵	Sanjay Patel	2014-09-21	3	-59/+45
	comment." llvm-svn: 218218
*	mop up: "Don’t duplicate function or class name at the beginning of the ↵	Sanjay Patel	2014-09-20	1	-25/+21
	comment." llvm-svn: 218194
*	MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF	David Majnemer	2014-09-20	1	-2/+2
	A problem with our old behavior becomes observable under x86-64 COFF when we need a read-only GV which has an initializer which is referenced using a relocation: we would mark the section as writable. Marking the section as writable interferes with section merging. This fixes PR21009. llvm-svn: 218179
*	Fix crash with an insertvalue that produces an empty object.	Peter Collingbourne	2014-09-20	1	-0/+6
	llvm-svn: 218171
*	Converting SpillPlacement's BlockFrequency threshold to a ManagedStatic to ↵	Chris Bieneman	2014-09-19	1	-3/+4
	avoid static constructors and destructors. llvm-svn: 218163
*	Omit DW_TAG_subprograms for subprograms without inlined subroutines when ↵	David Blaikie	2014-09-19	2	-24/+37
	producing -gmlt data To reduce the size of -gmlt data, skip the subprograms without any inlined subroutines. Since we've now got the ability to make these determinations in the backend (funnily enough - we added the flag so we wouldn't produce ranges under -gmlt, but with this change we use the flag, but go back to producing ranges under -gmlt). Instead, just produce CU ranges to inform the consumer which parts of the code are described by this CU's line table. Tools could inspect the line table directly to compute the range, but the CU ranges only seem to be about 0.5% of object/executable size, so I'm not too worried about teaching llvm-symbolizer that trick just yet - it's certainly a possible piece of future work. Update an llvm-symbolizer test just to demonstrate that this schema is acceptable there (if it wasn't, the compiler-rt tests would catch this, but good to have an in-llvm-tree test for llvm-symbolizer's behavior here) Building the clang binary with -gmlt with this patch reduces the total size of object files by 5.1% (5.56% without ranges) without compression and the executable by 4.37% (4.75% without ranges). llvm-svn: 218129
*	Change DwarfCompileUnit::createGlobalVariable to getOrCreateGlobalVariable.	Frederic Riss	2014-09-19	3	-12/+13
	Summary: This will allow requesting the creation of a forward declared variable at its point of use (for imported declarations, this will be DwarfDebug::constructImportedEntityDIE) rather than having to put the forward decl in a retention list. Note that getOrCreateGlobalVariable returns the actual definition DIE when the routine creates a declaration and a definition DIE. If you agree this is the right behavior, then I'll have a followup patch that registers the definition in the DIE map instead of the declaration as it is today (this 'breaks' only one test, where we test that the imported entity is the declaration). I'm not sure what's best here, but it's easy enough for a consumer to follow the DW_AT_specification link to get to the declaration, whereas it takes more work to find the actual definition from a declaration DIE. Reviewers: echristo, dblaikie, aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5381 llvm-svn: 218126
*	Optionally enable more-aggressive FMA formation in DAGCombine	Hal Finkel	2014-09-19	1	-5/+10
	The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120
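The decision described above reduces to a small predicate. The sketch below is an illustrative model only: `should_form_fma` and its parameters are hypothetical names, with `aggressive_fusion` standing in for the result of the `enableAggressiveFMAFusion` callback named in the message.

```python
def should_form_fma(fmul_num_uses, aggressive_fusion=False):
    """Decide whether to fuse an FMUL feeding an FADD into an FMA.

    Default: fuse only when the FMUL has a single use, so the multiply
    disappears. With aggressive fusion the use-count check is skipped and
    the FMUL may remain for its other users.
    """
    return aggressive_fusion or fmul_num_uses == 1
```

With the default setting a multiply that feeds two adds is left alone; with aggressive fusion enabled both adds may become FMAs while the original multiply stays for any remaining users.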
*	Optimize sext/zext insertion algorithm in back-end.	Jiangning Liu	2014-09-19	3	-8/+61
	With this optimization, we will not always insert zext for values crossing basic blocks, but insert sext if the users of a value crossing basic blocks prefer the sign predicate. llvm-svn: 218101
*	Omit DW_AT_frame_base under -gmlt for size	David Blaikie	2014-09-19	1	-3/+7
	llvm-svn: 218100
*	Describe the -gmlt optimization committed in the previous revision.	David Blaikie	2014-09-19	1	-0/+1
	llvm-svn: 218099
*	Omit all the extra static attributes on subprograms in -gmlt	David Blaikie	2014-09-19	1	-0/+3
	This omission will be done in a fancier manner once we're dealing with "put gmlt in the skeleton CUs under fission" - it'll have to be conditional on the kind of CU we're emitting into (skeleton or gmlt). llvm-svn: 218098
*	Fix an it's vs. its typo.	Hans Wennborg	2014-09-19	1	-1/+1
	llvm-svn: 218093
*	Revert part of r218041.	Frederic Riss	2014-09-18	1	-0/+3
	The patch moved some logic around in an attempt to generate potentially more DW_AT_declaration attributes. The patch was flawed though and it stopped generating the attribute in some cases. llvm-svn: 218060
*	Always emit DW_AT_declaration attribute when the variable isn't a definition.	Frederic Riss	2014-09-18	1	-3/+3
	Summary: This doesn't show up today as we don't emit declaration-only variables. This will be tested when the followup patches implementing import of forward declared entities land in clang. Reviewers: echristo, dblaikie, aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5382 llvm-svn: 218041
*	Add a new pass FunctionTargetTransformInfo. This pass serves as a	Eric Christopher	2014-09-18	1	-3/+3
	shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004
*	[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass	Robin Morisset	2014-09-17	1	-52/+133
	This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass. llvm-svn: 217928
*	[CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting	Quentin Colombet	2014-09-16	1	-45/+55
	instructions when truncate, sext, or zext were created. Fix that. llvm-svn: 217926
*	Add back a fallback case for targets that do not or cannot implement ↵	Owen Anderson	2014-09-16	1	-1/+5
	getNoopForMachoTarget(). llvm-svn: 217899
*	Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types	Hal Finkel	2014-09-16	1	-1/+2
	The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859
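The before/after behaviour described above can be modelled with a small cost function. This is a sketch under stated assumptions: the names are hypothetical, and the scalarization cost model (one scalar operation plus one unit of insert/extract overhead per element) is invented for illustration rather than taken from the patch.

```python
def scalarized_cost(num_elts, scalar_cost=1, overhead_per_elt=1):
    # Assumed model: one scalar op per element plus per-element overhead.
    return num_elts * scalar_cost + num_elts * overhead_per_elt


def cmp_sel_cost(num_elts, legal_parts, legal_is_vector, op_is_legal, fixed=True):
    """Toy cost of a compare/select on a vector with `num_elts` elements.

    `legal_parts` and `legal_is_vector` model the legalization result,
    e.g. (4, f64) for <4 x double> gives legal_parts=4, legal_is_vector=False.
    """
    scalarized = num_elts > 1 and not legal_is_vector
    if op_is_legal and not (fixed and scalarized):
        return legal_parts  # early return: operation is legal on the legalized type
    return scalarized_cost(num_elts)
```

For the `<4 x double>` case in the message, `fixed=False` reproduces the old under-estimate (the early return fires because the operation looks Legal on `f64`), while `fixed=True` skips the early return and includes the scalarization overhead.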
*	DebugInfo: Add comment describing the need to disable address pool usage in ↵	David Blaikie	2014-09-15	1	-0/+5
	skeleton units. Post commit review from Eric Christopher. llvm-svn: 217842
*	Replace repeated null checks with an assert. NFC.	Sanjay Patel	2014-09-15	1	-18/+14
	Without a vector to hold the created ops, these functions don't have any use. llvm-svn: 217831
*	[FastISel] Move optimizeCmpPredicate to FastISel base class. NFC.	Juergen Ributzka	2014-09-15	1	-0/+40
	Make the optimizeCmpPredicate function available to all targets. llvm-svn: 217822
*	Replace dead links to "Hacker's Delight" with general references. NFC.	Sanjay Patel	2014-09-15	3	-10/+10
	llvm-svn: 217814
*	Fix a lot of confusion around inserting nops on empty functions.	Rafael Espindola	2014-09-15	2	-15/+7
	On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. llvm-svn: 217801