Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that.
Test Plan: Tests updated.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6876
llvm-svn: 225450
|
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment) for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.
Producing an aligned stack pointer is done by zeroing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction can only encode
a mask that zeroes out at most the 8 lower bits. When a larger
alignment is requested, a BIC instruction cannot be used; LLVM was
silently producing incorrect code in this case.
This commit fixes code generation for large stack alignments by
using the BFC instruction instead, when the BFC instruction is
available. When it is not, it uses two instructions: a right shift,
followed by a left shift to zero out the lower bits.
The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.
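To make this concrete, here is a hedged sketch of the realignment
sequences involved (register choice and alignment values are
illustrative, not the exact code this commit emits):
bic sp, sp, #31    @ 32-byte alignment: the mask fits BIC's immediate
bfc sp, #0, #12    @ 4096-byte alignment, BFC available: clear bits 0-11
lsr r4, sp, #12    @ 4096-byte alignment, no BFC: shift right...
lsl sp, r4, #12    @ ...then left, zeroing the low 12 bits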
llvm-svn: 225446
|
Its functionality has been replaced by calling
SIInstrInfo::legalizeOperands() from
SIISelLowering::AdjustInstrPostInstrSelection() and running the
SIFoldOperands and SIShrinkInstructions passes.
llvm-svn: 225445
|
llvm-svn: 225441
|
llvm-svn: 225440
|
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.
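As a hedged illustration of the problematic shape (the names are
invented, not from the actual test case), the callee below is a
thread_local global, so the call must go through the TLS address
computation rather than being emitted as a direct call:
@tls_callee = thread_local global i8 0
define void @caller() {
  call void bitcast (i8* @tls_callee to void ()*)()
  ret void
}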
Differential Revision: http://reviews.llvm.org/D6862
llvm-svn: 225438
|
llvm-svn: 225434
|
LEA variants in Intel syntax. The memory operand is inherently unsized.
llvm-svn: 225432
|
This reverts commit r225379 while investigating an assertion failure reported
by Alexey.
llvm-svn: 225424
|
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.
** Context **
Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.
This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B
Whereas we could simply have done:
def B
...
def A
use A
...
use B
** Proposed Solution **
A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evicts another variable. Indeed, if no eviction
happens, it is pointless to look for recoloring opportunities, as it means the
situation was the same as in the initial allocation problem, where we had to
break the hint.
Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple, unlike last chance recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it; otherwise it gives up on that hint,
even if a more complex coloring would have worked.
The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) the target color. If a) is greater than or
equal to b), then it is profitable and the recoloring happens.
** Example **
Consider the following example:
BB1:
a =
b =
BB2:
...
= b
= a
Let us assume b gets split:
BB1:
a =
b =
BB2:
c = b
...
d = c
= d
= a
Because of how the allocation works, b, c, and d may be assigned different
colors. Now, suppose a gets evicted to make room for c, and that b and d were
assigned something different from a.
We end up with:
BB1:
a =
st a, SpillSlot
b =
BB2:
c = b
...
d = c
= d
e = ld SpillSlot
= e
It is likely that we can assign the same register to b, c, and d,
getting rid of 2 copies.
** Performance **
Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions that
come from the (in)accuracy of the block frequency estimate.
<rdar://problem/18312047>
llvm-svn: 225422
|
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a
subregister index in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.
Fixed the bug, improved comments and simplified code accordingly.
Testcase by Tom Stellard!
llvm-svn: 225415
|
Patch by: Ramkumar Ramachandra <artagnon@gmail.com>
"This patch started out as an exploration of gc.relocate, and an attempt
to write a simple test in call-lowering. I then noticed that the
arguments of gc.relocate were not checked fully, so I went in and fixed
a few things. Finally, the most important outcome of this patch is that
my new error handling code caught a bug in a callsite in
stackmap-format."
Differential Revision: http://reviews.llvm.org/D6824
llvm-svn: 225412
|
llvm-svn: 225410
|
Folding the same immediate into multiple instructions will increase
program size, which can hurt performance.
llvm-svn: 225405
|
`MDNode::replaceOperandWith()` changes all instances of metadata. Stop
using it when linking module flags, since (due to uniquing) the flag
values could be used by other metadata.
Instead, use new API `NamedMDNode::setOperand()` to update the reference
directly.
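A minimal sketch of the intended usage (DstM, I, and NewFlagNode are
illustrative names; the surrounding linker logic is elided):
// Update only this reference; replaceOperandWith() on the flag node
// would rewrite every user of the uniqued metadata value.
NamedMDNode *ModFlags = DstM->getNamedMetadata("llvm.module.flags");
ModFlags->setOperand(I, NewFlagNode);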
llvm-svn: 225397
|
llvm-svn: 225393
|
llvm-svn: 225385
|
used with
options other than just -disassemble so that universal files can be used with other
options combined with -arch options.
No functional change to existing options and use. One test case added for the
additional functionality with a universal file and a -arch option.
llvm-svn: 225383
|
llvm-svn: 225380
|
The two buildbot failures were addressed in LLVM r225378 and CFE r225359.
This reapplies commit r225272 without modifications.
llvm-svn: 225379
|
llvm-svn: 225378
|
llvm-svn: 225374
|
This is used to simplify the SIFoldOperands pass and make it easier to
fold immediates.
llvm-svn: 225373
|
llvm-svn: 225372
|
llvm-svn: 225371
|
This allows folding of sequences like:
s[0:1] = s_mov_b64 4
v_add_i32 v0, s0, v0
v_addc_u32 v1, s1, v1
into
v_add_i32 v0, 4, v0
v_add_i32 v1, 0, v1
llvm-svn: 225369
|
This change includes the most basic possible GCStrategy for a GC which is
using the statepoint lowering code. At the moment, this GCStrategy doesn't
really do much - aside from actually generating correct stackmaps, that is -
but I went ahead and added a few extra correctness checks as proof of
concept. It's mostly here to provide documentation on how to write one, and
to provide a point for various optimization legality hooks I'd like to add
going forward. (For context, see the TODOs in InstCombine around
gc.relocate.)
Most of the validation logic added here as proof of concept will soon move
into the Verifier. That move is dependent on http://reviews.llvm.org/D6811
There was discussion in the review thread about addrspace(1) being reserved
for something. I'm going to follow up on a separate llvmdev thread. If
needed, I'll update all the code at once.
Note that I am deliberately not making a GCStrategy required to use
gc.statepoints with this change. I want to give folks out of tree -
including myself - a chance to migrate. In a week or two, I'll make having a
GCStrategy be required for gc.statepoints. To this end, I added the gc tag
to one of the test cases but not others.
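For reference, a hedged sketch of opting a function into this strategy via
the existing gc attribute (assuming the strategy name registered by this
change is "statepoint-example"; the body is elided):
define i8 addrspace(1)* @example(i8 addrspace(1)* %obj) gc "statepoint-example" {
  ...
}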
Differential Revision: http://reviews.llvm.org/D6808
llvm-svn: 225365
|
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed. However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.
It is desirable to have this be configurable so that a function might
opt-out of stack probes. Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.
Patch by Andrew H!
N.B. The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.
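For reference, a hedged sketch of what opting a function out of the 4096
default looks like at the IR level (the value is in bytes; the body is
elided):
define void @f() "stack-probe-size"="8192" {
  ...
}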
llvm-svn: 225360
|
For code like:
float foo(float x) { return copysign(1.0, x); }
We used to generate:
andps <-0.000000e+00,0,0,0>, %xmm0
movss <1.000000e+00>, %xmm1
andps <nan>, %xmm1
orps %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.
We now generate:
andps <-0.000000e+00,0,0,0>, %xmm0
orps <1.000000e+00,0,0,0>, %xmm0
Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555
llvm-svn: 225357
|
llvm-svn: 225348
|
requiring and invalidating specific analyses. Also make their printed
names match their class names. Writing these out as prose really doesn't
make sense to me any more.
llvm-svn: 225346
|
Even though gcc produces similar instructions, as Owen pointed out, the two
patterns aren't equivalent in the case where the original subtraction could
have caused an overflow.
Reverting the same.
llvm-svn: 225341
|
passes too many times.
I think this is actually the issue that someone raised with me at the
developer's meeting and in an email, but that we never really got to the
bottom of. Having all the testing utilities made it much easier to dig
down and uncover the core issue.
When a pass manager is running many passes over a single function, we
need it to invalidate the analyses between each run so that they can be
re-computed as needed. We also need to track the intersection of
preserved higher-level analyses across all the passes that we run (for
example, if there is one module analysis which all the function analyses
preserve, we want to track that and propagate it). Unfortunately, this
interacted poorly with any enclosing pass adaptor between two IR units.
It would see the intersection of preserved analyses, and need to
invalidate any other analyses, but some of the un-preserved analyses
might have already been invalidated *and recomputed*! We would fail to
propagate the fact that the analysis had already been invalidated.
The solution to this struck me as really strange at first, but the more
I thought about it, the more natural it seemed. After a nice discussion
with Duncan about it on IRC, it seemed even nicer. The idea is that
invalidating an analysis *causes* it to be preserved! Preserving the
lack of result is trivial. If it is recomputed, great. Until something
*else* invalidates it again, we're good.
The consequence of this is that the invalidate methods on the analysis
manager which operate over many passes now consume their
PreservedAnalyses object, update it to "preserve" every analysis pass to
which it delivers an invalidation (regardless of whether the pass
chooses to be removed, or handles the invalidation itself by updating
itself). Then we return this augmented set from the invalidate routine,
letting the pass manager take the result and use the intersection of
*that* across each pass run to compute the final preserved set. This
accounts for all the places where the early invalidation of an analysis
has already "preserved" it for a future run.
I've beefed up the testing and adjusted the assertions to show that we
no longer repeatedly invalidate or compute the analyses across nested
pass managers.
llvm-svn: 225333
|
llvm-svn: 225310
|
llvm-svn: 225307
|
llvm-svn: 225306
|
llvm-svn: 225305
|
This is a leftover from renaming the intrinsic.
It's surprising the unknown llvm. intrinsic wasn't rejected.
llvm-svn: 225304
|
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.
Just dropping the $name causes problems for
@foo = global i32 0, comdat
$bar = comdat ...
and
declare void @foo() comdat
$bar = comdat ...
So the syntax is changed to
@g1 = global i32 0, comdat($c1)
@g2 = global i32 0, comdat
and
declare void @foo() comdat($c1)
declare void @foo() comdat
llvm-svn: 225302
|
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load; in that
case we can just use the original memory location instead of copying it to
another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.
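A hedged example of the kind of input that benefits (the IR names are
illustrative): the FP load can now read straight from %p instead of going
through a GPR load plus a store to a temporary stack slot:
define float @int_load_to_fp(i32* %p) {
  %v = load i32* %p
  %r = sitofp i32 %v to float
  ret float %r
}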
Also, remove a related entry from the README.txt for which we now generate
reasonable code.
llvm-svn: 225301
|
llvm-svn: 225291
|
This ensures that all memory operations are complete when all threads
reach the barrier.
llvm-svn: 225290
|
because of a tsan buildbot failure.
This reverts commit 225272.
Fix should be coming soon.
llvm-svn: 225288
|
dcfetch.
llvm-svn: 225283
|
of operations that provably don't overflow. For example, we can prove
%civ.inc below does not sign-overflow. With this change,
IndVarSimplify changes %civ.inc to an add nsw.
define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
entry:
  %length = load i32* %length_ptr, !range !0
  %len.sub.1 = sub i32 %length, 1
  %upper = icmp slt i32 %init, %len.sub.1
  br i1 %upper, label %loop, label %exit
loop:
  %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
  %civ.inc = add i32 %civ, 1
  %cmp = icmp slt i32 %civ.inc, %length
  br i1 %cmp, label %latch, label %break
latch:
  store i32 0, i32* %array
  %check = icmp slt i32 %civ.inc, %len.sub.1
  br i1 %check, label %loop, label %break
break:
  ret i32 %civ.inc
exit:
  ret i32 42
}
Differential Revision: http://reviews.llvm.org/D6748
llvm-svn: 225282
|
llvm-svn: 225279
|
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommended that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
llvm-svn: 225277
|
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.
The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow graph modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.
With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
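A hedged sketch of a pattern that can now be speculated (based on the
description above; the actual test IR may differ): the trunc follows the
cttz call in the same block, and both can be hoisted:
entry:
  %cmp = icmp eq i64 %x, 0
  br i1 %cmp, label %end, label %if.true
if.true:
  %c = call i64 @llvm.cttz.i64(i64 %x, i1 true)
  %t = trunc i64 %c to i32
  br label %end
end:
  %res = phi i32 [ %t, %if.true ], [ 64, %entry ]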
Added new test cases to CodeGen/X86/cttz-ctlz.ll
Differential Revision: http://reviews.llvm.org/D6853
llvm-svn: 225274
|
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.
Thanks to Chandler for reviewing!
llvm-svn: 225272
|
llvm-svn: 225271
|