bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	R600/SI: Fix live range error hidden by SIFoldOperands	Matt Arsenault	2014-12-03	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	m0 is treated as a virtual register class with a single register rather than the physical register it really is. This was updating the live range of the used virtual copy of m0 from the first ds_read instruction, and leaving the unused copy unchanged. This resulted in a "Live segment doesn't end at a valid instruction" verifier error because of the erased instructions. Update the live range of the second copy (which should be dead). No test since I'm not sure how to trigger this with SIFoldOperands enabled. llvm-svn: 223203
*	StructurizeCFG: Use LoopInfo analysis for better loop detection	Tom Stellard	2014-12-03	1	-1/+6
\| \| \| \| \| \| \| \|	We were assuming that each back-edge in a region represented a unique loop, which is not always the case. We need to use LoopInfo to correctly determine which back-edges are loops. llvm-svn: 223199
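A minimal sketch of why counting back-edges overcounts loops. It is not the StructurizeCFG or LoopInfo code: the CFG, the block names, and both helper functions are hypothetical, and back-edges are found with a plain depth-first search rather than the dominator tree that LoopInfo relies on.

```python
def find_back_edges(cfg, entry):
    """Return edges (u, v) where v is still on the DFS stack when u is visited."""
    back_edges = []
    state = {}  # block -> "active" (on the DFS stack) or "done"

    def visit(u):
        state[u] = "active"
        for v in cfg.get(u, []):
            if state.get(v) == "active":
                back_edges.append((u, v))
            elif v not in state:
                visit(v)
        state[u] = "done"

    visit(entry)
    return back_edges


def loops_by_header(back_edges):
    """Group back-edges by target: edges sharing a header belong to one loop."""
    loops = {}
    for latch, header in back_edges:
        loops.setdefault(header, []).append(latch)
    return loops


# Hypothetical CFG: one loop whose header is reached from two latches.
cfg = {
    "entry": ["header"],
    "header": ["a", "b", "exit"],
    "a": ["header"],
    "b": ["header"],
    "exit": [],
}
back_edges = find_back_edges(cfg, "entry")
loops = loops_by_header(back_edges)
print(len(back_edges), "back-edges,", len(loops), "loop")
```

Here the two back-edges share a header, so treating each one as its own loop would report two loops where there is only one.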
*	NVPTX: Delete dead code	Duncan P. N. Exon Smith	2014-12-03	1	-5/+0
\| \| \| \| \| \|	`MDNode` does not inherit from `User`, and it never has a name. llvm-svn: 223198
*	R600/SI: Enable inline assembly	Tom Stellard	2014-12-03	1	-2/+1
\| \| \| \| \| \| \| \|	We just needed to remove the assertion in AMDGPURegisterInfo::getFrameRegister(), which is called when initializing the parser for inline assembly. llvm-svn: 223197
*	R600/SI: Change mubuf offsets to print as decimal	Matt Arsenault	2014-12-03	1	-1/+1
\| \| \| \| \| \|	This matches SC's behavior. llvm-svn: 223194
*	Emit the entry block first and the exit block second, then all the blocks in ↵	Nick Lewycky	2014-12-03	1	-3/+7
\| \| \| \| \| \|	between afterwards. This is what gcc always does, and some out of tree tools depend on that. llvm-svn: 223193
*	Prologue support	Peter Collingbourne	2014-12-03	14	-19/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are three primary use cases that these attributes aim to serve: 1. Function prologue sigils 2. Function hot-patching: Enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility 3. Runtime metadata: Allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality. Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach as it required that prefix data was valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data. References ---------- This idea arose out of discussions[1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3). [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189
*	[X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80.	Ahmed Bougacha	2014-12-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The X86AsmParser intel handling was refactored in r216481, making it try each different memory operand size to see which one matches. Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which led to an "invalid operand" error for code such as: movdqa [rax], xmm0 llvm-svn: 223187
*	[MCJIT] Unique-ptrify the RTDyldMemoryManager member of MCJIT. NFC.	Lang Hames	2014-12-03	4	-12/+31
\| \| \| \|	llvm-svn: 223183
*	[PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets	Hal Finkel	2014-12-03	1	-5/+3
\| \| \| \| \| \| \|	We need to use the custom expansion of readcyclecounter on all 32-bit targets (even those with 64-bit registers). This should fix the ppc64 buildbot. llvm-svn: 223182
*	AArch64: strengthen Darwin ABI alignment assumptions	Tim Northover	2014-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	A global variable without an explicit alignment specified should be assumed to be ABI-aligned according to its type, like on other platforms. This allows us to use better memory operations when accessing it. rdar://18533701 llvm-svn: 223180
*	AArch64: don't be too greedy when folding :lo12: accesses into mem ops.	Tim Northover	2014-12-02	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This frequently leads to cases like: ldr xD, [xN, :lo12:var] add xA, xN, :lo12:var ldr xD, [xA, #8] where the ADD would have been needed anyway, and the two distinct addressing modes can prevent the formation of an ldp. Because of how we handle ADRP (aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also results in duplicated ADRP instructions (one on its own to cover the ldr, and one combined with the add). llvm-svn: 223172
*	PR21302. Vectorize only bottom-tested loops.	Michael Zolotukhin	2014-12-02	1	-0/+9
\| \| \| \| \| \|	rdar://problem/18886083 llvm-svn: 223171
*	[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets	Simon Pilgrim	2014-12-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead. The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch. Differential Revision: http://reviews.llvm.org/D6458 llvm-svn: 223165
*	[PowerPC] Implement readcyclecounter for PPC32	Hal Finkel	2014-12-02	5	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161
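The read sequence described above (three SPR reads, a comparison, and a branch) can be sketched as a small simulation. This is an illustration only, not the PPC backend code: `FakeTimeBase` is a hypothetical stand-in for the two 32-bit time-base SPRs, and it advances the counter on every read so that a carry into the upper half can happen mid-sequence.

```python
class FakeTimeBase:
    """Hypothetical 64-bit counter readable only 32 bits at a time."""

    def __init__(self, start, step=1):
        self.value = start
        self.step = step

    def _tick(self):
        # Advance the counter after each read to model time passing.
        self.value = (self.value + self.step) & 0xFFFFFFFFFFFFFFFF

    def read_upper(self):
        hi = self.value >> 32
        self._tick()
        return hi

    def read_lower(self):
        lo = self.value & 0xFFFFFFFF
        self._tick()
        return lo


def read_cycle_counter(tb):
    """Read upper, lower, upper again; retry if the upper half changed."""
    while True:
        hi = tb.read_upper()
        lo = tb.read_lower()
        hi_again = tb.read_upper()
        if hi == hi_again:  # no carry into the upper half during the read
            return (hi << 32) | lo


# Start one tick before the lower half wraps, so the first attempt is torn.
tb = FakeTimeBase(0x00000000FFFFFFFF)
result = read_cycle_counter(tb)
print(hex(result))
```

Without the re-read, the first attempt would combine an upper half of 0 with a lower half of 0 and report a value far below the real counter.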
*	R600/SI: Emit amd_kernel_code_t header for AMDGPU environment	Tom Stellard	2014-12-02	5	-1/+829
\| \| \| \|	llvm-svn: 223160
*	[AArch64][Stackmaps] Optimize stackmap shadows on AArch64.	Lang Hames	2014-12-02	1	-1/+16
\| \| \| \| \| \| \| \| \| \|	Reduce the number of nops emitted for stackmap shadows on AArch64 by counting non-stackmap instructions up to the next branch target towards the requested shadow. <rdar://problem/14959522> llvm-svn: 223156
*	R600/SI: Move more information into SIProgramInfo struct	Tom Stellard	2014-12-02	3	-50/+80
\| \| \| \|	llvm-svn: 223154
*	Restructure some assertion checking based on post commit feedback by Aaron ↵	Philip Reames	2014-12-02	1	-7/+7
\| \| \| \| \| \|	and Tom. llvm-svn: 223150
*	[mips] Fix passing of small structures for big-endian O32.	Daniel Sanders	2014-12-02	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Like N32/N64, they must be passed in the upper bits of the register. The new code could be merged with the existing if-statements but I've refrained from doing this since it will make porting the O32 implementation to tablegen harder later. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6463 llvm-svn: 223148
*	Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM .cpu ↵	Roman Divacky	2014-12-02	1	-0/+11
\| \| \| \| \| \| \| \| \| \|	parsing. Previously the .cpu directive in the ARM assembler didn't switch to the new CPU and therefore acted as a nop. This implements a real action for .cpu and, e.g., allows assembling the FreeBSD kernel with -integrated-as. llvm-svn: 223147
*	R600/SI: Refactor AMDGPUAsmPrinter::EmitProgramInfoSI()	Tom Stellard	2014-12-02	1	-9/+11
\| \| \| \|	llvm-svn: 223144
*	Appease a build bot complaining about an unused variable that's used in an ↵	Philip Reames	2014-12-02	1	-0/+1
\| \| \| \| \| \|	assertion. llvm-svn: 223142
*	[Statepoints 3/4] Statepoint infrastructure for garbage collection: ↵	Philip Reames	2014-12-02	9	-0/+886
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAGBuilder This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principle, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure. Reviewed by: atrick, ributzka llvm-svn: 223137
*	[SwitchLowering] Handle destinations on multiple phi instructions	Bruno Cardoso Lopes	2014-12-02	1	-2/+3
\| \| \| \| \| \| \| \| \|	Follow up from r222926. Also handle multiple destinations from merged cases on multiple and subsequent phi instructions. rdar://problem/19106978 llvm-svn: 223135
*	[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.	Ahmed Bougacha	2014-12-02	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Go through implicit defs of CSMI and MI, and clear the kill flags on their uses in all the instructions between CSMI and MI. We might have made some of the kill flags redundant, consider: subs ... %NZCV<imp-def> <- CSMI csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore subs ... %NZCV<imp-def> <- MI, to be eliminated csinc ... %NZCV<imp-use,kill> Since we eliminated MI, and reused a register imp-def'd by CSMI (here %NZCV), that register, if it was killed before MI, should have that kill flag removed, because its lifetime was extended. Also, add an exhaustive testcase for the motivating example. Reviewed by: Juergen Ributzka <juergen@apple.com> llvm-svn: 223133
*	Remove unnecessary code introduced with 223101.	Philip Reames	2014-12-02	1	-10/+2
\| \| \| \|	llvm-svn: 223132
*	R600/SI: Set correct number of user sgprs for HSA runtime	Tom Stellard	2014-12-02	1	-1/+4
\| \| \| \| \| \|	We don't support scratch buffers yet with HSA. llvm-svn: 223130
*	fix typo in comment	Sanjay Patel	2014-12-02	1	-1/+1
\| \| \| \|	llvm-svn: 223127
*	AArch64: make register block rules apply to vector types too.	Tim Northover	2014-12-02	1	-3/+3
\| \| \| \| \| \| \| \|	The blocking code originated in ARM, which is more aggressive about casting types to a canonical representative before doing anything else, so I missed out most vector HFAs and broke the ABI. This should fix it. llvm-svn: 223126
*	R600/SI: Set the ATC bit on all resource descriptors for the HSA runtime	Tom Stellard	2014-12-02	6	-9/+33
\| \| \| \|	llvm-svn: 223125
*	Triple: Add AMDHSA operating system type	Tom Stellard	2014-12-02	1	-0/+2
\| \| \| \| \| \| \| \|	This operating system type represents the AMD HSA runtime, and will be required by the R600 backend in order to generate correct code for this runtime. llvm-svn: 223124
*	[LICM] Avoid store sinking if no preheader is available	Bruno Cardoso Lopes	2014-12-02	1	-2/+4
\| \| \| \| \| \| \| \| \|	Load instructions are inserted into loop preheaders when sinking stores and later removed if not used by the SSA updater. Avoid sinking if the loop has no preheader and avoid crashes. This fixes one more side effect of not handling indirectbr instructions properly on LoopSimplify. llvm-svn: 223119
*	Remove unused function.	Asiri Rathnayake	2014-12-02	2	-12/+0
\| \| \| \| \| \| \| \| \|	Removing an unused function which is causing one of the build bots to fail. This was introduced in the commit r223113. A proper cleanup of the so_imm tblgen definition (made redundant by the mod_imm definition) needs to happen soon. llvm-svn: 223115
*	Add support for ARM modified-immediate assembly syntax.	Asiri Rathnayake	2014-12-02	6	-35/+345
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Certain ARM instructions accept 32-bit immediate operands encoded as an 8-bit integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly syntax support in LLVM allows the decoded (32-bit) immediate to be specified as a single immediate operand for such instructions: mov r0, #4278190080 The ARMARM defines an extended assembly syntax allowing the encoding to be made more explicit, as in: mov r0, #255, #8 ; (same 32-bit value as above) The behaviour of the two instructions can be different w.r.t flags, which is documented under "Modified immediate constants" in ARMARM. This patch enables support for this extended syntax at the MC layer. llvm-svn: 223113
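A minimal sketch of the encoding described above, in which the 32-bit constant is the 8-bit value rotated right by the even rotation amount. The function names are hypothetical and this is not the MC-layer code; it only checks that `#255, #8` and `#4278190080` denote the same value.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    return ror32(value, 32 - (amount % 32))


def decode_mod_imm(imm8, rot):
    """Expand an (imm8, rotation) pair into the 32-bit constant it denotes."""
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in 0..255")
    if rot % 2 != 0 or not 0 <= rot <= 30:
        raise ValueError("rotation must be an even number in 0..30")
    return ror32(imm8, rot)


def encode_mod_imm(value):
    """Return the first (imm8, rot) pair that encodes `value`, or None."""
    for rot in range(0, 32, 2):
        imm8 = rol32(value, rot)  # undo the right rotation
        if imm8 <= 0xFF:
            return (imm8, rot)
    return None


print(decode_mod_imm(255, 8))      # the value written as "#255, #8"
print(encode_mod_imm(4278190080))  # the single-operand form from the text
print(encode_mod_imm(0x101))       # set bits too far apart for any rotation
```

A value may have more than one valid pair (for example, 1 is both `#1, #0` and `#4, #2`); this sketch returns the pair with the smallest rotation, whereas the extended syntax lets the programmer choose the pair explicitly.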
*	Add ARM relocations to ELFYAML	Will Newton	2014-12-02	1	-0/+3
\| \| \| \| \| \|	Tested with check-all with no regressions. llvm-svn: 223112
*	Emit Tag_ABI_FP_denormal correctly in fast-math mode.	Charlie Turner	2014-12-02	1	-1/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default ARM floating-point mode does not support IEEE 754 mode exactly. Of relevance to this patch is that input denormals are flushed to zero. The way in which they're flushed to zero depends on the architecture, * For VFPv2, it is implementation defined as to whether the sign of zero is preserved. * For VFPv3 and above, the sign of zero is always preserved when a denormal is flushed to zero. When FP support has been disabled, the strategy taken by this patch is to assume the software support will mirror the behaviour of the hardware support for the target if it existed. That is, for architectures which can only have VFPv2, it is assumed the software will flush to positive zero. For later architectures it is assumed the software will flush to zero preserving sign. Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd llvm-svn: 223110
*	Fix variable used only in assertion.	Nick Lewycky	2014-12-02	1	-1/+2
\| \| \| \|	llvm-svn: 223101
*	Fix several bugs in r221220's new program finding code.	Chandler Carruth	2014-12-02	2	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In both the Unix and Windows variants, std::getenv was called and the result passed directly to a function accepting a StringRef. This isn't OK because it might return a null pointer and that causes the StringRef constructor to assert (and generally produces crash-prone code if asserts are disabled). Fix this by independently testing the result as non-null prior to splitting things. This in turn uncovered another bug in the Unix variant where it would infinitely recurse if PATH="", or after this fix if PATH isn't set. There is no need to recurse at all. Slightly re-arrange the code to make it clear that we can just fix up the Paths argument based on the environment if we find anything. I don't know of a particularly useful way to test these routines in LLVM. I'll commit a test to Clang that ensures that its driver correctly handles various settings of PATH. However, I have no idea how to correctly write a Windows test for the PATHEXT change. Any Windows developers who could provide such a test, please have at. =D Many thanks to Nick Lewycky and others for helping debug this. =/ It was quite nasty for us to track down. llvm-svn: 223099
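The shape of that fix, checking the environment lookup before splitting it and filling in the search list without recursing, can be sketched in Python. This is an analogy rather than the LLVM code: `search_paths` and its `env` parameter are hypothetical, and Python's `None` plays the role of the null pointer returned by std::getenv.

```python
import os


def search_paths(explicit_paths, env=None):
    """Return the directories to search, falling back to PATH without recursion."""
    env = os.environ if env is None else env
    if explicit_paths:
        return list(explicit_paths)
    raw = env.get("PATH")
    if raw is None:
        # PATH is unset: nothing to split, so return an empty list.
        return []
    # PATH may be set but empty; drop empty components instead of retrying.
    return [p for p in raw.split(os.pathsep) if p]


print(search_paths([], {}))
print(search_paths([], {"PATH": ""}))
print(search_paths([], {"PATH": os.pathsep.join(["/a", "/b"])}))
```

Both the unset and the empty case end in an empty list, so neither can trigger a crash or a retry loop.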
*	Simplify pointer comparisons involving memory allocation functions	Hal Finkel	2014-12-01	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	System memory allocation functions, which are identified at the IR level by the noalias attribute on the return value, must return a pointer into a memory region disjoint from any other memory accessible to the caller. We can use this property to simplify pointer comparisons between allocated memory and local stack addresses and the addresses of global variables. Neither the stack nor global variables can overlap with the region used by the memory allocator. Fixes PR21556. llvm-svn: 223093
*	Try to fix a bot failure due to a variable used only in an assert.	Philip Reames	2014-12-01	1	-4/+4
\| \| \| \| \| \|	Specifically, bot lld-x86_64-darwin13. Resulting from change 223085. llvm-svn: 223092
*	[Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & ↵	Philip Reames	2014-12-01	9	-3/+187
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	x86-64 Backend This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683. The STATEPOINT pseudo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifying all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.) The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point. Reviewed by: atrick, ributzka llvm-svn: 223085
*	[Statepoints 1/4] Statepoint infrastructure for garbage collection: IR ↵	Philip Reames	2014-12-01	2	-0/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Intrinsics The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations. A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizer's standard semantics ensure that the relocations flow through IR optimizations correctly. This is the first patch in a small series. This patch contains only the IR parts; the documentation and backend support will be following separately. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683. Reviewed by: atrick, ributzka llvm-svn: 223078
*	[NVPTX] Do not emit .weak symbols for NVPTX	Jingyue Wu	2014-12-01	3	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX. Test Plan: weak-linkage.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6455 llvm-svn: 223077
*	Parse 'ghccc' in .ll files as the GHC convention (cc 10)	Reid Kleckner	2014-12-01	4	-1/+6
\| \| \| \| \| \| \|	Previously we just used "cc 10" in the .ll files, but that isn't very human readable. llvm-svn: 223076
*	[AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR".	Ahmed Bougacha	2014-12-01	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	r208210 introduced an optimization that improves the vector select codegen by doing the setcc on vectors directly. This is a problem when the setcc operands are i1s, because the optimization would create vectors of i1, which aren't legal. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6308 llvm-svn: 223075
*	[AArch64] Fix v2i8->i16 bitcast legalization.	Ahmed Bougacha	2014-12-01	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r213378 improved f16 bitcasts, so that they go directly through subregs, instead of through the stack. That code now causes an assertion failure for bitcasts from other 16-bit types (most importantly v2i8). Correct that by doing the custom lowering for i16 bitcasts only when the input is an f16. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6307 llvm-svn: 223074
*	Use a continue to reduce indentation and clang-format. NFC.	Rafael Espindola	2014-12-01	1	-21/+24
\| \| \| \|	llvm-svn: 223067
*	Use a range loop. NFC.	Rafael Espindola	2014-12-01	1	-3/+3
\| \| \| \|	llvm-svn: 223066
*	[MachineVerifier] Accept a MBB with a single landing pad successor.	Ahmed Bougacha	2014-12-01	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MachineVerifier used to check that there was always exactly one unconditional branch to a non-landingpad (normal) successor. If that normal successor to an invoke BB is unreachable, it seems reasonable to only have one successor, the landing pad. On targets other than AArch64 (and on AArch64 with a different testcase), the branch folder turns the branch to the landing pad into a fallthrough. The MachineVerifier, which relies on AnalyzeBranch, is unable to check the condition, and doesn't complain. However, it does in this specific testcase, where the branch to the landing pad remained. Make the MachineVerifier accept it. llvm-svn: 223059