bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	memcmp is not a valid way to compare structs with padding in them.	Benjamin Kramer	2013-08-20	1	-2/+9
	llvm-svn: 188778
*	[mips][msa] Added insve	Daniel Sanders	2013-08-20	1	-0/+32
	llvm-svn: 188777
*	Fix overly pessimistic shortcut in post-RA MachineLICM	Richard Sandiford	2013-08-20	1	-4/+4
	Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has:

		unsigned Def = Candidates[i].Def;
		if (!PhysRegClobbers.test(Def) && ...) {

	to test whether register Def is multiply defined.

	However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI).

	This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop.

	llvm-svn: 188774
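The difference between the two checks can be sketched with a toy model. This is only an illustration under stated assumptions: the three-register file, `aliasesOf`, and `DefTracker` are hypothetical names invented here, not LLVM's MachineLICM code.

```cpp
#include <bitset>
#include <vector>

// Hypothetical register numbering: R0, R1, and the 128-bit pair R0R1
// that aliases both of them.
constexpr unsigned R0 = 0;
constexpr unsigned R1 = 1;
constexpr unsigned R0R1 = 2;
constexpr unsigned NumRegs = 3;

// Returns every register that overlaps R, including R itself.
inline std::vector<unsigned> aliasesOf(unsigned R) {
  switch (R) {
  case R0:
    return {R0, R0R1};
  case R1:
    return {R1, R0R1};
  default:
    return {R0R1, R0, R1};
  }
}

struct DefTracker {
  std::bitset<NumRegs> PhysRegDefs;     // register or an alias was defined
  std::bitset<NumRegs> PhysRegClobbers; // register or an alias was defined twice

  // Record a definition of Def by updating the sets for all of its aliases.
  void recordDef(unsigned Def) {
    for (unsigned A : aliasesOf(Def)) {
      if (PhysRegDefs.test(A))
        PhysRegClobbers.set(A);
      PhysRegDefs.set(A);
    }
  }

  // Pessimistic shortcut: reject if any alias of Def is marked as clobbered.
  bool rejectedByAnyAlias(unsigned Def) const {
    for (unsigned A : aliasesOf(Def))
      if (PhysRegClobbers.test(A))
        return true;
    return false;
  }

  // Same test as the final candidate loop: look at Def only.
  bool rejectedByDefOnly(unsigned Def) const {
    return PhysRegClobbers.test(Def);
  }
};
```

With one definition of R0 followed by one of R1, only the pair register ends up in `PhysRegClobbers`, so the any-alias check rejects R1 while the Def-only check accepts both R0 and R1.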
*	ARM: implement some simple f64 materializations.	Tim Northover	2013-08-20	1	-10/+40
	Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another.
	llvm-svn: 188773
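As a rough illustration of which doubles one such form can cover, the sketch below tests whether a value fits the 8-bit floating-point immediate of `vmov.f64` (a sign bit, a 3-bit exponent and a 4-bit fraction). The function name `fitsVfpImm8` is hypothetical, and the commit does not say which vmov forms it uses; 0.0 does not fit this particular encoding, so it would need a different form.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if Value is +/-(16..31)/16 * 2^E with E in [-3, 4], which is
// the set of values the 8-bit VFP immediate can represent.
inline bool fitsVfpImm8(double Value) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits)); // reinterpret the IEEE-754 bits

  const std::int64_t Exp =
      static_cast<std::int64_t>((Bits >> 52) & 0x7ff) - 1023;
  const std::uint64_t Mantissa = Bits & 0xfffffffffffffULL; // low 52 bits

  // Only the top 4 mantissa bits may be set; the low 48 must be zero.
  if (Mantissa & 0xffffffffffffULL)
    return false;

  // The unbiased exponent must fit the 3-bit encoded range.
  return Exp >= -3 && Exp <= 4;
}
```

Under this check 1.0, 0.5, -2.5 and 31.0 qualify, while 0.0, 0.1 and 32.0 do not.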
*	[stackprotector] Small cleanup.	Michael Gottesman	2013-08-20	1	-1/+2
	llvm-svn: 188772
*	[stackprotector] Small Bit of computation hoisting.	Michael Gottesman	2013-08-20	1	-5/+5
	llvm-svn: 188771
*	[stackprotector] Added significantly longer comment to FindPotentialTailCall ↵	Michael Gottesman	2013-08-20	1	-1/+6
	to make clear its relationship to llvm::isInTailCallPosition.
	llvm-svn: 188770
*	Removed trailing whitespace.	Michael Gottesman	2013-08-20	1	-18/+18
	llvm-svn: 188769
*	[stackprotector] Removed stale TODO.	Michael Gottesman	2013-08-20	1	-6/+3
	llvm-svn: 188768
*	[mips][msa] Added and.v, bmnz.v, bmz.v, bsel.v, nor.v, or.v, xor.v	Daniel Sanders	2013-08-20	2	-0/+64
	llvm-svn: 188767
*	[stackprotector] Added support for emitting the llvm intrinsic stack ↵	Michael Gottesman	2013-08-20	1	-49/+150
	protector check.
	rdar://13935163
	llvm-svn: 188766
*	[stackprotector] Refactor out the end of isInTailCallPosition into the ↵	Michael Gottesman	2013-08-20	1	-1/+8
	function returnTypeIsEligibleForTailCall. This allows me to use returnTypeIsEligibleForTailCall in the stack protector pass.
	rdar://13935163
	llvm-svn: 188765
*	Remove unused variables that crept in.	Michael Gottesman	2013-08-20	1	-6/+0
	llvm-svn: 188761
*	Teach selectiondag how to handle the stackprotectorcheck intrinsic.	Michael Gottesman	2013-08-20	3	-4/+390
	Previously, generation of stack protectors was done exclusively in the pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated splitting basic blocks at the IR level to create the success/failure basic blocks in the tail of the basic block in question. As a result of this, calls that would have qualified for the sibling call optimization were no longer eligible for optimization since said calls were no longer right in the "tail position" (i.e. the immediate predecessor of a ReturnInst instruction).

	Then it was noticed that since the sibling call optimization causes the callee to reuse the caller's stack, if we could delay the generation of the stack protector check until later in CodeGen after the sibling call decision was made, we get both the tail call optimization and the stack protector check!

	A few goals in solving this problem were:
	1. Preserve the architecture independence of stack protector generation.
	2. Preserve the normal IR level stack protector check for platforms like OpenBSD for which we support platform specific stack protector generation.

	The main problem that guided the present solution is that one can not solve this problem in an architecture independent manner at the IR level only. This is because:
	1. The decision on whether or not to perform a sibling call on certain platforms (for instance i386) requires lower level information related to available registers that can not be known at the IR level.
	2. Even if the previous point were not true, the decision on whether to perform a tail call is done in LowerCallTo in SelectionDAG which occurs after the Stack Protector Pass.

	As a result, one would need to put the relevant callinst into the stack protector check success basic block (where the return inst is placed) and then move it back later at SelectionDAG/MI time before the stack protector check if the tail call optimization failed. The MI level option was nixed immediately since it would require platform specific pattern matching. The SelectionDAG level option was nixed because SelectionDAG only processes one IR level basic block at a time implying one could not create a DAG Combine to move the callinst.

	To get around this problem a few things were realized:
	1. While one can not handle multiple IR level basic blocks at the SelectionDAG Level, one can generate multiple machine basic blocks for one IR level basic block. This is how we handle bit tests and switches.
	2. At the MI level, tail calls are represented via a special return MIInst called "tcreturn". Thus if we know the basic block in which we wish to insert the stack protector check, we get the correct behavior by always inserting the stack protector check right before the return statement. This is a "magical transformation" since no matter where the stack protector check intrinsic is, we always insert the stack protector check code at the end of the BB.

	Given the aforementioned constraints, the following solution was devised:
	1. On platforms that do not support SelectionDAG stack protector check generation, allow for the normal IR level stack protector check generation to continue.
	2. On platforms that do support SelectionDAG stack protector check generation:
		a. Use the IR level stack protector pass to decide if a stack protector is required/which BB we insert the stack protector check in by reusing the logic already therein. If we wish to generate a stack protector check in a basic block, we place a special IR intrinsic called llvm.stackprotectorcheck right before the BB's returninst or if there is a callinst that could potentially be sibling call optimized, before the call inst.
		b. Then when a BB with said intrinsic is processed, we codegen the BB normally via SelectBasicBlock. In said process, when we visit the stack protector check, we do not actually emit anything into the BB. Instead, we just initialize the stack protector descriptor class (which involves stashing information/creating the success mbb and the failure mbb if we have not created one for this function yet) and export the guard variable that we are going to compare.
		c. After we finish selecting the basic block, in FinishBasicBlock if the StackProtectorDescriptor attached to the SelectionDAGBuilder is initialized, we first find a splice point in the parent basic block before the terminator and then splice the terminator of said basic block into the success basic block. Then we code-gen a new tail for the parent basic block consisting of the two loads, the comparison, and finally two branches to the success/failure basic blocks. We conclude by code-gening the failure basic block if we have not code-gened it already (all stack protector checks we generate in the same function, use the same failure basic block).

	llvm-svn: 188755
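The splice described in the last step can be pictured with a toy model. This is a sketch under loud assumptions: `Block`, `spliceStackProtectorCheck`, and the instruction strings are hypothetical stand-ins, not the SelectionDAG or MachineBasicBlock API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a machine basic block: a name plus a list of
// instructions, the last of which is the terminator.
struct Block {
  std::string Name;
  std::vector<std::string> Insts;
};

// Moves Parent's terminator into Success, then gives Parent a new tail made
// of two loads, a comparison, and two branches (failure and success).
inline void spliceStackProtectorCheck(Block &Parent, Block &Success,
                                      const std::string &FailureName) {
  if (Parent.Insts.empty())
    return; // nothing to splice

  // The terminator (a plain return or a "tcreturn") goes to the success block.
  Success.Insts.push_back(Parent.Insts.back());
  Parent.Insts.pop_back();

  // New tail for the parent block.
  Parent.Insts.push_back("load guard");
  Parent.Insts.push_back("load stack slot");
  Parent.Insts.push_back("compare");
  Parent.Insts.push_back("branch-if-not-equal " + FailureName);
  Parent.Insts.push_back("branch " + Success.Name);
}
```

Because the terminator is what moves, a block ending in a tail call keeps that call as the last thing executed on the success path, which is why the position of the intrinsic inside the block does not matter in this model.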
*	Fix formatting. No functional change.	Craig Topper	2013-08-20	1	-1/+1
	llvm-svn: 188746
*	Add AVX-512 and related features to the CPUID detection code.	Craig Topper	2013-08-20	1	-3/+19
	llvm-svn: 188745
*	Move AVX and non-AVX replication inside a couple multiclasses to avoid ↵	Craig Topper	2013-08-20	1	-87/+60
	repeating each instruction for both individually.
	llvm-svn: 188743
*	Add an error check for a typo I accidentally made in a td file that caused ↵	Craig Topper	2013-08-20	1	-0/+3
	an assert to fire.
	llvm-svn: 188742
*	[PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes.	Bill Schmidt	2013-08-20	1	-271/+194
	(Patch committed on behalf of Mark Minich, whose log entry follows.)

	This is a continuation of the refactorings performed in svn rev 188573 (see that rev's comments for more detail).

	This is my stage 2 refactoring: I combined the emitPrologue() & emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a lot of the code since in essence the PPC32 & PPC64 code generation logic is the same, only the instruction forms are different (in most cases). This simplification is necessary because my functional changes (yet to come) add significant complexity, and without the simplification of my stage 2 refactoring, the overall complexity of both emitPrologue() & emitEpilogue() would have become almost intractable for most mortal programmers (like me).

	This submission was intended to be a pure refactoring (no functional changes whatsoever). However, in the process of combining the PPC32 & PPC64 flows, I spotted a difference that I believe is a bug (see svn rev 186478 line 863, or svn rev 188573 line 888): This line appears to be restoring the BP with the original FP content, not the original BP content. When I merged the 32-bit and 64-bit code, I used the corresponding code from the 64-bit flow, which I believe uses the correct offset (BPOffset) for this operation.

	llvm-svn: 188741
*	[Sparc] Use HWEncoding instead of unused Num field in Sparc register ↵	Venkatraman Govindaraju	2013-08-20	2	-12/+9
	definitions. Also, correct the definitions of RETL and RET instructions.
	llvm-svn: 188738
*	Add a llvm.copysign intrinsic	Hal Finkel	2013-08-19	7	-0/+26
	This adds a llvm.copysign intrinsic. We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well.

	In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded.

	In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-valued FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN).

	llvm-svn: 188728
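For context, the kind of source loop this targets looks roughly like the sketch below. The function name `applySigns` is made up for illustration, and whether a particular compiler build actually vectorizes such a loop depends on the target and optimization flags.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gives each element of Magnitudes the sign of the matching element of Signs.
// Every iteration is an independent copysign call, which is the pattern a
// loop vectorizer needs an intrinsic for.
inline void applySigns(std::vector<double> &Magnitudes,
                       const std::vector<double> &Signs) {
  for (std::size_t I = 0; I < Magnitudes.size() && I < Signs.size(); ++I)
    Magnitudes[I] = std::copysign(Magnitudes[I], Signs[I]);
}
```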
*	Don't form PPC CTR-based loops around a copysignl call	Hal Finkel	2013-08-19	1	-1/+2
	copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm function (and, thus, might clobber CTR).
	llvm-svn: 188727
*	Adding PIC support for ELF on x86_64 platforms	Andrew Kaylor	2013-08-19	4	-16/+244
	llvm-svn: 188726
*	Introduce non-const overloads for GlobalAlias::{get,resolve}AliasedGlobal.	Peter Collingbourne	2013-08-19	1	-8/+8
	llvm-svn: 188725
*	Use pop_back_val() instead of both back() and pop_back().	Jakub Staszak	2013-08-19	1	-2/+1
	llvm-svn: 188723
*	Teach InstCombine visitGetElementPtr about address spaces	Matt Arsenault	2013-08-19	3	-20/+26
	llvm-svn: 188721
*	Cleanup visitGetElementPtr to make address space change easier	Matt Arsenault	2013-08-19	1	-11/+13
	llvm-svn: 188720
*	commonPointerCast cleanups to make address space change easier	Matt Arsenault	2013-08-19	1	-5/+11
	llvm-svn: 188719
*	Fix assert with GEP ptr vector indexing structs	Matt Arsenault	2013-08-19	1	-2/+12
	Also fix it calculating the wrong value. The struct index is not a ConstantInt, so it was being interpreted as an array index.
	llvm-svn: 188713
*	Use less verbose code and update comments.	Eric Christopher	2013-08-19	1	-23/+16
	llvm-svn: 188711
*	Revert non-test parts of r188507	Matt Arsenault	2013-08-19	1	-1/+9
	Re-add the inboundsless tests I didn't add originally
	llvm-svn: 188710
*	Turn on pubnames by default on linux.	Eric Christopher	2013-08-19	2	-10/+22
	Until gdb supports the new accelerator tables we should add the pubnames section so that gdb_index can be generated from gold at link time. On darwin we already emit the accelerator tables and so don't need to worry about pubnames.
	llvm-svn: 188708
*	Improve the widening of integral binary vector operations	Paul Redmond	2013-08-19	2	-10/+24
	- split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
	- WidenVecRes_BinaryCanTrap preserves the original behaviour for operations that can trap
	- WidenVecRes_Binary simply widens the operation and improves codegen for 3-element vectors by allowing widening and promotion on x86 (matches the behaviour of unary and ternary operation widening)
	- use WidenVecRes_Binary for operations on integers.
	Reviewed by: nrotem
	llvm-svn: 188699
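Why trapping operations need separate handling can be seen in a small scalar model of widening a 3-element vector to 4 lanes. Everything here is an illustrative assumption: `Vec3`, `Vec4`, `addWidened` and `divWidened` are invented names, and padding the divisor with 1 is a simplification chosen for this sketch, not a description of what WidenVecRes_BinaryCanTrap does.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec3 = std::array<std::uint32_t, 3>;
using Vec4 = std::array<std::uint32_t, 4>;

// Widen to 4 lanes, filling the extra lane with Pad.
inline Vec4 widen(const Vec3 &V, std::uint32_t Pad) {
  return {V[0], V[1], V[2], Pad};
}

// Drop the extra lane again.
inline Vec3 narrow(const Vec4 &V) { return {V[0], V[1], V[2]}; }

// Addition cannot trap, so the content of the extra lane is irrelevant.
inline Vec3 addWidened(const Vec3 &A, const Vec3 &B) {
  const Vec4 WA = widen(A, 0), WB = widen(B, 0);
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = WA[I] + WB[I];
  return narrow(R);
}

// Division can trap: an arbitrary value in the extra divisor lane could be
// zero, so this sketch pads the divisor with 1. Callers must still ensure
// the three real divisor lanes are nonzero.
inline Vec3 divWidened(const Vec3 &A, const Vec3 &B) {
  const Vec4 WA = widen(A, 0), WB = widen(B, 1);
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = WA[I] / WB[I];
  return narrow(R);
}
```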
*	Adding comments to document RuntimeDyld relocation handling	Andrew Kaylor	2013-08-19	3	-1/+44
	llvm-svn: 188697
*	[mips] Fix instruction definitions that were incorrectly marked as ↵	Akira Hatanaka	2013-08-19	1	-6/+9
	code-gen-only.
	llvm-svn: 188690
*	Introduce SpecialCaseList::isIn overload for GlobalAliases.	Peter Collingbourne	2013-08-19	1	-2/+14
	Differential Revision: http://llvm-reviews.chandlerc.com/D1437
	llvm-svn: 188688
*	Thumb2 add immediate alias for SP	Mihai Popa	2013-08-19	1	-1/+2
	The Thumb2 add immediate is in fact defined for SP. The manual is misleading as it points to a different section for add immediate with SP, however the encoding is the same as for add immediate with register only with the SP operand hard coded. As such add immediate with SP and add immediate with register can safely be treated as the same instruction. All the patch does is adjust a register constraint on an instruction alias.
	llvm-svn: 188676
*	AVX-512: added arithmetic and logical operations.	Elena Demikhovsky	2013-08-19	3	-27/+249
	ADD, SUB, MUL integer and FP types. OR, AND, XOR. Added embedded broadcast form for these instructions.
	llvm-svn: 188673
*	[SystemZ] Add negative integer absolute (load negative)	Richard Sandiford	2013-08-19	3	-6/+12
	For now this matches the equivalent of (neg (abs ...)), which did hit a few times in projects/test-suite. We should probably also match cases where absolute-like selects are used with reversed arguments.
	llvm-svn: 188671
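At the source level, the value being matched is the negated absolute value. A minimal sketch follows; `negAbs` is a hypothetical helper name, and it is written as a select rather than as a literal negate-of-abs so that it has no overflow case.

```cpp
// Returns -|X|. Written as a select so that the most negative value maps to
// itself, with no intermediate abs() that could overflow.
inline long long negAbs(long long X) { return X < 0 ? X : -X; }
```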
*	[SystemZ] Add integer absolute (load positive)	Richard Sandiford	2013-08-19	2	-1/+19
	llvm-svn: 188670
*	[SystemZ] Add support for sibling calls	Richard Sandiford	2013-08-19	6	-21/+89
	This first cut is pretty conservative. The final argument register (R6) is call-saved, so we would need to make sure that the R6 argument to a sibling call is the same as the R6 argument to the calling function, which seems worth keeping as a separate patch.

	Saying that integer truncations are free means that we no longer use the extending instructions LGF and LLGF for spills in int-conv-09.ll and int-conv-10.ll. Instead we treat the registers as 64 bits wide and truncate them to 32-bits where necessary. I think it's unlikely we'd use LGF and LLGF for spills in other situations for the same reason, so I'm removing the tests rather than replacing them. The associated code is generic and applies to many more instructions than just LGF and LLGF, so there is no corresponding code removal.

	llvm-svn: 188669
*	Adds missing TLI check for library simplification of	Michael Kuperstein	2013-08-19	1	-3/+6
	* pow(x, 0.5) -> fabs(sqrt(x))
	* pow(2.0, x) -> exp2(x)
	llvm-svn: 188656
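The two rewrites, guarded by an availability check, can be modelled as below. This is a sketch only: `TargetLibs` and `evalPow` are hypothetical names standing in for the TargetLibraryInfo query, not LLVM's interface.

```cpp
#include <cmath>

// Hypothetical description of which libm functions the target provides.
struct TargetLibs {
  bool HasSqrt;
  bool HasFabs;
  bool HasExp2;
};

// Evaluates pow(Base, Exponent), using a cheaper equivalent only when the
// functions it would call are actually available on the target.
inline double evalPow(double Base, double Exponent, const TargetLibs &Libs) {
  if (Exponent == 0.5 && Libs.HasSqrt && Libs.HasFabs)
    return std::fabs(std::sqrt(Base)); // fabs turns sqrt(-0.0) into +0.0
  if (Base == 2.0 && Libs.HasExp2)
    return std::exp2(Exponent);
  return std::pow(Base, Exponent); // no suitable replacement available
}
```

The `fabs` matters for a base of -0.0: `sqrt` returns -0.0 there, while `pow(-0.0, 0.5)` is +0.0.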
*	Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions	Hal Finkel	2013-08-19	2	-0/+13
	We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the sum of two doubles, and the first double must have the larger magnitude, we can take the sign from the first double. As a result, in addition to fixing the crash, this is also an optimization.
	llvm-svn: 188655
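The reasoning can be sketched with a plain pair of doubles. `DoubleDouble` and `copySignFrom` are illustrative names, not LLVM types; the struct only models the "sum of two doubles" layout described above.

```cpp
#include <cmath>

// Hypothetical model of a ppcf128 value: Hi + Lo, where Hi carries the
// larger magnitude and therefore determines the sign of the sum.
struct DoubleDouble {
  double Hi;
  double Lo;
};

// copysign with an f64 magnitude and a double-double sign operand: only the
// high half needs to be inspected.
inline double copySignFrom(double Magnitude, const DoubleDouble &Sign) {
  return std::copysign(Magnitude, Sign.Hi);
}
```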
*	Add the PPC fcpsgn instruction	Hal Finkel	2013-08-19	5	-7/+45
	Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree.
	llvm-svn: 188653
*	llvm-dwarfdump: Do not include address offsets for attributes, only for tags	David Blaikie	2013-08-19	1	-1/+1
	This reduces the noise in diffs making it more likely that, at least for LLVM revision-over-revision, diffs will actually yield usable results. This is consistent with objdump's DWARF dumping behavior.
	llvm-svn: 188650
*	DebugInfo: don't emit zero-length names for parameters	David Blaikie	2013-08-19	1	-1/+2
	We check this in many/all other cases, just missed this one it seems. Perhaps it'd be worth unifying this so we never emit zero-length DW_AT_names.
	llvm-svn: 188649
*	Remove SpecialCaseList::findCategory.	Peter Collingbourne	2013-08-19	1	-35/+0
	It turned out that I didn't need this for DFSan.
	llvm-svn: 188646
*	ARM: make sure we keep inline asm operands tied.	Tim Northover	2013-08-18	1	-1/+4
	When patching inlineasm nodes to use GPRPair for 64-bit values, we were dropping the information that two operands were tied, which effectively broke the live-interval of vregs affected.
	llvm-svn: 188643
*	AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions.	Elena Demikhovsky	2013-08-18	6	-44/+335
	llvm-svn: 188637
*	Make more of the lowering helpers static. Also use MVT instead of EVT in a ↵	Craig Topper	2013-08-18	2	-24/+17
	couple places.
	llvm-svn: 188629