path: root/llvm/lib/Target/PowerPC
Commit log (newest first). Each entry: commit message (author, date, files changed, lines -removed/+added).
...
* Move everything depending on Object/MachOFormat.h over to Support/MachO.h. (Charles Davis, 2013-08-27, 2 files, -46/+50)
  llvm-svn: 189315
* Dummy code to silence warning from r189266 (Bill Schmidt, 2013-08-26, 2 files, -0/+11)
  llvm-svn: 189272
* [PowerPC] More fast-isel chunks (returns and integer extends) (Bill Schmidt, 2013-08-26, 3 files, -3/+255)
  Incremental improvement to fast-isel for PPC64. This allows us to select on ret, sext, and zext. Filling in sext/zext improves some of the existing logic in handling compare-immediates that needed extends.
  A simplified return convention for fast-isel is also added to the PPC64 calling conventions. All call/return processing for DAG selection is handled with custom code, so there isn't an existing CC to rely on here. The include of PPCGenCallingConv.inc causes compiler warnings due to the 32-bit calling conventions that are not used, so the dummy function "usePPC32CCs()" is added here to silence those.
  Test cases for the return and extend logic are added.
  llvm-svn: 189266
* [PowerPC] Add fast-isel branch and compare selection. (Bill Schmidt, 2013-08-25, 1 file, -9/+272)
  First chunk of actual fast-isel selection code. This handles direct and indirect branches, as well as feeding compares for direct branches. PPCFastISel::PPCEmitIntExt() is just roughed in and will be expanded in a future patch. This also corrects a problem with selection for constant pool entries in JIT mode or with small code model.
  llvm-svn: 189202
* [PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes. (Bill Schmidt, 2013-08-20, 1 file, -271/+194)
  (Patch committed on behalf of Mark Minich, whose log entry follows.)
  This is a continuation of the refactorings performed in svn rev 188573 (see that rev's comments for more detail). This is my stage 2 refactoring: I combined the emitPrologue() & emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a lot of the code since in essence the PPC32 & PPC64 code generation logic is the same, only the instruction forms are different (in most cases). This simplification is necessary because my functional changes (yet to come) add significant complexity, and without the simplification of my stage 2 refactoring, the overall complexity of both emitPrologue() & emitEpilogue() would have become almost intractable for most mortal programmers (like me).
  This submission was intended to be a pure refactoring (no functional changes whatsoever). However, in the process of combining the PPC32 & PPC64 flows, I spotted a difference that I believe is a bug (see svn rev 186478 line 863, or svn rev 188573 line 888): This line appears to be restoring the BP with the original FP content, not the original BP content. When I merged the 32-bit and 64-bit code, I used the corresponding code from the 64-bit flow, which I believe uses the correct offset (BPOffset) for this operation.
  llvm-svn: 188741
* Add a llvm.copysign intrinsic (Hal Finkel, 2013-08-19, 1 file, -0/+6)
  This adds a llvm.copysign intrinsic; we already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well.
  In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded.
  In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-valued FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN).
  llvm-svn: 188728
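  Illustration (not from the commit; the helper name is invented): the kind of source loop this enables the loop vectorizer to handle. Each call to copysignf maps onto the new @llvm.copysign intrinsic, which can then be widened into vector FCOPYSIGN nodes instead of being treated as an opaque libm call.

    #include <math.h>

    /* Scalar copysignf calls become @llvm.copysign.f32, so the whole loop
     * is a candidate for vectorization. */
    void apply_sign(float *dst, const float *mag, const float *sgn, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = copysignf(mag[i], sgn[i]);
    }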
* Don't form PPC CTR-based loops around a copysignl call (Hal Finkel, 2013-08-19, 1 file, -1/+2)
  copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm function (and, thus, might clobber CTR).
  llvm-svn: 188727
* Add the PPC fcpsgn instruction (Hal Finkel, 2013-08-19, 5 files, -7/+45)
  Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree.
  llvm-svn: 188653
* [PowerPC] Preparatory refactoring for making prologue and epilogue safe on PPC32 SVR4 ABI (Bill Schmidt, 2013-08-16, 1 file, -85/+102)
  [Patch and following text by Mark Minich; committing on his behalf.]
  There are FIXME's in PowerPC/PPCFrameLowering.cpp, method PPCFrameLowering::emitPrologue() related to "negative offsets of R1" on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4 (and any ABI without a Red Zone), no spills may be made until after the stackframe is claimed, which also includes the LR spill which is at a positive offset. The same problem exists in emitEpilogue(), though there's no FIXME for it. I intend to fix this issue, making LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit platforms (including in particular, OS-free embedded systems & kernel code, where interrupts may share the same stack as user code).
  In preparation for making these changes, to make the diffs for the functional changes less cluttered, I am providing the non-functional refactorings in two stages: Stage 1 does some minor fluffy refactorings to pull multiple method calls up into a single bool, creating named bools for repeated uses of obscure logic, moving some code up earlier because either stage 2 or my final version will require it earlier, and rewording/adding some comments. My stage 1 changes can be characterized as primarily fluffy cleanup, the purpose of which may be unclear until the stage 2 or final changes are made. My stage 2 refactorings combine the separate PPC32 & PPC64 logic, which is currently performed by largely duplicate code, into a single flow, with the differences handled by a group of constants initialized early in the methods.
  This submission is for my stage 1 changes. There should be no functional changes whatsoever; this is a pure refactoring.
  llvm-svn: 188573
* Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT(). (Craig Topper, 2013-08-15, 1 file, -4/+4)
  llvm-svn: 188441
* Actually fix PPC64 64-bit GPR inline asm constraint matching (Hal Finkel, 2013-08-14, 1 file, -1/+1)
  This is a follow-up to r187693, correcting that code to request the correct register class. The previous version, with the wrong register class, was not really correcting the constraints, but rather was removing them. Coincidentally, this fixed the failing test case in r187693, but obviously created other problems.
  llvm-svn: 188407
* cast fix to appease buildbot (David Fang, 2013-08-08, 1 file, -1/+1)
  llvm-svn: 188014
* initial draft of PPCMachObjectWriter.cpp (David Fang, 2013-08-08, 4 files, -19/+395)
  this records relocation entries in the mach-o object file for PIC code generation. tested on powerpc-darwin8, validated against darwin otool -rvV
  llvm-svn: 188004
* PPC: Map frin to round() not nearbyint() and rint() (Hal Finkel, 2013-08-08, 2 files, -68/+4)
  Making use of the recently-added ISD::FROUND, which allows for custom lowering of round(), the PPC backend will now map frin to round(). Previously, we had been using frin to lower nearbyint() (and rint() via some custom lowering to handle the extra fenv flags requirements), but only in fast-math mode because frin does not tie-to-even. Several users had complained about this behavior, and this new mapping of frin to round is certainly more appropriate (and does not require fast-math mode). In effect, this reverts r178362 (and part of r178337, replacing the nearbyint mapping with the round mapping).
  llvm-svn: 187960
* Add ISD::FROUND for libm round() (Hal Finkel, 2013-08-07, 1 file, -0/+5)
  All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well.
  For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit.
  llvm-svn: 187926
* Add PPC64 mulli pattern (Hal Finkel, 2013-08-06, 1 file, -0/+3)
  The PPC backend had been missing a pattern to generate mulli for 64-bit multiplies. We had been generating it only for 32-bit multiplies. Unfortunately, generating li + mulld unnecessarily increases register pressure.
  llvm-svn: 187807
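  A sketch of the case the new pattern covers (function name and constant are illustrative): a 64-bit multiply by a small immediate, which can now be selected as a single mulli instead of li followed by mulld.

    /* With the PPC64 mulli pattern, x * 100 can become "mulli 3, 3, 100"
     * rather than first materializing 100 in a register with li. */
    long long scale_by_100(long long x) {
        return x * 100;
    }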
* Target/*/CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen. (NAKAMURA Takumi, 2013-08-06, 1 file, -1/+1)
  Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel. It races to emit *.inc files simultaneously.
  llvm-svn: 187780
* PPCAsmParser: Stop leaking names. (Benjamin Kramer, 2013-08-03, 1 file, -10/+31)
  Store them in a place that gets cleaned up properly.
  llvm-svn: 187700
* Fix PPC64 64-bit GPR inline asm constraint matching (Hal Finkel, 2013-08-03, 1 file, -1/+18)
  Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the 64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with explicit register names, on PPC64 when an i64 MVT has been requested, we need to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent) registers.
  At some point, we'll probably want to arrange things so that the generic code in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order to match these inline asm register constraints. If we do that, this change can be reverted.
  llvm-svn: 187693
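  A hedged sketch of the situation being fixed (helper name and register choices are illustrative): GNU inline asm naming GPRs explicitly while operating on 64-bit values. Following GCC's convention, "r4"/"r5" refer to the 64-bit parent registers on PPC64, so the backend must match them against the X register class.

    unsigned long move_through_r4_r5(unsigned long x) {
        register unsigned long in  __asm__("r4") = x;   /* explicit 64-bit GPR */
        register unsigned long out __asm__("r5");
        __asm__("mr %0, %1" : "=r"(out) : "r"(in));     /* 64-bit register move */
        return out;
    }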
* Use function attributes to indicate that we don't want to realign the stack. (Bill Wendling, 2013-08-01, 1 file, -1/+1)
  Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure.
  llvm-svn: 187618
* [PowerPC] Skeletal FastISel support for 64-bit PowerPC ELF. (Bill Schmidt, 2013-07-30, 5 files, -1/+347)
  This is the first of many upcoming patches for PowerPC fast instruction selection support. This patch implements the minimum necessary for a functional (but extremely limited) FastISel pass. It allows the table-generated portions of the selector to be created and used, but in most cases selection will fall back to the DAG selector. None of the block terminator instructions are implemented yet, and most interesting instructions require some special handling. Therefore there aren't any new test cases with this patch. There will be quite a few tests coming with future patches.
  This patch adds the make/CMake support for the new code (including tablegen -gen-fast-isel) and creates the FastISel object for PPC64 ELF only. It instantiates the necessary virtual functions (TargetSelectInstruction, TargetMaterializeConstant, TargetMaterializeAlloca, tryToFoldLoadIntoMI, and FastLowerArguments), but of these, only TargetMaterializeConstant contains any useful implementation. This is present since the table-generated code requires the ability to materialize integer constants for some instructions.
  This patch has been tested by building and running the projects/test-suite code with -O0. All tests passed with the exception of a couple of long-running tests that time out using -O0 code generation.
  llvm-svn: 187399
* [PowerPC] Add comment explaining preprocessor directive. (Bill Schmidt, 2013-07-28, 1 file, -0/+2)
  llvm-svn: 187320
* Revert 187318 (Bill Schmidt, 2013-07-28, 1 file, -1/+1)
  llvm-svn: 187319
* [PowerPC] Remove unnecessary preprocessor checking. (Bill Schmidt, 2013-07-28, 1 file, -1/+1)
  The tests !defined(__ppc__) && !defined(__powerpc__) are not needed or helpful when verifying that code is being compiled for a 64-bit target. The simpler test provided by this revision is sufficient to tell if the target is 64-bit.
  llvm-svn: 187318
* Revert "[PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc." (Rafael Espindola, 2013-07-26, 1 file, -3/+3)
  This reverts commit r187248. It broke many bots.
  llvm-svn: 187254
* [PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc. (Bill Schmidt, 2013-07-26, 1 file, -3/+3)
  Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for all PowerPC targets, whether 32- or 64-bit. They will both implicitly define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not for 32-bit targets. We cannot be sure that all other possible compilers used to compile Clang/LLVM define both __ppc__ and __powerpc__, for example, so it is best to check for both when relying on either inside the Clang/LLVM code base.
  This patch makes sure we always check for both variants. In addition, it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp. (At least one of __ppc__ and __powerpc__ should always be defined when compiling for a PowerPC target, no matter which compiler is used, so testing for them is unnecessary.)
  There are some places in the compiler that check for other variants, like __POWERPC__ and _POWER, and I have left those in place. There is no need to add them elsewhere. This seems to be in Apple-specific code, and I won't take a chance on breaking it.
  There is no intended change in behavior; thus, no test cases are added.
  llvm-svn: 187248
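  The guard style this standardizes on, shown as a minimal sketch (the comments are placeholders): always test both spellings, since not every compiler defines both.

    #if defined(__ppc__) || defined(__powerpc__)
      /* any PowerPC host, 32- or 64-bit */
    #endif

    #if defined(__ppc64__) || defined(__powerpc64__)
      /* 64-bit PowerPC host only */
    #endif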
* [PowerPC] Support powerpc64le as a syntax-checking target. (Bill Schmidt, 2013-07-26, 10 files, -7/+44)
  This patch provides basic support for powerpc64le as an LLVM target. However, use of this target will not actually generate little-endian code. Instead, use of the target will cause the correct little-endian built-in defines to be generated, so that code that tests for __LITTLE_ENDIAN__, for example, will be correctly parsed for syntax-only testing. Code generation will otherwise be the same as powerpc64 (big-endian), for now.
  The patch leaves open the possibility of creating a little-endian PowerPC64 back end, but there is no immediate intent to create such a thing.
  The LLVM portions of this patch simply add ppc64le coverage everywhere that ppc64 coverage currently exists. There is nothing of any import worth testing until such time as little-endian code generation is implemented. In the corresponding Clang patch, there is a new test case variant to ensure that correct built-in defines for little-endian code are generated.
  llvm-svn: 187179
* PPC32 va_list is an actual structure so va_copy needs to copy the whole structure not just a pointer. (Roman Divacky, 2013-07-25, 2 files, -1/+23)
  This implements that and thus fixes va_copy on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz!
  llvm-svn: 187158
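  Hypothetical user code exercising the fix: because va_list on PPC32 SVR4 is a structure, va_copy has to duplicate the whole structure; if only a pointer-sized chunk were copied, the second traversal below would read through an incompletely initialized va_list.

    #include <stdarg.h>

    int sum_twice(int n, ...) {
        va_list ap, aq;
        int total = 0;
        va_start(ap, n);
        va_copy(aq, ap);                 /* must copy the full va_list struct */
        for (int i = 0; i < n; ++i) total += va_arg(ap, int);
        for (int i = 0; i < n; ++i) total += va_arg(aq, int);
        va_end(aq);
        va_end(ap);
        return total;
    }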
* allow tests to run on powerpc-darwin8 again, checking for __ppc__ (David Fang, 2013-07-24, 1 file, -2/+2)
  llvm-svn: 187027
* Split generated asm mnemonic matching table into a separate table for each asm variant. (Craig Topper, 2013-07-24, 1 file, -0/+1)
  This removes the need to store the asm variant in each row of the single table that existed before. Shaves ~16K off the size of X86AsmParser.o.
  llvm-svn: 187026
* PPC: Support dynamic allocas with large alignment (Hal Finkel, 2013-07-18, 2 files, -27/+53)
  Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes).
  In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger.
  llvm-svn: 186562
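  One way to trigger the new code path (illustrative; assumes GCC/Clang support for the aligned attribute on variable-length arrays): a dynamically sized local whose requested alignment exceeds the 16-byte default, so the rounded-up size and the saved stack pointer each need a scavenged register.

    #include <string.h>

    void copy_into_aligned(const char *src, unsigned n) {
        char buf[n] __attribute__((aligned(64)));   /* over-aligned dynamic alloca */
        memcpy(buf, src, n);
        /* ... hand buf to code that requires 64-byte alignment ... */
    }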
* PPC: Add base-pointer support to builtin setjmp/longjmp (Hal Finkel, 2013-07-17, 5 files, -41/+72)
  First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed.
  Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be.
  Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way.
  llvm-svn: 186545
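  Minimal sketch of the builtin SjLj usage affected (buffer name and helpers are invented for illustration): __builtin_setjmp stores a five-word context that now also carries BP, and __builtin_longjmp restores r30 from that slot.

    static void *jmp_ctx[5];               /* 5-word buffer used by builtin SjLj */

    void bail_out(void) {
        __builtin_longjmp(jmp_ctx, 1);     /* second argument must be the constant 1 */
    }

    int run_guarded(void (*f)(void)) {
        if (__builtin_setjmp(jmp_ctx))
            return -1;                     /* resumed here after bail_out() */
        f();
        return 0;
    }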
* PPC: Add CTR-register clobber to builtin setjmp (Hal Finkel, 2013-07-17, 3 files, -0/+9)
  Because the builtin longjmp implementation uses a CTR-based indirect jump, when the control flow arrives at the builtin setjmp call, the CTR register has necessarily been clobbered. Correspondingly, this adds CTR to the list of implicit definitions of the builtin setjmp pseudo instruction. We don't need to add CTR to the implicit definitions of builtin longjmp because, even though it does clobber the CTR register, the control flow cannot return to inside the loop unless there is also a builtin setjmp call.
  llvm-svn: 186488
* PPC: Implement base pointer and stack realignment (Hal Finkel, 2013-07-17, 5 files, -41/+240)
  This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added:
  "This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that."
  And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indices. The basic scheme follows that for base pointers in the X86 backend.
  We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects (dynamic allocas with large alignments, and base-pointer support in SjLj lowering will come in future commits).
  llvm-svn: 186478
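  Illustrative trigger for dynamic realignment (the function, array size, and alignment are invented): a fixed-size but over-aligned local object. The prologue realigns r1, and frame accesses at negative offsets are then resolved relative to the base pointer instead of the realigned stack pointer.

    int checksum_aligned_block(const unsigned char *src) {
        unsigned char block[256] __attribute__((aligned(64)));  /* exceeds 16-byte ABI alignment */
        int sum = 0;
        for (int i = 0; i < 256; ++i) {
            block[i] = src[i];
            sum += block[i];
        }
        return sum;
    }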
* PPCJITInfo.cpp: Tweak r186252 with s/__ppc/__powerpc/ to work on powerpc-linux Fedora 12. (NAKAMURA Takumi, 2013-07-16, 1 file, -2/+2)
  g++ (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)
  llvm-svn: 186396
* PPC: Refactoring to support subtarget feature changing (Hal Finkel, 2013-07-15, 2 files, -37/+69)
  This change mirrors the changes that were made to the X86 and ARM targets to support subtarget feature changing. As indicated in r182899, the mechanism is still undergoing revision, and so as with the X86 and ARM targets, there is no test case yet (there is no effective functionality change).
  llvm-svn: 186357
* Fix register subclass handling in PPCInstrInfo::insertSelect (Hal Finkel, 2013-07-15, 1 file, -5/+10)
  PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass).
  Unfortunately, this is not quite right: if we have something like this:
    %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6
  then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly.
  This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith.
  llvm-svn: 186343
* Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]). (Craig Topper, 2013-07-15, 1 file, -1/+1)
  llvm-svn: 186301
* Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. (Craig Topper, 2013-07-14, 3 files, -15/+15)
  llvm-svn: 186274
* Reduce large list of macros to the primary platform macros. Distinguish between ELF (Linux, FreeBSD, NetBSD) and OSX as platform for the assembler dialect. (Joerg Sonnenberger, 2013-07-13, 1 file, -20/+18)
  llvm-svn: 186252
* PPC: Add some missing V_SET0 patterns (Hal Finkel, 2013-07-11, 1 file, -2/+15)
  We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress.
  llvm-svn: 186108
* PPCDAGToDAGISel::isRunOfOnes should return false on zero (Hal Finkel, 2013-07-11, 1 file, -1/+4)
  This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-range MB value (= 32). To fix this, isRunOfOnes returns false on a zero input (and I've removed one now-redundant guard).
  llvm-svn: 186101
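  For intuition, a simplified stand-in (not LLVM's implementation, and it ignores the wrapped/rotated runs the real helper also accepts): a "run of ones" is a value whose set bits are contiguous, which is exactly the class of masks RLWINM/RLWIMI can encode via MB/ME, and zero has no such run.

    #include <stdbool.h>
    #include <stdint.h>

    static bool is_run_of_ones_simple(uint32_t v) {
        if (v == 0)
            return false;                /* the case the fix now rejects */
        v >>= __builtin_ctz(v);          /* strip trailing zero bits */
        return (v & (v + 1)) == 0;       /* remaining set bits are contiguous */
    }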
* PPC: Add a better comment about the i64 FI fixup (Hal Finkel, 2013-07-10, 1 file, -2/+13)
  In discussing this change with Bill Schmidt, it was decided that the original comment about negative FIs was incorrect. We'll still exclude them for now, but now with a more-accurate explanation.
  llvm-svn: 186005
* [PowerPC] Better fix for PR16556. (Bill Schmidt, 2013-07-09, 1 file, -9/+3)
  A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein.
  The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128.
  I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check.
  llvm-svn: 185959
* AArch64/PowerPC/SystemZ/X86: Fix the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd (Stephen Lin, 2013-07-09, 2 files, -11/+8)
  This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics:
  1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations.
  2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation.
  3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs.
  The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples.
  llvm-svn: 185956
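  A sketch of the source-level pattern involved (the wrapper function is illustrative): with floating-point contraction enabled, Clang emits the multiply-add below as an @llvm.fmuladd intrinsic, and isFMAFasterThanMulAndAdd then decides whether it may become a hardware FMA node for the given type.

    /* Compile with e.g. -ffp-contract=fast; on PowerPC this can select fmadds. */
    float muladd(float a, float b, float c) {
        return a * b + c;
    }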
* [PowerPC] Revert r185476 and fix up TLS variant kinds (Ulrich Weigand, 2013-07-09, 3 files, -4/+60)
  In the commit message to r185476 I wrote:
  > The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. This causes some confusion with the asm parser, since VK_PPC_TLSGD is output as @tlsgd, which is then read back in as VK_TLSGD.
  > To avoid this confusion, this patch removes the PowerPC-specific modifiers and uses the generic modifiers throughout. (The only drawback is that the generic modifiers are printed in upper case while the usual convention on PowerPC is to use lower-case modifiers. But this is just a cosmetic issue.)
  This was unfortunately incorrect: there is in fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds. Using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken.
  While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. Now this in turn exposes the breakage of handling @tlsgd/@tlsld in the asm parser that this check-in was originally intended to fix. To avoid this regression, I'm also adding a different fix for this problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates this code to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. Once the underlying problem is fixed, this hack can be reverted again.
  llvm-svn: 185945
* [PowerPC] Support ".machine any" (Ulrich Weigand, 2013-07-09, 1 file, -0/+27)
  The PowerPC assembler is supposed to provide a directive .machine that allows switching the supported CPU instruction set on the fly. Since we do not yet check CPU feature sets at all and always accept any available instruction, this is not really useful at this point. However, it makes sense to accept (and ignore) ".machine any" to avoid spuriously rejecting existing assembler files that use this.
  llvm-svn: 185924
* [PowerPC] Support .llong and fix .word (Ulrich Weigand, 2013-07-09, 1 file, -1/+3)
  This adds support for the .llong PowerPC-specific assembler directive. In doing so, I noticed that .word is currently incorrect: it is supposed to define a 2-byte data element, not a 4-byte one.
  llvm-svn: 185911
* PPC: Allocate RS spill slot for unaligned i64 load/store (Hal Finkel, 2013-07-09, 1 file, -2/+33)
  This fixes another bug found by llvm-stress! If we happen to be doing an i64 load or store into a stack slot that has less than a 4-byte alignment, then the frame-index elimination may need to use an indexed load or store instruction (because the offset may not be a multiple of 4, a requirement of the STD/LD instructions). The extra register needed to hold the offset comes from the register scavenger, and it is possible that the scavenger will need to use an emergency spill slot. As a result, we need to make sure that a spill slot is allocated when doing an i64 load/store into a less-than-4-byte-aligned stack slot.
  Because test cases for things like this tend to be fairly fragile, I've concatenated a few small bugpoint-reduced test cases together to form the regression test.
  llvm-svn: 185907
* [PowerPC] Always use "assembler dialect" 1 (Ulrich Weigand, 2013-07-08, 9 files, -42/+47)
  A setting in MCAsmInfo defines the "assembler dialect" to use. This is used by common code to choose between alternatives in a multi-alternative GNU inline asm statement like the following:
    __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));
  The meaning of these dialects is platform specific, and GCC defines those for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for new-style (PowerPC) mnemonics, like in the example above.
  To be compatible with inline asm used with GCC, LLVM ought to do the same. Specifically, this means we should always use assembler dialect 1 since old-style mnemonics really aren't supported on any current platform. However, the current LLVM back-end uses:
    AssemblerDialect = 1;           // New-Style mnemonics.
  in PPCMCAsmInfoDarwin, and
    AssemblerDialect = 0;           // Old-Style mnemonics.
  in PPCLinuxMCAsmInfo. The Linux setting really isn't correct, we should be using new-style mnemonics everywhere. This is changed by this commit.
  Unfortunately, the setting of this variable is overloaded in the back-end to decide whether or not we are on a Darwin target. This is done in PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect to 1 for both Darwin and Linux no longer allows us to make this distinction.
  Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter to distinguish Darwin targets, and ignores the SyntaxVariant parameter. As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs to be passed in by the caller when creating a target MCExpr. (To do so this patch implicitly also reverts commit 184441.)
  llvm-svn: 185858
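  The commit's inline asm example in a compilable form (the wrapper function is illustrative): with AssemblerDialect = 1, the second alternative in each {old|new} group is kept, so the new-style subfe mnemonic is emitted; dialect 0 would have picked the old POWER mnemonic sfe.

    unsigned long subtract_extended(unsigned long in1, unsigned long in2) {
        unsigned long out;
        __asm__("{sfe|subfe} %0,%1,%2" : "=r"(out) : "r"(in1), "r"(in2));
        return out;
    }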