path: root/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
...
* [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4 (Ulrich Weigand, 2014-07-20; 1 file changed, -31/+35)
  When handling an incoming byval argument, we need to possibly write incoming
  registers to the stack in order to create an on-stack image of the parameter,
  so we can return its address to common code.

  This currently uses CreateFixedObject to access the parts of the parameter
  save area where the argument is (or needs to be) stored. However, sometimes
  we need to access multiple parts of that area, e.g. to write multiple
  registers. The code currently uses a new CreateFixedObject call for each of
  these accesses, resulting in a patchwork of overlapping (fixed) stack objects.

  This doesn't really matter in the case of fixed objects, since any access to
  those turns into a fixed stackpointer + offset address anyway. However, with
  the upcoming ELFv2 patches, we may actually need to place an incoming
  argument into our *own* stack frame instead of the caller's. This means we
  need to use CreateStackObject instead, and we cannot have multiple
  overlapping instances of those.

  To make the rest of the argument handling code work equally in both
  situations, this patch refactors it to always use just a single call to
  CreateFixedObject, and access parts of that object as required using address
  arithmetic. This way, we can in a future patch substitute CreateStackObject
  without further changes.

  No change to generated code intended.
  llvm-svn: 213483
* [PowerPC] Fix FrameIndex handling in SelectAddressRegImm (Ulrich Weigand, 2014-07-20; 1 file changed, -1/+7)
  The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex
  nodes in a special manner, by translating them into a TargetFrameIndex node.
  This was done in most cases, but seems to have been neglected in one path:
  when the input tree has an OR of the FrameIndex with an immediate. This can
  happen if the FrameIndex can be proven to be sufficiently aligned that an OR
  of that immediate is equivalent to an ADD.

  The missing handling of FrameIndex in that case caused the SelectionDAG
  instruction selection to miss opportunities to merge the OR back into the
  FrameIndex node, leading to superfluous addi/ori instructions in the final
  assembler output.
  llvm-svn: 213482
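  The OR-as-ADD equivalence mentioned above is a simple bit-level fact; a
  standalone sketch (illustrative values, not the backend code):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint64_t base = 0x1000; // a sufficiently aligned frame address
      uint64_t imm  = 0x18;   // an immediate smaller than the alignment
      assert((base & imm) == 0);          // no bits overlap, thanks to the alignment...
      assert((base | imm) == base + imm); // ...so the OR behaves exactly like an ADD
      return 0;
    }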
* [PowerPC] 32-bit ELF PIC support (Hal Finkel, 2014-07-18; 1 file changed, -12/+45)
  This adds initial support for PPC32 ELF PIC (Position Independent Code; the
  -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
  backend.

  Patch by Justin Hibbits!
  llvm-svn: 213427
* [PowerPC] Implement atomic NAND operations as actual NAND (Ulrich Weigand, 2014-07-08; 1 file changed, -4/+4)
  This changes the implementation of atomic NAND operations from "a & ~b"
  (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with
  GCC >= 4.4). This is in line with the common-code and ARM back-end change
  implemented in r212433.
  llvm-svn: 212547
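  The difference between the two lowerings is easy to see on concrete bits; a
  standalone sketch (illustrative values, not the backend code):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t a = 0b1100, b = 0b1010;
      uint32_t old_lowering = a & ~b;   // what the backend used to emit
      uint32_t actual_nand  = ~(a & b); // true NAND, matching GCC >= 4.4
      assert(old_lowering == 0b0100);
      assert(actual_nand == ~uint32_t(0b1000));
      assert(old_lowering != actual_nand);
      return 0;
    }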
* [PowerPC] Fix no-assert build (Ulrich Weigand, 2014-07-07; 1 file changed, -0/+1)
  r212476 caused a compile failure (unused variable) in a non-assertion
  build ...
  llvm-svn: 212477
* [PowerPC] Fix "byval align" arguments (Ulrich Weigand, 2014-07-07; 1 file changed, -67/+62)
  Arguments passed as "byval align" should get the specified alignment in the
  parameter save area. There was some code in PPCISelLowering.cpp that
  attempted to implement this, but it didn't work correctly: while the code did
  update the ArgOffset value, it neglected to update the PtrOff value (which
  was already computed from the old ArgOffset), and it also neglected to
  update GPR_idx -- fields skipped due to alignment in the save area must
  likewise be skipped in GPRs.

  This patch fixes and simplifies this logic by:
  - handling argument offset alignment right at the beginning of argument
    processing, using a new helper routine CalculateStackSlotAlignment (this
    avoids having to update PtrOff and other derived values later on)
  - not tracking GPR_idx separately, but always computing the correct GPR_idx
    for each argument *from* its ArgOffset
  - removing some redundant computation in LowerFormalArguments:
    MinReservedArea must equal ArgOffset after argument processing, so there's
    no use in computing it twice.

  [This doesn't change the behavior of the current clang front-end, since that
  never creates "byval align" arguments at the moment. This will change with a
  follow-on patch, however.]
  llvm-svn: 212476
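  A simplified standalone model of that scheme (constants and names here are
  illustrative, not the actual helper routines):

    #include <cstdint>

    constexpr uint64_t LinkageSize = 48; // 64-bit SVR4 (ELFv1) linkage area, for illustration
    constexpr uint64_t SlotSize    = 8;  // one parameter-save-area doubleword

    // Align the argument's offset in the parameter save area up front, so all
    // derived values (stack address, GPR index) are computed from it consistently.
    uint64_t alignStackSlot(uint64_t ArgOffset, uint64_t Align) {
      return (ArgOffset + Align - 1) & ~(Align - 1);
    }

    // Derive the GPR index from the (already aligned) offset instead of tracking
    // it separately: skipping a doubleword for alignment then skips the GPR too.
    unsigned gprIndexFor(uint64_t ArgOffset) {
      return static_cast<unsigned>((ArgOffset - LinkageSize) / SlotSize);
    }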
* [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. (Juergen Ributzka, 2014-07-01; 1 file changed, -1/+2)
  The argument list vector is never used after it has been passed to the
  CallLoweringInfo, and moving it to the CallLoweringInfo is cleaner and
  pretty much as cheap as keeping a pointer to it.
  llvm-svn: 212135
* Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code. (Craig Topper, 2014-06-29; 1 file changed, -8/+6)
  llvm-svn: 211993
* [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize (Ulrich Weigand, 2014-06-23; 1 file changed, -20/+17)
  As of r211495, the only remaining users of getMinCallFrameSize are in core
  ABI code (LowerFormalParameter / LowerCall). This is actually a good thing,
  since the details of the parameter save area are ABI specific.

  With the new ELFv2 ABI in particular, the rules defining the size of the
  save area will become significantly more complex, so it wouldn't make sense
  to implement those outside ABI code that has all required information.

  In preparation, this patch eliminates the getMinCallFrameSize (and
  associated getMinCallArgumentsSize) routines, and inlines them into all
  callers. Note that since nearly all call arguments are constant, this allows
  simplifying the inlined copies to a single line everywhere.

  No change in generated code expected.
  llvm-svn: 211497
* [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls (Ulrich Weigand, 2014-06-23; 1 file changed, -5/+5)
  As remarked in the commit message to r211493, in several places throughout
  the 64-bit SVR4 ABI code there are calls to PPCFrameLowering::getLinkageSize
  and getMinCallFrameSize using an incorrect IsDarwin argument of "true".
  (Some of those were made explicit by the above refactoring patch, others
  have been there all along.)

  This patch fixes those places to pass "false" for IsDarwin.

  No change in generated code expected.
  llvm-svn: 211494
* [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize (Ulrich Weigand, 2014-06-23; 1 file changed, -122/+108)
  The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
  CalculateParameterAndLinkageAreaSize are currently used as subroutines from
  both 64-bit SVR4 and Darwin ABI code. However, the two ABIs are already
  quite different w.r.t. AltiVec conventions, and they will become more
  different when the ELFv2 ABI is supported. Also, in general it seems better
  to disentangle ABI support routines for different ABIs to avoid accidentally
  affecting one ABI when intending to change only the other.

  (Actually, the current code strictly speaking already contains a bug: these
  routines call PPCFrameLowering::getMinCallFrameSize and
  PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to "true"
  even on 64-bit SVR4. This bug currently has no adverse effect since those
  routines always return the same for 64-bit SVR4 and 64-bit Darwin, but it
  still seems wrong ... I'll fix this in a follow-up commit shortly.)

  To remove this code sharing, I'm simply inlining both routines into all call
  sites (there are just two each, one for 64-bit SVR4 and one for Darwin), and
  simplifying due to constant parameters where possible. A small piece of code
  that *does* make sense to share is refactored into the new routine
  EnsureStackAlignment, now also called from 32-bit SVR4 ABI code.

  No change in generated code is expected.
  llvm-svn: 211493
* [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4 (Ulrich Weigand, 2014-06-23; 1 file changed, -44/+29)
  Current 64-bit SVR4 code seems to have some remnants of Darwin code in
  AltiVec argument handling. This had the effect that AltiVec arguments (or
  subsequent arguments) were not correctly placed in the parameter area in
  some cases.

  The correct behaviour with the 64-bit SVR4 ABI is:
  - All AltiVec arguments take up space in the parameter area, just like any
    other arguments, whether vararg or not.
  - They are always 16-byte aligned, skipping a parameter area doubleword (and
    the associated GPR, if any), if necessary.

  This patch implements the correct behaviour and adds a test case. (Verified
  against GCC behaviour via the ABI compat test suite.)
  llvm-svn: 211492
* [PowerPC] Fix small argument stack slot offset for LE (Ulrich Weigand, 2014-06-20; 1 file changed, -11/+20)
  When small arguments (structures < 8 bytes or "float") are passed in a stack
  slot in the ppc64 SVR4 ABI, they must reside in the least significant part
  of that slot. On BE, this means that an offset needs to be added to the
  stack address of the parameter, but on LE, the least significant part of the
  slot has the same address as the slot itself.

  This changes the PowerPC back-end ABI code to only add the small argument
  stack slot offset for BE. It also adds test cases to verify the correct
  behavior on both BE and LE.
  llvm-svn: 211368
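  A standalone model of that placement rule (illustrative, not the backend code):

    #include <cstdint>

    // A small argument occupies the least significant bytes of its 8-byte slot:
    // on big-endian that is the end of the slot, on little-endian it is the start.
    uint64_t argAddressInSlot(uint64_t SlotAddr, uint64_t ArgSize, bool IsLittleEndian) {
      if (!IsLittleEndian && ArgSize < 8)
        return SlotAddr + (8 - ArgSize); // BE: skip the high-order padding bytes
      return SlotAddr;                   // LE: no offset needed
    }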
* [PowerPC] Remove unnecessary load of r12 in indirect call (Ulrich Weigand, 2014-06-18; 1 file changed, -4/+0)
  When looking at the 64-bit SVR4 indirect call sequence, I noticed an
  unnecessary load of r12. And indeed the code says:

    // R12 must contain the address of an indirect callee.

  But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is no need to
  load r12 at this point. It seems this code and comment is a remnant of code
  originally shared with the Darwin ABI ...

  This patch simply removes the unnecessary load.
  llvm-svn: 211203
* [PowerPC] Simplify and improve loading into TOC register (Ulrich Weigand, 2014-06-18; 1 file changed, -4/+11)
  During an indirect function call sequence on the 64-bit SVR4 ABI, generated
  code must load and then restore the TOC register. This does not use a
  regular LOAD instruction since the TOC register r2 is marked as reserved.
  Instead, there are two special instruction patterns:

    let RST = 2, DS = 2 in
    def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                              "ld 2, 8($reg)", IIC_LdStLD,
                              [(PPCload_toc i64:$reg)]>, isPPC64;

    let RST = 2, DS = 10, RA = 1 in
    def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                                  "ld 2, 40(1)", IIC_LdStLD,
                                  [(PPCtoc_restore)]>, isPPC64;

  Note that these not only restrict the destination of the load to r2, but
  they also restrict the *source* of the load to particular address
  combinations. The latter is a problem when we want to support the ELFv2
  ABI, since there the TOC save slot is no longer at 40(1).

  This patch replaces those two instructions with a single instruction pattern
  that only hard-codes r2 as destination, but supports generic addresses as
  source. This will allow supporting the ELFv2 ABI, and also helps generate
  more efficient code for calls to absolute addresses (allowing simplification
  of the ppc64-calls.ll test case).
  llvm-svn: 211193
* [PowerPC] Do not use BLA with the 64-bit SVR4 ABI (Ulrich Weigand, 2014-06-18; 1 file changed, -7/+7)
  The PowerPC back-end uses BLA to implement calls to functions at
  known-constant addresses, which is apparently used for certain system
  routines on Darwin.

  However, with the 64-bit SVR4 ABI, this is actually incorrect. An immediate
  function pointer value on this platform is not directly usable as a target
  address for BLA:
  - in the ELFv1 ABI, the function pointer value refers to the *function
    descriptor*, not the code address
  - in the ELFv2 ABI, the function pointer value refers to the global entry
    point, but BL(A) would only be correct when calling the *local* entry
    point

  This bug didn't show up since using immediate function pointer values is not
  usually done in the 64-bit SVR4 ABI in the first place. However, I ran into
  this issue with a certain use case of LLVM as JIT, where immediate function
  pointer values were used to implement callbacks from JITted code to helpers
  in statically compiled code.

  Fixed by simply not using BLA with the 64-bit SVR4 ABI.
  llvm-svn: 211174
* Remove an extraneous this-> to access the subtarget. (Eric Christopher, 2014-06-12; 1 file changed, -1/+1)
  llvm-svn: 210849
* Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency. (Eric Christopher, 2014-06-12; 1 file changed, -125/+123)
  Also remove an extra local subtarget in the initialization functions.
  llvm-svn: 210848
* [PPC64LE] Recognize shufflevector patterns for little endian (Bill Schmidt, 2014-06-10; 1 file changed, -55/+104)
  Various masks on shufflevector instructions are recognizable as specific
  PowerPC instructions (vector pack, vector merge, etc.). There is existing
  code in PPCISelLowering.cpp to recognize the correct patterns for big endian
  code. The masks for these instructions are different for little endian code
  due to the big-endian numbering employed by these instructions. This patch
  adds the recognition code for little endian.

  I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this.
  The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and
  difficult to read, so I felt it was better to add a new test rather than
  modify the old one.
  llvm-svn: 210536
* [PPC64LE] Generate correct code for unaligned little-endian vector loads (Bill Schmidt, 2014-06-09; 1 file changed, -21/+39)
  The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned
  Altivec vector loads generates a lvsl followed by a vperm. As we've seen in
  numerous other places, the vperm instruction has a big-endian bias, and this
  is fixed for little endian by complementing the permute control vector and
  swapping the input operands. In this case the lvsl is providing the permute
  control vector. Rather than generating an lvsl and a complement operation,
  it is sufficient to generate an lvsr instruction instead. Thus for LE code
  generation we will generate an lvsr rather than an lvsl, and swap the other
  input arguments on the vperm.

  The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the
  code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5
  testing.
  llvm-svn: 210493
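  The swap-the-operands-and-complement-the-selector rule used here (and in the
  other little-endian vperm fixes in this log) boils down to one indexing
  identity; a standalone sketch (illustrative, not the backend code):

    #include <algorithm>
    #include <array>
    #include <cassert>
    #include <cstdint>

    int main() {
      std::array<uint8_t, 16> A{}, B{};
      for (int i = 0; i < 16; ++i) { A[i] = uint8_t(i); B[i] = uint8_t(16 + i); }

      auto concat = [](const std::array<uint8_t, 16> &X, const std::array<uint8_t, 16> &Y) {
        std::array<uint8_t, 32> C{};
        std::copy(X.begin(), X.end(), C.begin());
        std::copy(Y.begin(), Y.end(), C.begin() + 16);
        return C;
      };

      std::array<uint8_t, 16> revA = A, revB = B;
      std::reverse(revA.begin(), revA.end());
      std::reverse(revB.begin(), revB.end());

      auto fwd = concat(A, B);       // elements as big-endian numbering sees them
      auto rev = concat(revB, revA); // the same bytes numbered from the other end

      // Selecting element s from concat(A, B) equals selecting element 31 - s
      // (the complement of s within 0..31) from concat(reverse(B), reverse(A)):
      // hence swapped operands plus a complemented permute control vector.
      for (int s = 0; s < 32; ++s)
        assert(fwd[size_t(s)] == rev[size_t(31 - s)]);
      return 0;
    }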
* [PPC64LE] Generate correct little-endian code for v16i8 multiply (Bill Schmidt, 2014-06-09; 1 file changed, -4/+16)
  The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8
  values assumes that vector elements are numbered in big-endian order. For
  little-endian targets, the vector element numbering is reversed, but the
  vmuleub, vmuloub, and vperm instructions still assume big-endian numbering.
  To account for this, we must adjust the permute control vector and reverse
  the order of the input registers on the vperm instruction.

  The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on
  powerpc64 and powerpc64le targets as well as the original powerpc (32-bit)
  target.
  llvm-svn: 210474
* [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian (Bill Schmidt, 2014-06-06; 1 file changed, -3/+34)
  This patch fixes a couple of lowering issues for little endian PowerPC. The
  code for lowering BUILD_VECTOR contains a number of optimizations that are
  only valid for big endian. For now, we disable those optimizations for
  correctness. In the future, we will add analogous optimizations that are
  correct for little endian.

  When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make
  the now-familiar transformation of swapping the input operands and
  complementing the permute control vector. Correctness of this
  transformation is tested by the accompanying test case.
  llvm-svn: 210336
* Omit else branch after return. (Eric Christopher, 2014-06-02; 1 file changed, -2/+4)
  llvm-svn: 210034
* Have the TLOF creation take a Triple rather than needing a subtarget. (Eric Christopher, 2014-05-31; 1 file changed, -3/+5)
  llvm-svn: 209937
* isSVR4ABI() returned !isDarwin() so just move that to the else block and remove the unreachable code. (Eric Christopher, 2014-05-30; 1 file changed, -4/+1)
  llvm-svn: 209927
* Rename CreateTLOF->createTLOF to match the rest of the file and the rest of the targets with a similar function name. (Eric Christopher, 2014-05-30; 1 file changed, -4/+4)
  llvm-svn: 209926
* [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC (Bill Schmidt, 2014-05-27; 1 file changed, -4/+8)
  In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an
  optimization for certain patterns to generate one or two vector splats
  followed by a vector add or subtract. This operation is represented by a
  VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for
  the VADD_SPLAT to be assigned the wrong data type, causing incorrect code
  generation. This patch corrects the problem.

  Specifically, the code previously assigned the value type of the
  BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct
  much of the time, but not always. The problem is that the call to
  isConstantSplat() may return a SplatBitSize that is not the same as the
  number of bits in the original element vector type. The correct type to
  assign is a vector type with the same element bit size as SplatBitSize.

  The included test case shows an example of this, where the BUILD_VECTOR node
  has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16,
  0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a
  splat of 16 for type v8i16, which is the type we must assign to the
  VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm,
  which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16,
  16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8
  and a vadduhm.

  This patch also corrected code generation for
  CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an
  XFAIL, so we can remove the XFAIL from the test case.
  llvm-svn: 209662
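  The v16i8-versus-v8i16 distinction in that example can be checked directly; a
  standalone sketch (illustrative, not the backend code):

    #include <cassert>
    #include <cstdint>

    int main() {
      // The vector being built: {0, 16, 0, 16, ...} as sixteen bytes.
      uint8_t bytes[16];
      for (int i = 0; i < 16; ++i)
        bytes[i] = (i % 2 == 0) ? 0 : 16;

      // Under big-endian element layout, each byte pair is the halfword 0x0010 = 16,
      // so the pattern is a v8i16 splat of 16 and the add must be per halfword
      // (vspltish 8 + vadduhm), not per byte (which would put 16 in every byte).
      for (int h = 0; h < 8; ++h) {
        uint16_t halfword = uint16_t(uint16_t(bytes[2 * h]) << 8 | bytes[2 * h + 1]);
        assert(halfword == 16);
        assert(uint16_t(8 + 8) == halfword);
      }
      return 0;
    }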
* [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate (Adam Nemet, 2014-05-20; 1 file changed, -1/+1)
  The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
  backend. We matched an immediate offset with STWX8 even though it only
  supports register offset. The culprit is the complex-pattern predicate,
  SelectAddrIdx, which decides that if the offset is not ISD::Constant it must
  be a register.

  Many thanks to Bill Schmidt for testing this.
  llvm-svn: 209219
* SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. (Benjamin Kramer, 2014-05-19; 1 file changed, -0/+1)
  - On ARM/ARM64 we get a vrev because the shuffle matching code is really
    smart. We still unroll anything that's not v4i32 though.
  - On X86 we get a pshufb with SSSE3. Required more cleverness in
    isShuffleMaskLegal.
  - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
  llvm-svn: 209123
* Target: remove old constructors for CallLoweringInfo (Saleem Abdulrasool, 2014-05-17; 1 file changed, -10/+5)
  This is mostly a mechanical change changing all the call sites to the newer
  chained-function construction pattern. This removes the horrible
  15-parameter constructor for the CallLoweringInfo in favour of setting
  properties of the call via chained functions. No functional change beyond
  the removal of the old constructors is intended.
  llvm-svn: 209082
* Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. (Jay Foad, 2014-05-14; 1 file changed, -12/+12)
  llvm-svn: 208811
* [PowerPC] Add global named register support (Hal Finkel, 2014-05-11; 1 file changed, -0/+25)
  Support for the intrinsics that read from and write to global named
  registers is added for r1, r2 and r13 (depending on the subtarget).
  llvm-svn: 208509
* Use makeArrayRef instead of calling ArrayRef<T> constructor directly. I introduced most of these recently. (Craig Topper, 2014-04-30; 1 file changed, -5/+4)
  llvm-svn: 207616
* Convert more SelectionDAG functions to use ArrayRef. (Craig Topper, 2014-04-28; 1 file changed, -1/+1)
  llvm-svn: 207397
* [C++] Use 'nullptr'. (Craig Topper, 2014-04-28; 1 file changed, -2/+2)
  llvm-svn: 207394
* Convert SelectionDAG::getMergeValues to use ArrayRef. (Craig Topper, 2014-04-27; 1 file changed, -4/+4)
  llvm-svn: 207374
* Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. (Craig Topper, 2014-04-26; 1 file changed, -7/+5)
  llvm-svn: 207329
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. (Craig Topper, 2014-04-26; 1 file changed, -35/+24)
  llvm-svn: 207327
* [C++] Use 'nullptr'. Target edition. (Craig Topper, 2014-04-25; 1 file changed, -26/+27)
  llvm-svn: 207197
* Add 'musttail' marker to call instructions (Reid Kleckner, 2014-04-24; 1 file changed, -0/+4)
  This is similar to the 'tail' marker, except that it guarantees that tail
  call optimization will occur. It also comes with conservative IR
  verification rules that ensure that tail call optimization is possible.

  Reviewers: nicholas

  Differential Revision: http://llvm-reviews.chandlerc.com/D3240
  llvm-svn: 207143
* Break PseudoSourceValue out of the Value hierarchy. (Nick Lewycky, 2014-04-15; 1 file changed, -2/+2)
  It is now the root of its own tree containing FixedStackPseudoSourceValue
  (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't).
  Anything that needs to use either a PseudoSourceValue* or a Value* is
  strongly encouraged to use a MachinePointerInfo instead.
  llvm-svn: 206255
* [PowerPC] Implement some additional TLI callbacks (Hal Finkel, 2014-04-12; 1 file changed, -0/+36)
  Add implementations of:
    bool isLegalICmpImmediate(int64_t Imm) const
    bool isLegalAddImmediate(int64_t Imm) const
    bool isTruncateFree(Type *Ty1, Type *Ty2) const
    bool isTruncateFree(EVT VT1, EVT VT2) const
    bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

  Unfortunately, this regresses counter-register-based loop formation because
  some of the loops now end up in forms where SE cannot compute loop counts.
  However, nevertheless, the test-suite results favor committing:
    SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
    MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup
    MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown
  llvm-svn: 206120
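  For a sense of what the first two hooks check, here is one plausible shape
  based on the PowerPC compare/add immediate encodings (signed and unsigned
  16-bit fields); this is an assumption for illustration, not the committed
  implementation:

    #include <cstdint>

    static bool fitsSigned16(int64_t Imm)   { return Imm >= -32768 && Imm <= 32767; }
    static bool fitsUnsigned16(int64_t Imm) { return Imm >= 0 && Imm <= 65535; }

    // cmpi takes a signed 16-bit immediate, cmpli an unsigned one.
    bool isLegalICmpImmediate(int64_t Imm) {
      return fitsSigned16(Imm) || fitsUnsigned16(Imm);
    }

    // addi takes a signed 16-bit immediate.
    bool isLegalAddImmediate(int64_t Imm) {
      return fitsSigned16(Imm);
    }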
* [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask (Hal Finkel, 2014-04-08; 1 file changed, -1/+1)
  PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
  predicate should be false. Noticed by inspection; no test case (yet).
  llvm-svn: 205787
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04; 1 file changed, -24/+24)
  llvm-svn: 205610
* [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles (Hal Finkel, 2014-03-31; 1 file changed, -0/+9)
  If we have two unique values for a v2i64 build vector, this will always
  result in two vector loads if we expand using shuffles. Only one is
  necessary.
  llvm-svn: 205231
* [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG (Hal Finkel, 2014-03-30; 1 file changed, -0/+32)
  sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
  (and similarly for v2i16 and v2i8). Even though there are no sign-extension
  (or algebraic shift) operations for v2i64 types, we can handle v2i32 sign
  extensions by converting to and from v2i64. The small trick necessary here
  is to shift the i32 elements into the right lanes before the i32 -> f64
  step. Because of the big-endian nature of the system, we need the i32
  portion in the high word of the i64 elements.

  For v2i16 and v2i8 we can do the same, but we first use the default Altivec
  shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
  then apply the above procedure.
  llvm-svn: 205146
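  A standalone model of the per-element operation (illustrative, not the backend
  code): sign-extending the i32 payload of a 64-bit lane with a shift pair, which
  only works once the payload sits in the lane position the shifts expect, hence
  the pre-shuffle into the high word described above.

    #include <cassert>
    #include <cstdint>

    // Payload assumed in the low 32 bits of the lane; shift left then arithmetic
    // shift right to replicate the sign bit through the upper half.
    int64_t signExtendInRegI32(uint64_t Lane) {
      return int64_t(Lane << 32) >> 32;
    }

    int main() {
      assert(signExtendInRegI32(0x00000000FFFFFFFFull) == -1);
      assert(signExtendInRegI32(0x0000000000000007ull) == 7);
      return 0;
    }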
* [PowerPC] Handle v2i64 comparisons (Hal Finkel, 2014-03-29; 1 file changed, -0/+23)
  v2i64 is a legal type under VSX, however we don't have native vector
  comparisons. We can handle eq/ne by casting it to an Altivec type, but
  everything else must be expanded.
  llvm-svn: 205106
* [PowerPC] Add subregister classes for f64 VSX values (Hal Finkel, 2014-03-29; 1 file changed, -4/+11)
  We had stored both f64 values and v2f64, etc. values in the VSX registers.
  This worked, but was suboptimal because we would always spill 16-byte values
  even though we almost always had scalar 8-byte values. This resulted in an
  increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
  added 64-bit subregisters of the Altivec registers, and combined those with
  the existing scalar floating-point registers to form a class of VSX scalar
  floating-point registers. The ABI code has also been enhanced to use this
  register class and some other necessary improvements have been made.
  llvm-svn: 205075
* [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers (Hal Finkel, 2014-03-28; 1 file changed, -4/+28)
  v2[fi]64 values need to be explicitly passed in VSX registers. This is
  because the code in TRI that finds the minimal register class given a
  register and a value type will assert if given an Altivec register and a
  non-Altivec type.
  llvm-svn: 205041
* [PowerPC] Fix v2f64 vector extract and related patterns (Hal Finkel, 2014-03-27; 1 file changed, -0/+1)
  First, v2f64 vector extract had not been declared legal (and so the existing
  patterns were not being used). Second, the patterns for that, and for
  scalar_to_vector, should really be a regclass copy, not a subregister
  operation, because the VSX registers directly hold both the vector and
  scalar data.
  llvm-svn: 204971