path: root/llvm/lib/Target/PowerPC
Commit message (Author, Date; Files, Lines)
...
* [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle (Bill Schmidt, 2014-12-09; 1 file, -0/+9)
  When performing instruction selection for ISD::VECTOR_SHUFFLE, there is
  special code for handling v2f64 and v2i64 using VSX instructions. This
  code must be adjusted for little-endian. Because the two inputs are
  treated as a double-wide register, we must swap their order for little
  endian. To get the appropriate mask elements to use with the big-endian
  biased XXPERMDI instruction, we must reverse their order and invert the
  bits.

  A new test is added to test the 16 possible values of the shuffle mask.
  It is initially disabled for reasons specified in the test. It is
  re-enabled by patch 4/4.

  llvm-svn: 223791
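  As an illustration, a minimal sketch of the kind of shuffle affected
  (hypothetical IR, not from the commit; register numbers and the final
  xxpermdi immediate depend on the lowering):

      define <2 x double> @shuffle00(<2 x double> %a, <2 x double> %b) {
        ; selects element 0 of %a and element 0 of %b
        %v = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>
        ret <2 x double> %v
      }
      ; BE maps this directly to an xxpermdi of (%a, %b); LE must swap the
      ; two inputs and reverse-and-invert the mask bits before forming the
      ; immediate.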
* [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations (Bill Schmidt, 2014-12-09; 1 file, -0/+14)
  For little endian, we need to make some straightforward adjustments in
  the code expansions for scalar_to_vector and vector_extract of v2f64.

  First, scalar_to_vector must place the scalar into vector element zero.
  However, our implementation of SUBREG_TO_REG will place it into
  big-endian vector element zero (high-order bits), and for little endian
  we need it in the low-order bits. The LE implementation splats the
  high-order doubleword into the low-order doubleword.

  Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
  must be reversed for similar reasons.

  A new test is added that tests code generation for insertelement and
  extractelement for both element 0 and element 1. It is disabled in this
  patch but enabled in patch 4/4, for reasons stated in the test.

  llvm-svn: 223788
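  A minimal sketch of the two operations involved (hypothetical IR, not
  from the commit's test):

      define <2 x double> @s2v(double %d) {
        ; scalar_to_vector: %d must land in vector element zero
        %v = insertelement <2 x double> undef, double %d, i32 0
        ret <2 x double> %v
      }
      define double @ext0(<2 x double> %v) {
        ; on LE this must read the opposite doubleword from the BE expansion
        %e = extractelement <2 x double> %v, i32 0
        ret double %e
      }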
* [PowerPC 1/4] Little-endian adjustments for VSX loads/stores (Bill Schmidt, 2014-12-09; 3 files, -2/+202)
  This patch addresses the inherent big-endian bias in the lxvd2x, lxvw4x,
  stxvd2x, and stxvw4x instructions. These instructions load vector
  elements into registers left-to-right (with the first element loaded
  into the high-order bits of the register), regardless of the endian
  setting of the processor. However, these are the only vector memory
  instructions that permit unaligned storage accesses, so we want to use
  them for little-endian.

  To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
  followed by an xxswapd, which swaps the doublewords. This works for
  lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the vector
  elements are in LE order (right-to-left) within each doubleword. (Thus
  after lxvd2x of a <4 x float> the elements will appear as 1, 0, 3, 2.
  Following the swap, they will appear as 3, 2, 1, 0, as desired.) For
  stores, an stxvd2x or stxvw4x is replaced with an stxvd2x preceded by an
  xxswapd.

  Introduction of extra swap instructions provides correctness, but
  obviously is not ideal from a performance perspective. Future patches
  will address this with optimizations to remove most of the introduced
  swaps, which have proven effective in other implementations.

  The introduction of the swaps is performed during lowering of LOAD,
  STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations. The latter are
  used to translate intrinsics that specify the VSX loads and stores
  directly into equivalent sequences for little endian. Thus code that
  uses vec_vsx_ld and vec_vsx_st does not have to be modified to be ported
  from BE to LE.

  We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for use
  during this lowering step. In PPCInstrVSX.td, we add new SDType and
  SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd). These
  are recognized during instruction selection and mapped to the correct
  instructions.

  Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
  to disable VSX on LE variants because code generation changes with this
  and subsequent patches in this set. I chose to include all of these in
  the first patch rather than try to rigorously sort out which tests were
  broken by one or another of the patches. Sorry about that.

  The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll, are
  disabled until LE support is enabled because of breakages that occur as
  noted in those tests. They are re-enabled in patch 4/4.

  llvm-svn: 223783
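  A minimal sketch of the lowering this enables (hypothetical test in
  current IR syntax; register numbers are illustrative):

      define <2 x double> @load_v2f64(ptr %p) {
        %v = load <2 x double>, ptr %p, align 8
        ret <2 x double> %v
      }
      ; expected LE codegen with VSX:
      ;   lxvd2x 34, 0, 3
      ;   xxswapd 34, 34
      ; a store is the mirror image: xxswapd, then stxvd2x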
* Restore r223709 as it was meant to be, and enable FeatureP8Vector for P8 (Bill Schmidt, 2014-12-09; 1 file, -2/+2)
  llvm-svn: 223751
* Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots (NAKAMURA Takumi, 2014-12-09; 1 file, -3/+5)
  CodeGen/PowerPC/vsx-p8.ll was failing:

      '+power8-vector' is not a recognized feature for this target (ignoring feature)
      llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input
      ; CHECK-REG: lxvw4x 34, 0, 3
                   ^
      <stdin>:50:2: note: scanning from here
       .align 3
       ^
      <stdin>:61:2: note: possible intended match here
       lvx 3, 0, 3
       ^

  llvm-svn: 223729
* [PowerPC]Activate FeatureVSX for the Power target (Bill Seurer, 2014-12-08; 1 file, -5/+3)
  This change activates FeatureVSX for Power 7 and Power 8 in PPC.td.

  http://reviews.llvm.org/D6570

  llvm-svn: 223709
* [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias (Hal Finkel, 2014-12-08; 2 files, -9/+6)
  GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
  processing inline asm constraints. This had previously been implemented
  using a non-allocatable register, named 'cc', that was listed as an
  alias of 'cr0', but the infrastructure does not seem to support this
  properly (neither the register allocator nor the scheduler properly
  accounts for the alias). Instead, we can just process this as a naming
  alias inside of the inline asm constraint-processing code, so we'll do
  that instead.

  There are two regression tests, one where the post-RA scheduler did the
  wrong thing with the non-allocatable alias, and one where the register
  allocator did the wrong thing. Fixes PR21742.

  llvm-svn: 223708
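  A sketch of the kind of inline asm this affects (hypothetical IR; the
  asm body is illustrative only):

      define i32 @cmp_flags(i32 %a, i32 %b) {
        ; the ~{cc} clobber must now be resolved by name to cr0
        %r = call i32 asm "cmpw 0,$1,$2\0Amfcr $0", "=r,r,r,~{cc}"(i32 %a, i32 %b)
        ret i32 %r
      }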
* [PowerPC]Add VSX loads/stores to fastisel for PPC target (Bill Seurer, 2014-12-05; 1 file, -4/+36)
  This patch adds VSX floating point loads and stores to fastisel. Along
  with the change to tablegen (D6220), VSX instructions are now fully
  supported in fastisel.

  http://reviews.llvm.org/D6274

  llvm-svn: 223507
* [PowerPC] 'cc' should be an alias only to 'cr0' (Hal Finkel, 2014-12-04; 1 file, -4/+2)
  We had mistakenly believed that GCC's 'cc' referred to the entire
  condition-code register (cr0 through cr7) -- and implemented this in
  r205630 to fix PR19326, but 'cc' is actually an alias only to 'cr0'.
  This is causing LLVM to clobber too much with legacy code with inline
  asm using the 'cc' clobber. Fixes PR21451.

  llvm-svn: 223328
* [PowerPC] Fix inline asm memory operands not to use r0 (Hal Finkel, 2014-12-03; 1 file, -2/+12)
  On PowerPC, inline asm memory operands might be expanded as 0($r), where
  $r is a register containing the address. As a result, this register
  cannot be r0, and we need to enforce this register subclass constraint
  to prevent miscompiling the code (we'd get this constraint for free with
  the usual instruction definitions, but that scheme has no knowledge of
  how we end up printing inline asm memory operands, and so here we need
  to do it 'by hand'). We can accomplish this within the current
  address-mode selection framework by introducing an explicit
  COPY_TO_REGCLASS node. Fixes PR21443.

  llvm-svn: 223318
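  A sketch of an affected memory operand (hypothetical IR in current
  syntax):

      define void @store_it(ptr %p, i32 %v) {
        ; $1 prints as 0($reg); $reg must not be r0, since r0 in the RA
        ; position of the addressing form reads as the literal value zero
        call void asm sideeffect "stw $0, $1", "r,*m"(i32 %v, ptr elementtype(i32) %p)
        ret void
      }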
* Add TableGen info for Power8. (Will Schmidt, 2014-12-03; 2 files, -0/+395)
  This is based on the Power7 version, with units added and renamed to
  match P8.

  Differential Revision: http://reviews.llvm.org/D6358

  llvm-svn: 223257
* [PowerPC] Print all inline-asm consts as signed numbers (Hal Finkel, 2014-12-03; 1 file, -13/+18)
  Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are
  signed numbers, and it is important that we print them as such. To make
  sure that happens, we change PPCTargetLowering::LowerAsmOperandForConstraint
  so that it does all intermediate checks on a sign-extended int64_t
  value, and then creates the resulting target constant using MVT::i64.
  This will ensure that all negative values are printed as negative values
  (mirroring what is done in other backends to achieve the same
  sign-extension effect).

  This came up in the context of inline assembly like this:

      "add%I2 %0,%0,%2", ..., "Ir"(-1ll)

  where we used to print:

      addi 3,3,4294967295

  and gcc would print:

      addi 3,3,-1

  and gas accepts both forms, but our builtin assembler (correctly) does
  not. Now we print -1 like gcc does.

  While here, I replaced a bunch of custom integer checks with isInt<16>
  and friends from MathExtras.h.

  Thanks to Paul Hargrove for the bug report.

  llvm-svn: 223220
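  The IR-level equivalent (hypothetical sketch; the 'I' constraint
  requires a signed 16-bit immediate):

      define i32 @add_imm(i32 %x) {
        ; -1 must print as -1, not 4294967295
        %r = call i32 asm "addi $0,$1,$2", "=r,r,I"(i32 %x, i32 -1)
        ret i32 %r
      }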
* [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets (Hal Finkel, 2014-12-03; 1 file, -5/+3)
  We need to use the custom expansion of readcyclecounter on all 32-bit
  targets (even those with 64-bit registers). This should fix the ppc64
  buildbot.

  llvm-svn: 223182
* [PowerPC] Implement readcyclecounter for PPC32 (Hal Finkel, 2014-12-02; 4 files, -0/+72)
  We've long supported readcyclecounter on PPC64, but it is easier there
  (the read of the 64-bit time-base register can be accomplished via a
  single instruction). This now provides an implementation for PPC32 as
  well. On PPC32, the time-base register is still 64 bits, but can only be
  read 32 bits at a time via two separate SPRs. The ISA manual explains
  how to do this properly (it involves re-reading the upper bits and
  looping if the counter has wrapped while being read).

  This requires PPC to implement a custom integer splitting legalization
  for the READCYCLECOUNTER node, turning it into a target-specific SDAG
  node, which then gets turned into a pseudo-instruction, which is then
  expanded to the necessary sequence (which has three SPR reads, the
  comparison and the branch).

  Thanks to Paul Hargrove for pointing out to me that this was still
  unimplemented.

  llvm-svn: 223161
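  A sketch of the intrinsic and the documented read loop (hypothetical;
  register names are illustrative):

      declare i64 @llvm.readcyclecounter()
      define i64 @read_tb() {
        %t = call i64 @llvm.readcyclecounter()
        ret i64 %t
      }
      ; expected PPC32 expansion, per the sequence described above:
      ;   loop: mftbu rHI       ; upper 32 bits
      ;         mftb  rLO       ; lower 32 bits
      ;         mftbu rTMP      ; re-read upper bits
      ;         cmpw  rTMP, rHI
      ;         bne   loop      ; retry if the counter wrapped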
* [PowerPC] Fix unwind info with dynamic stack realignment (Jay Foad, 2014-12-01; 1 file, -12/+27)
  Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a
  function where the stack had been dynamically realigned. This clearly
  doesn't work because the offset from SP to CFA is not a constant. Fix it
  by defining CFA as BP instead.

  This was causing the AddressSanitizer null_deref test to fail 50% of the
  time, depending on whether SP happened to be 32-byte aligned on entry to
  a particular function or not.

  Reviewers: willschm, uweigand, hfinkel
  Reviewed By: hfinkel
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D6410

  llvm-svn: 222996
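  A minimal function that triggers dynamic realignment (hypothetical
  sketch; an alloca alignment above the ABI stack alignment forces it):

      define void @realigned() {
        %buf = alloca [64 x i8], align 32
        call void @use(ptr %buf)
        ret void
      }
      declare void @use(ptr)
      ; here the SP-to-CFA offset varies at run time, so the CFI must
      ; define the CFA in terms of the base pointer rather than SP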
* [PowerPC] Add asm support for cache-inhibited ld/st instructions (Hal Finkel, 2014-11-30; 1 file, -0/+18)
  Add assembler support for the fixed-point cache-inhibited load/store
  instructions. These are hypervisor-level only, so don't get too
  excited ;)

  Fixes PR21650.

  llvm-svn: 222976
* Target triple OS detection tidyup. NFC (Simon Pilgrim, 2014-11-29; 1 file, -3/+1)
  Use Triple::isOS*() helpers where possible.

  llvm-svn: 222960
* Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files. (Craig Topper, 2014-11-26; 3 files, -30/+30)
  llvm-svn: 222801
* [PowerPC] Add the 'attn' instruction (Hal Finkel, 2014-11-25; 2 files, -0/+8)
  The attn instruction is not part of the Power ISA, but is documented in
  the A2 user manual, and is accepted by the GNU assembler for the A2 and
  the POWER4+. Reported as part of PR21650.

  llvm-svn: 222712
* [PowerPC] Implement combineRepeatedFPDivisors (Hal Finkel, 2014-11-24; 2 files, -0/+23)
  This does not matter on newer cores (where we can use reciprocal
  estimates in fast-math mode anyway), but for older cores this allows us
  to generate better fast-math code where we have multiple FDIVs with a
  common divisor.

  llvm-svn: 222710
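  A sketch of the pattern this targets (hypothetical IR): with several
  divisions by the same value, the reciprocal is computed once and each
  fdiv becomes an fmul.

      define void @divs(double %a, double %b, double %d, ptr %p) {
        %q1 = fdiv fast double %a, %d
        %q2 = fdiv fast double %b, %d
        store double %q1, ptr %p
        %p2 = getelementptr double, ptr %p, i64 1
        store double %q2, ptr %p2
        ret void
      }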
* [PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment (Ulrich Weigand, 2014-11-24; 1 file, -0/+17)
  When processing an assignment in the integrated assembler that sets a
  symbol to the value of another symbol, we need to copy the st_other bits
  that encode the local entry point offset.

  Modeled after MipsTargetELFStreamer::emitAssignment handling of the
  ELF::STO_MIPS_MICROMIPS flag.

  llvm-svn: 222672
* Add LLVMScalarOpts to LLVMPowerPCCodeGen. (NAKAMURA Takumi, 2014-11-21; 1 file, -1/+1)
  llvm-svn: 222516
* Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *' (Craig Topper, 2014-11-21; 1 file, -9/+6)
  llvm-svn: 222509
* [PPC] Use SeparateConstOffsetFromGEP (Hal Finkel, 2014-11-21; 1 file, -0/+20)
  This mirrors r222331, which enabled SeparateConstOffsetFromGEP on
  AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30%
  speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just
  be from LICM, there is a store moved out of the inner loop) and a
  potential speedup on
  MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.

  Regardless, it makes some code look cleaner, and synchronizing the
  backends in this regard seems like a generally good thing.

  llvm-svn: 222504
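  A simplified illustration of what the pass does (hypothetical IR): the
  constant part of a GEP index is split out so the variable part can be
  reused across accesses.

      define float @at_next(ptr %a, i64 %i) {
        ; &a[%i + 1] becomes &a[%i] plus a constant offset
        %j = add i64 %i, 1
        %p = getelementptr float, ptr %a, i64 %j
        %v = load float, ptr %p
        ret float %v
      }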
* Add out of line virtual destructors to all LLVMTargetMachine subclasses (Reid Kleckner, 2014-11-20; 2 files, -0/+4)
  These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
  r221878. When anyone calls a virtual method of a class, clang-cl
  requires all virtual methods to be semantically valid. This includes the
  implicit virtual destructor, which triggers instantiation of the
  unique_ptr destructor, which fails because the type being deleted is
  incomplete.

  This is just part of the ongoing saga of PR20337, which is affecting
  Blink as well. Because the MSVC ABI doesn't have key functions, we end
  up referencing the vtable and implicit destructor on any virtual call
  through a class. We don't actually end up emitting the dtor, so it'd be
  good if we could avoid this unneeded type completion work.

  llvm-svn: 222480
* Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> (David Blaikie, 2014-11-19; 1 file, -4/+4)
  This is to be consistent with StringSet and ultimately with the standard
  library's associative container insert function.

  This led to updating SmallSet::insert to return pair<iterator, bool>,
  and then to update SmallPtrSet::insert to return pair<iterator, bool>,
  and then to update all the existing users of those functions...

  llvm-svn: 222334
* [PowerPC] Add VSX builtins for vec_div (Bill Schmidt, 2014-11-14; 1 file, -0/+6)
  This patch adds builtin support for xvdivdp and xvdivsp, along with a
  test case. Straightforward stuff.

  There's a companion patch for Clang.

  llvm-svn: 221983
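  The new intrinsics look like this at the IR level (a minimal sketch in
  current syntax):

      declare <2 x double> @llvm.ppc.vsx.xvdivdp(<2 x double>, <2 x double>)
      define <2 x double> @vdiv(<2 x double> %a, <2 x double> %b) {
        ; selects the xvdivdp instruction
        %r = call <2 x double> @llvm.ppc.vsx.xvdivdp(<2 x double> %a, <2 x double> %b)
        ret <2 x double> %r
      }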
* We can get the TLOF from the TargetMachine - so the constructor no longer requires TargetLoweringObjectFile to be passed (Aditya Nandakumar, 2014-11-13; 1 file, -1/+1)
  llvm-svn: 221926
* This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively (Aditya Nandakumar, 2014-11-13; 3 files, -10/+16)
  llvm-svn: 221878
* Add support for small-model PIC for PowerPC. (Justin Hibbits, 2014-11-12; 5 files, -63/+116)
  Summary: Large-model was added first. With the addition of support for
  multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC,
  SysV4 ABI. This generates more optimal code, for shared libraries with
  less than about 16380 data objects.

  Test Plan: Test cases added or updated

  Reviewers: joerg, hfinkel
  Reviewed By: hfinkel
  Subscribers: jholewinski, mcrosier, emaste, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5399

  llvm-svn: 221791
* [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics (Bill Schmidt, 2014-11-12; 2 files, -8/+29)
  This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC,
  which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and
  stxvw4x instructions.

  New LLVM intrinsics are provided to represent these four instructions in
  IntrinsicsPowerPC.td. These are patterned after the similar intrinsics
  for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied
  to the code gen patterns, with additional patterns to allow plain
  vanilla loads and stores to still generate these instructions.

  At -O1 and higher the intrinsics are immediately converted to loads and
  stores in InstCombineCalls.cpp. This will open up more optimization
  opportunities while still allowing the correct instructions to be
  generated. (Similar code exists for aligned Altivec loads and stores.)

  The new intrinsics are added to the code that checks for consecutive
  loads and stores in PPCISelLowering.cpp, as well as to
  PPCTargetLowering::getTgtMemIntrinsic().

  There's a new test to verify the correct instructions are generated. The
  loads and stores tend to be reordered, so the test just counts their
  number. It runs at -O2, as it's not very effective to test this at -O0,
  when many unnecessary loads and stores are generated.

  I ended up having to modify vsx-fma-m.ll. It turns out this test case is
  slightly unreliable, but I don't know a good way to prevent problems
  with it. The xvmaddmdp instructions read and write the same register,
  which is one of the multiplicands. Commutativity allows either to be
  chosen. If the FMAs are reordered differently than expected by the test,
  the register assignment can be different as a result. Hopefully this
  doesn't change often.

  There is a companion patch for Clang.

  llvm-svn: 221767
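  A minimal sketch of the new intrinsics in use (hypothetical IR in
  current syntax):

      declare <2 x double> @llvm.ppc.vsx.lxvd2x(ptr)
      declare void @llvm.ppc.vsx.stxvd2x(<2 x double>, ptr)
      define void @copy(ptr %src, ptr %dst) {
        %v = call <2 x double> @llvm.ppc.vsx.lxvd2x(ptr %src)
        call void @llvm.ppc.vsx.stxvd2x(<2 x double> %v, ptr %dst)
        ret void
      }
      ; at -O1 and higher InstCombine rewrites these as plain load/store,
      ; which still select to the same instructions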
* Pass an ArrayRef to MCDisassembler::getInstruction. (Rafael Espindola, 2014-11-12; 1 file, -5/+3)
  With this patch MCDisassembler::getInstruction takes an
  ArrayRef<uint8_t> instead of a MemoryObject.

  Even on X86 there is a maximum size an instruction can have. Given that,
  it seems way simpler and more efficient to just pass an ArrayRef to the
  disassembler instead of a MemoryObject and have it do a virtual call
  every time it wants some extra bytes.

  llvm-svn: 221751
* [PowerPC] Replace foul hackery with real calls to __tls_get_addr (Bill Schmidt, 2014-11-11; 7 files, -125/+82)
  My original support for the general dynamic and local dynamic TLS models
  contained some fairly obtuse hacks to generate calls to __tls_get_addr
  when lowering a TargetGlobalAddress. Rather than generating real calls,
  special GET_TLS_ADDR nodes were used to wrap the calls and only reveal
  them at assembly time. I attempted to provide correct parameter and
  return values by chaining CopyToReg and CopyFromReg nodes onto the
  GET_TLS_ADDR nodes, but this was also not fully correct. Problems were
  seen with two back-to-back stores to TLS variables, where the call
  sequences ended up overlapping with unhappy results. Additionally, since
  these weren't real calls, the proper register side effects of a call
  were not recorded, so clobbered values were kept live across the calls.

  The proper thing to do is to lower these into calls in the first place.
  This is relatively straightforward; see the changes to
  PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp. The
  changes here are standard call lowering, except that we need to track
  the fact that these calls will require a relocation. This is done by
  adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
  TargetGlobalAddress operand that appears earlier in the sequence.

  The calls to LowerCallTo() eventually find their way to
  LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which
  calls PrepareCall(). In PrepareCall(), we detect the calls to
  __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the
  annotated relocation information. This becomes an extra operand on the
  call following the callee, which is expected for nodes of type tlscall.
  We change the call opcode to CALL_TLS for this case. Back in
  FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since
  we require a TOC-restore nop following the call for the 64-bit ABIs.

  During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
  convert the CALL_TLS nodes into BL_TLS nodes, and convert the
  CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code
  removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS nodes can
  now be emitted normally using their patterns and the associated
  printTLSCall print method.

  Finally, as a result of these changes, all references to get-tls-addr in
  its various guises are no longer used, so they have been removed.

  (There are existing TLS tests to verify the changes haven't messed
  anything up.) I've added one new test that verifies that the problem
  with the original code has been fixed.

  llvm-svn: 221703
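  A sketch of the affected lowering (hypothetical IR; the asm shown is the
  standard 64-bit general-dynamic sequence and is illustrative only):

      @tlsvar = thread_local global i32 0
      define ptr @addr() {
        ret ptr @tlsvar
      }
      ; now lowered as a real call:
      ;   addis 3, 2, tlsvar@got@tlsgd@ha
      ;   addi  3, 3, tlsvar@got@tlsgd@l
      ;   bl    __tls_get_addr(tlsvar@tlsgd)
      ;   nop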
* MCAsmParserExtension has a copy of the MCAsmParser. Use it. (Rafael Espindola, 2014-11-11; 1 file, -9/+11)
  Base classes were storing a second copy.

  llvm-svn: 221667
* Misc style fixes. NFC. (Rafael Espindola, 2014-11-10; 1 file, -13/+9)
  This fixes a few cases of:

  * Wrong variable name style.
  * Lines longer than 80 columns.
  * Repeated names in comments.
  * clang-format of the above.

  This makes the next patch a lot easier to read.

  llvm-svn: 221615
* Remove redundant calls to isMaterializable. (Rafael Espindola, 2014-11-01; 1 file, -3/+1)
  This removes calls to isMaterializable in the following cases:

  * It was redundant with a call to isDeclaration now that isDeclaration
    returns the correct answer for materializable functions.
  * It was followed by a call to Materialize. Just call Materialize and
    check EC.

  llvm-svn: 221050
* [PowerPC] Initial VSX intrinsic support, with min/max for vector double (Bill Schmidt, 2014-10-31; 1 file, -6/+18)
  Now that we have initial support for VSX, we can begin adding intrinsics
  for programmer access to VSX instructions. This patch adds basic support
  for VSX intrinsics in general, and tests it by implementing intrinsics
  for minimum and maximum for the vector double data type.

  The LLVM portion of this is quite straightforward. There is a companion
  patch for Clang.

  llvm-svn: 220988
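  The min/max intrinsics in use (a minimal sketch in current IR syntax):

      declare <2 x double> @llvm.ppc.vsx.xvmindp(<2 x double>, <2 x double>)
      declare <2 x double> @llvm.ppc.vsx.xvmaxdp(<2 x double>, <2 x double>)
      define <2 x double> @clamp(<2 x double> %v, <2 x double> %lo, <2 x double> %hi) {
        ; clamp %v to [%lo, %hi] element-wise
        %t = call <2 x double> @llvm.ppc.vsx.xvmaxdp(<2 x double> %v, <2 x double> %lo)
        %r = call <2 x double> @llvm.ppc.vsx.xvmindp(<2 x double> %t, <2 x double> %hi)
        ret <2 x double> %r
      }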
* [PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code (Ulrich Weigand, 2014-10-31; 4 files, -10/+37)
  Since block address values can be larger than 2GB in 64-bit code, they
  cannot be loaded simply using an @l / @ha pair, but instead must be
  loaded from the TOC, just like GlobalAddress, ConstantPool, and
  JumpTable values are.

  The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
  temporary labels could not be used as TOC values, since code would
  attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
  same name as the temporary label.

  llvm-svn: 220959
* Use rsqrt (X86) to speed up reciprocal square root calcs (Sanjay Patel, 2014-10-24; 2 files, -2/+5)
  This is a first step for generating SSE rsqrt instructions for
  reciprocal square root calcs when fast-math is allowed.

  For now, be conservative and only enable this for AMD btver2 where
  performance improves significantly - for example, 29% on
  llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
  (if we convert the data type to single-precision float).

  This patch adds a two-constant version of the Newton-Raphson refinement
  algorithm to DAGCombiner that can be selected by any target via a
  parameter returned by getRsqrtEstimate().

  See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900

  Differential Revision: http://reviews.llvm.org/D5658

  llvm-svn: 220570
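  For reference, one Newton-Raphson step for rsqrt(x) refines an estimate
  e as e' = e * (1.5 - 0.5 * x * e * e); the committed "two-constant"
  variant is an algebraic rearrangement of the same step. A sketch of one
  step in IR (hypothetical, for illustration):

      define float @rsqrt_refine(float %x, float %e) {
        %ee = fmul fast float %e, %e
        %hx = fmul fast float %x, 5.000000e-01
        %t  = fmul fast float %hx, %ee
        %s  = fsub fast float 1.500000e+00, %t
        %r  = fmul fast float %e, %s
        ret float %r
      }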
* [PATCH] Support select-cc for VSFRC when VSX is enabled (Bill Schmidt, 2014-10-22; 3 files, -5/+36)
  A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
  handle <2 x double> cases. This patch adds SELECT_VSFRC and
  SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
  f64 type when VSX is enabled. The changes are analogous to those in the
  previous patch. I've added a new variant to vsx.ll to test the code
  generation.

  (I also cleaned up a little formatting in PPCInstrVSX.td from the
  previous patch.)

  llvm-svn: 220395
* [PowerPC] Support select-cc for VSX (Bill Schmidt, 2014-10-22; 3 files, -3/+40)
  The tests test/CodeGen/Generic/select-cc.ll and
  test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The
  problem is that the lowering logic for the SELECT and SELECT_CC
  operations doesn't currently support the VSX registers. This patch
  fixes that.

  In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
  for other register classes. Similar pseudos are added in PPCInstrVSX.td
  (they must be there, because the "vsrc" register class definition
  appears there) for the VSRC register class. The SELECT_VSRC pseudo is
  then used in pattern matching for SELECT_CC.

  The rest of the patch just adds logic for SELECT_VSRC wherever similar
  logic appears for SELECT_VRRC. There are no new test cases because the
  existing tests above test this, along with a variant in
  test/CodeGen/PowerPC/vsx.ll.

  After discussion with Hal, a future patch will add similar _VSFRC
  variants to override f64 type handling (currently using F8RC).

  llvm-svn: 220385
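  The kind of IR that exercises the new pseudo (hypothetical sketch):

      define <2 x double> @sel(double %a, double %b, <2 x double> %x, <2 x double> %y) {
        ; a select on a v2f64 value now goes through SELECT_VSRC/SELECT_CC_VSRC
        %c = fcmp olt double %a, %b
        %r = select i1 %c, <2 x double> %x, <2 x double> %y
        ret <2 x double> %r
      }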
* [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg (Bill Schmidt, 2014-10-21; 1 file, -0/+6)
  With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the
  FMA mutation pass. If we have a situation where a killed product
  register is the same register as the FMA target, such as:

      %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                          %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11

  then the substitution makes no sense. We end up getting a crash when we
  try to extend the interval associated with the killed product register,
  as there is already a live range for %vreg5 there. This patch just
  disables the mutation under those circumstances.

  Since recipest.ll generates different code with VSX enabled, I've
  modified that test to use -mattr=-vsx. I've borrowed the code from that
  test that exposed the bug and placed it in fma-mutate.ll, where it tests
  several mutation opportunities including the "bad" one.

  llvm-svn: 220290
* [PowerPC] Change assert to better form (Bill Schmidt, 2014-10-17; 1 file, -3/+3)
  llvm-svn: 220092
* [PowerPC] Change liveness testing in VSX FMA mutation pass (Bill Schmidt, 2014-10-17; 1 file, -8/+20)
  With VSX enabled, LLVM crashes when compiling
  test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's
  revised in this patch. The interval test is designed to only work for
  virtual registers, but in this case the AddendSrcReg is physical. Since
  there is already a walk of the MIs between the AddendMI and the FMA, I
  added a check for def/kill of the AddendSrcReg in that loop. At Hal
  Finkel's request, I converted the liveness test to an assert restricted
  to virtual registers.

  I've changed the fma.ll test to have VSX and non-VSX variants so we can
  test both kinds of multiply-adds.

  llvm-svn: 220090
* [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation (Bill Schmidt, 2014-10-17; 2 files, -3/+15)
  Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
  types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This
  patch adds that support.

  As with lxvd2x/stxvd2x, this involves straightforward overriding of the
  patterns normally recognized for lvx/stvx, with preference given to the
  VSX patterns when VSX is enabled.

  In addition, the logic for permitting misaligned memory accesses is
  modified so that v4f32 and v4i32 are treated the same as v2f64 and
  v2i64 when VSX is enabled. Finally, the DAG generation for unaligned
  loads is changed to just use a normal LOAD (which will become lxvw4x)
  on P8 and later hardware, where unaligned loads are preferred over
  lvsl/lvx/lvx/vperm.

  A number of tests now generate the VSX loads/stores instead of lvx/stvx,
  so this patch adds VSX variants to those tests. I've also added
  <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test
  case to be used for testing code generation for the P8Vector feature.
  For now, that simply tests the unaligned load/store behavior.

  This has been tested along with a temporary patch to enable the VSX and
  P8Vector features, with no new regressions encountered with or without
  the temporary patch applied.

  llvm-svn: 220047
* Simplify handling of --noexecstack by using getNonexecutableStackSection. (Rafael Espindola, 2014-10-15; 1 file, -7/+3)
  llvm-svn: 219799
* Use the triple to figure out if this is a darwin target, not the subtarget. (Eric Christopher, 2014-10-14; 1 file, -1/+1)
  llvm-svn: 219673
* MC: Bit pack MCSymbolData. (Benjamin Kramer, 2014-10-11; 1 file, -1/+1)
  On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
  variables private and clean up uses to go through the existing
  accessors. NFC.

  llvm-svn: 219573
* [PowerPC] Reduce names from Power8Vector to P8Vector (Bill Schmidt, 2014-10-10; 3 files, -8/+7)
  Per Hal Finkel's review, improving typability of some variable names.

  llvm-svn: 219514
* [PowerPC] Add feature for Power8 vector extensions (Bill Schmidt, 2014-10-10; 3 files, -2/+10)
  The current VSX feature for PowerPC specifies availability of the VSX
  instructions added with the 2.06 architecture version. With 2.07, the
  architecture adds new instructions to both the Category:Vector and
  Category:VSX instruction sets. Additionally, unaligned vector storage
  operations have improved performance.

  This patch adds a feature to provide access to the new instructions and
  performance capabilities of Power8. For compatibility with GCC, the
  feature is controlled via a new -mpower8-vector switch, and the feature
  causes the __POWER8_VECTOR__ builtin define to be generated by the
  preprocessor.

  There is a companion patch for cfe being committed at the same time.

  llvm-svn: 219501
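  At the llc level the feature is exposed as an attribute string (the same
  '+power8-vector' string visible in the revert log above); a hypothetical
  RUN line:

      ; RUN: llc -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 \
      ; RUN:   -mattr=+power8-vector < %s | FileCheck %s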