path: root/llvm/lib/Target/PowerPC
...
* [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC (Hal Finkel, 2014-04-13; 1 file, -0/+147)
  Implements the various TTI functions to enable constant hoisting on PPC.

  The only significant test-suite change is this:
    MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
  (which essentially reverses the slowdown from r206120)

  llvm-svn: 206141
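  A minimal sketch of the kind of immediate-cost logic such TTI hooks encode
  (hypothetical helper name and cost values, not the actual PPCTTI code):

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    // Constant hoisting asks the target how expensive an immediate is.
    // On PPC, a value that fits a signed 16-bit field folds directly into
    // the instruction; larger values need lis/ori-style materialization.
    static unsigned ppcImmCostSketch(const APInt &Imm) {
      if (Imm.getMinSignedBits() <= 16) // fits a D-form signed immediate
        return 0;                       // free: folds into the instruction
      if (Imm.getMinSignedBits() <= 32)
        return 1;                       // one lis/ori pair
      return 2;                         // longer 64-bit materialization
    }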
* [PowerPC] Fix rlwimi isel when mask is not constant (Hal Finkel, 2014-04-13; 1 file, -1/+8)
  We had been using the known-zero values of the operand of the or to
  construct the mask for an rlwimi; this is not quite correct, but fine when
  the mask is constant. When the mask is constant, then the known zeros of
  the operand must be a superset of the zeros in the mask. However, when the
  mask is not a constant, then there might be bits in the operand that are
  not known to be zero that, at runtime, might be zero in the mask.
  Therefore, we check that any bits not known to be zero *are* known to be
  one in the mask. Otherwise, we can't fold the mask with the or and shift.

  This was revealed as a miscompile of
  MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting
  with constant hoisting.

  llvm-svn: 206136
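  The corrected safety condition, as a self-contained sketch (hypothetical
  helper; the real check operates on the known bits computed during isel):

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    // With a non-constant mask, every bit of the OR input that is not known
    // zero must be known one in the mask; otherwise folding into rlwimi
    // could drop bits that are live at runtime.
    static bool canFoldNonConstantMask(const APInt &InputKnownZero,
                                       const APInt &MaskKnownOne) {
      APInt MaybeSet = ~InputKnownZero;       // input bits that might be set
      return (MaybeSet & ~MaskKnownOne) == 0; // all such bits known one
    }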
* [PowerPC] Implement some additional TLI callbacks (Hal Finkel, 2014-04-12; 2 files, -0/+59)
  Add implementations of:
    bool isLegalICmpImmediate(int64_t Imm) const
    bool isLegalAddImmediate(int64_t Imm) const
    bool isTruncateFree(Type *Ty1, Type *Ty2) const
    bool isTruncateFree(EVT VT1, EVT VT2) const
    bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

  Unfortunately, this regresses counter-register-based loop formation because
  some of the loops now end up in forms where SE cannot compute loop counts.
  Nevertheless, the test-suite results favor committing:

    SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
    MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

    MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

  llvm-svn: 206120
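  The two immediate-legality hooks are plausibly simple range checks against
  PPC's 16-bit immediate fields (a sketch under that assumption, not the
  verbatim implementation):

    #include "llvm/Support/MathExtras.h"
    using namespace llvm;

    // cmpwi/cmplwi accept a signed or unsigned 16-bit immediate; addi
    // accepts a signed 16-bit immediate.
    static bool isLegalICmpImmediateSketch(int64_t Imm) {
      return isInt<16>(Imm) || isUInt<16>(Imm);
    }
    static bool isLegalAddImmediateSketch(int64_t Imm) {
      return isInt<16>(Imm);
    }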
* LLVMBuild.txt: Reformat. (NAKAMURA Takumi, 2014-04-10; 2 files, -3/+3)
  llvm-svn: 205961
* [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask (Hal Finkel, 2014-04-08; 1 file, -1/+1)
  PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
  predicate should be false. Noticed by inspection; no test case (yet).

  llvm-svn: 205787
* [PowerPC] Remove unused TM member variable to unbreak build (Hal Finkel, 2014-04-05; 1 file, -3/+2)
  Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

  llvm-svn: 205660
* [PowerPC] Adjust load/store costs in PPCTTI (Hal Finkel, 2014-04-04; 1 file, -3/+23)
  This provides more realistic costs for the insert/extractelement
  instructions (which are load/store pairs), accounts for the cheap unaligned
  Altivec load sequence, and for unaligned VSX load/stores.

  Bad news:
    MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require
      more investigation)
    SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer
      vectorize this, but it was a constant store that was scalarized)
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

  Good news:
    SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
    SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
    MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
    MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
    MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

  Unfortunately, estimating the costs of the stack-based scalarization
  sequences is hard, and adjusting these costs is like a game of
  whac-a-mole :( I'll revisit this again after we have better codegen for
  vector extloads and truncstores and unaligned load/stores.

  llvm-svn: 205658
* [PowerPC] PPCTTI Cleanup (Hal Finkel, 2014-04-04; 1 file, -4/+0)
  Remove the declaration of an unimplemented function.

  llvm-svn: 205657
* [PowerPC] Add a full condition code register to make the "cc" clobber work (Hal Finkel, 2014-04-04; 1 file, -0/+12)
  gcc inline asm supports specifying "cc" as a clobber of all condition
  registers. Add just enough modeling of the full register to make this work.

  Fixed PR19326.

  llvm-svn: 205630
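  From the user's side, what this models is inline asm like the following
  (illustrative example, not taken from the commit):

    // The record form "add." updates cr0, so the asm must declare the
    // condition registers clobbered via "cc".
    int add_record(int a, int b) {
      int r;
      __asm__("add. %0,%1,%2" : "=r"(r) : "r"(a), "r"(b) : "cc");
      return r;
    }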
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04; 3 files, -26/+26)
  llvm-svn: 205610
* [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost (Hal Finkel, 2014-04-02; 1 file, -3/+3)
  PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
  calculate the base cost of the memory access, and then adjust on top of
  that. There is no functionality change from this modification, but it will
  become important so that PPCTTI can take advantage of scalarization
  information for which BasicTTI::getMemoryOpCost will account in the near
  future.

  llvm-svn: 205476
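  The pattern described above, in miniature (hypothetical helper name and
  cost values; the real hook also takes the opcode, type, alignment, and
  address space):

    // Take the generic BasicTTI estimate as the base, then layer the
    // PPC-specific adjustment on top instead of recomputing from scratch.
    unsigned ppcMemoryOpCostSketch(unsigned BasicTTICost, bool UnalignedVSX) {
      unsigned Cost = BasicTTICost; // what BasicTTI::getMemoryOpCost returned
      if (UnalignedVSX)
        Cost += 2;                  // hypothetical unaligned-access surcharge
      return Cost;
    }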
* Simplify resolveFrameIndex() signature. (Jim Grosbach, 2014-04-02; 2 files, -7/+4)
  Just pass a MachineInstr reference rather than an MBB iterator. Creating a
  MachineInstr& is the first thing every implementation did anyway.

  llvm-svn: 205453
* [PowerPC] Add some missing VSX bitcast patterns (Hal Finkel, 2014-04-01; 1 file, -0/+8)
  llvm-svn: 205352
* [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles (Hal Finkel, 2014-03-31; 2 files, -0/+14)
  If we have two unique values for a v2i64 build vector, this will always
  result in two vector loads if we expand using shuffles. Only one is
  necessary.

  llvm-svn: 205231
* [PowerPC] Correct P7 dispatch unit allocation for vector instructions (Hal Finkel, 2014-03-31; 1 file, -16/+8)
  llvm-svn: 205222
* [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG (Hal Finkel, 2014-03-30; 3 files, -0/+45)
  sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64
  node (and similarly for v2i16 and v2i8). Even though there are no
  sign-extension (or algebraic shift) instructions for v2i64 types, we can
  handle v2i32 sign extensions by converting to and from v2i64. The small
  trick necessary here is to shift the i32 elements into the right lanes
  before the i32 -> f64 step. Because of the big-endian nature of the system,
  we need the i32 portion in the high word of the i64 elements.

  For v2i16 and v2i8 we can do the same, but we first use the default Altivec
  shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
  then apply the above procedure.

  llvm-svn: 205146
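  A scalar illustration of the lane trick (just the bit placement, not the
  vector lowering itself):

    #include <cstdint>

    // On a big-endian layout, the i32 payload must occupy the high word of
    // its i64 lane so the word-to-double conversion reads the right bits.
    int64_t placeInHighWord(int32_t X) {
      return static_cast<int64_t>(
          static_cast<uint64_t>(static_cast<uint32_t>(X)) << 32);
    }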
* [PowerPC] Handle v2i64 comparisons (Hal Finkel, 2014-03-29; 1 file, -0/+23)
  v2i64 is a legal type under VSX, however we don't have native vector
  comparisons. We can handle eq/ne by casting it to an Altivec type, but
  everything else must be expanded.

  llvm-svn: 205106
* [PowerPC] VSX instruction latency corrections (Hal Finkel, 2014-03-29; 2 files, -15/+15)
  The vector divide and sqrt instructions have high latencies, and the scalar
  comparisons are like all of the others. On the P7, permutations take an
  extra cycle over purely-simple vector ops.

  llvm-svn: 205096
* Completely rewrite ELFObjectWriter::RecordRelocation. (Rafael Espindola, 2014-03-29; 2 files, -57/+1)
  I started trying to fix a small issue, but this code has seen a small fix
  too many.

  The old code was fairly convoluted. Some of the issues it had:

  * It failed to check if a symbol difference was in the same section when
    converting a relocation to pcrel.
  * It failed to check if the relocation was already pcrel.
  * The pcrel value computation was wrong in some cases (relocation-pc.s).
  * It was missing quite a few cases where it should not convert symbol
    relocations to section relocations, leaving the backends to patch it up.
  * It would not propagate the fact that it had changed a relocation to
    pcrel, requiring a quite nasty workaround in ARM.
  * It was missing comments.

  llvm-svn: 205076
* [PowerPC] Add subregister classes for f64 VSX values (Hal Finkel, 2014-03-29; 8 files, -59/+192)
  We had stored both f64 values and v2f64, etc. values in the VSX registers.
  This worked, but was suboptimal because we would always spill 16-byte
  values even though we almost always had scalar 8-byte values. This resulted
  in an increase in stack-size use, extra memory bandwidth, etc. To fix this,
  I've added 64-bit subregisters of the Altivec registers, and combined those
  with the existing scalar floating-point registers to form a class of VSX
  scalar floating-point registers. The ABI code has also been enhanced to use
  this register class and some other necessary improvements have been made.

  llvm-svn: 205075
* [PowerPC] Fix VSX permutation isel (Hal Finkel, 2014-03-28; 1 file, -1/+1)
  Not only did I invert the indices when I wrote the code, but I also did the
  same thing when I wrote the regression test. Oops.

  llvm-svn: 205046
* [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers (Hal Finkel, 2014-03-28; 2 files, -7/+36)
  v2[fi]64 values need to be explicitly passed in VSX registers. This is
  because the code in TRI that finds the minimal register class given a
  register and a value type will assert if given an Altivec register and a
  non-Altivec type.

  llvm-svn: 205041
* [PowerPC] Use a small cleanup pass to remove VSX self copies (Hal Finkel, 2014-03-27; 3 files, -0/+78)
  As explained in r204976, because of how the allocation of VSX registers
  interacts with the call-lowering code, we sometimes end up generating self
  VSX copies. Specifically, things like this:

    %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>

  (where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

  This adds a small cleanup pass to remove these prior to post-RA scheduling.

  llvm-svn: 204980
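  The core of such a pass is small; a hypothetical sketch (not the committed
  code) of the per-function walk:

    #include "llvm/CodeGen/MachineFunction.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/Target/TargetRegisterInfo.h"
    using namespace llvm;

    // Erase COPY instructions whose source is the destination itself or one
    // of its sub-registers; such copies are nops.
    static bool removeSelfCopiesSketch(MachineFunction &MF,
                                       const TargetRegisterInfo *TRI) {
      bool Changed = false;
      for (MachineFunction::iterator BI = MF.begin(); BI != MF.end(); ++BI)
        for (MachineBasicBlock::iterator I = BI->begin(); I != BI->end();) {
          MachineInstr *MI = &*I;
          ++I; // advance first: we may erase MI below
          if (!MI->isCopy())
            continue;
          unsigned Dst = MI->getOperand(0).getReg();
          unsigned Src = MI->getOperand(1).getReg();
          if (Dst == Src || TRI->isSubRegister(Dst, Src)) {
            MI->eraseFromParent();
            Changed = true;
          }
        }
      return Changed;
    }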
* [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg (Hal Finkel, 2014-03-27; 1 file, -9/+13)
  Because of how the allocation of VSX registers interacts with the
  call-lowering code, we sometimes end up generating self VSX copies.
  Specifically, things like this:

    %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>

  (where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

  The problem is that ExpandPostRAPseudos always assumes that *some*
  instruction has been inserted, and adds implicit defs to it. This is a
  problem if no copy was inserted because it can cause subtle problems during
  post-RA scheduling. These self copies will have to be removed some other
  way.

  llvm-svn: 204976
* [PowerPC] Fix v2f64 vector extract and related patterns (Hal Finkel, 2014-03-27; 2 files, -4/+4)
  First, v2f64 vector extract had not been declared legal (and so the
  existing patterns were not being used). Second, the patterns for that, and
  for scalar_to_vector, should really be a regclass copy, not a subregister
  operation, because the VSX registers directly hold both the vector and
  scalar data.

  llvm-svn: 204971
* [PowerPC] Expand v2i64 shifts (Hal Finkel, 2014-03-27; 1 file, -0/+4)
  These operations need to be expanded during legalization so that isel does
  not crash. In theory, we might be able to custom lower some of these. That,
  however, would need to be follow-up work.

  llvm-svn: 204963
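  A plausible shape for this change (a sketch of the legalization directives,
  assumed to live in the PPCTargetLowering constructor; the exact opcode list
  is in the commit):

    // Mark v2i64 shifts Expand so legalization scalarizes them instead of
    // letting them reach instruction selection and crash.
    setOperationAction(ISD::SHL, MVT::v2i64, Expand);
    setOperationAction(ISD::SRL, MVT::v2i64, Expand);
    setOperationAction(ISD::SRA, MVT::v2i64, Expand);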
* Remove another unused argument. (Rafael Espindola, 2014-03-27; 1 file, -3/+2)
  llvm-svn: 204961
* Remove unused argument. (Rafael Espindola, 2014-03-27; 1 file, -5/+3)
  llvm-svn: 204956
* Prevent alias from pointing to weak aliases. (Rafael Espindola, 2014-03-27; 3 files, -9/+9)
  This adds back r204781.

  Original message:

  Aliases are just another name for a position in a file. As such, the
  regular symbol resolutions are not applied. For example, given

    define void @my_func() {
      ret void
    }
    @my_alias = alias weak void ()* @my_func
    @my_alias2 = alias void ()* @my_alias

  We produce without this patch:

    .weak my_alias
    my_alias = my_func
    .globl my_alias2
    my_alias2 = my_alias

  That is, in the resulting ELF file my_alias, my_func and my_alias2 are just
  3 names pointing to offset 0 of .text. That is *not* the semantics of IR
  linking. For example, linking in a

    @my_alias = alias void ()* @other_func

  would require the strong my_alias to override the weak one and my_alias2
  would end up pointing to other_func.

  There is no way to represent that with aliases being just another name, so
  the best solution seems to be to just disallow it, converting a miscompile
  into an error.

  llvm-svn: 204934
* [PowerPC] Generate VSX permutations for v2[fi]64 vectors (Hal Finkel, 2014-03-26; 3 files, -5/+45)
  llvm-svn: 204873
* [PowerPC] VSX loads and stores support unaligned access (Hal Finkel, 2014-03-26; 2 files, -3/+10)
  I've not yet updated PPCTTI because I'm not sure what the actual relative
  cost is compared to the aligned uses.

  llvm-svn: 204848
* [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions (Hal Finkel, 2014-03-26; 2 files, -4/+13)
  llvm-svn: 204843
* [PowerPC] Remove some dead VSX v4f32 store patterns (Hal Finkel, 2014-03-26; 1 file, -4/+2)
  These patterns are dead (because v4f32 stores are currently promoted to
  v4i32 and stored using Altivec instructions), and also are likely not
  correct (because they'd store the vector elements in the opposite order
  from that assumed by the rest of the Altivec code).

  llvm-svn: 204839
* [PowerPC] Use VSX vector load/stores for v2[fi]64 (Hal Finkel, 2014-03-26; 2 files, -0/+13)
  These instructions have access to the complete VSX register file. In
  addition, they "swap" the order of the elements so that element 0 (the
  scalar part) comes first in memory and element 1 follows at a higher
  address.

  llvm-svn: 204838
* [PowerPC] Add v2i64 as a legal VSX type (Hal Finkel, 2014-03-26; 4 files, -9/+32)
  v2i64 needs to be a legal VSX type because it is the SetCC result type from
  v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.

  This fixes the lowering for v2f64 VSELECT.

  llvm-svn: 204828
* [PowerPC] Lower VSELECT using xxsel when VSX is available (Hal Finkel, 2014-03-26; 2 files, -3/+22)
  With VSX there is a real vector select instruction, and so we should use
  it. Note that VSELECT will still scalarize for v2f64 because the
  corresponding SetCC result type (v2i64) is not currently a legal type.

  llvm-svn: 204801
* Revert "Prevent alias from pointing to weak aliases."Rafael Espindola2014-03-263-9/+9
| | | | | | | | | This reverts commit r204781. I will follow up to with msan folks to see what is what they were trying to do with aliases to weak aliases. llvm-svn: 204784
* [PowerPC] Generate logical vector VSX instructions (Hal Finkel, 2014-03-26; 1 file, -5/+12)
  These instructions are essentially the same as their Altivec counterparts,
  but have access to the larger VSX register file.

  llvm-svn: 204782
* Prevent alias from pointing to weak aliases. (Rafael Espindola, 2014-03-26; 3 files, -9/+9)
  Aliases are just another name for a position in a file. As such, the
  regular symbol resolutions are not applied. For example, given

    define void @my_func() {
      ret void
    }
    @my_alias = alias weak void ()* @my_func
    @my_alias2 = alias void ()* @my_alias

  We produce without this patch:

    .weak my_alias
    my_alias = my_func
    .globl my_alias2
    my_alias2 = my_alias

  That is, in the resulting ELF file my_alias, my_func and my_alias2 are just
  3 names pointing to offset 0 of .text. That is *not* the semantics of IR
  linking. For example, linking in a

    @my_alias = alias void ()* @other_func

  would require the strong my_alias to override the weak one and my_alias2
  would end up pointing to other_func.

  There is no way to represent that with aliases being just another name, so
  the best solution seems to be to just disallow it, converting a miscompile
  into an error.

  llvm-svn: 204781
* [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA (Hal Finkel, 2014-03-25; 3 files, -0/+283)
  The VSX instruction set has two types of FMA instructions: A-type (where
  the addend is taken from the output register) and M-type (where one of the
  product operands is taken from the output register). This adds a small pass
  that runs just after MI scheduling (and, thus, just before register
  allocation) that mutates A-type instructions (that are created during isel)
  into M-type instructions when:

    1. This will eliminate an otherwise-necessary copy of the addend
    2. One of the product operands is killed by the instruction

  The "right" moment to make this decision is in between scheduling and
  register allocation, because only there do we know whether or not one of
  the product operands is killed by any particular instruction.
  Unfortunately, this also makes the implementation somewhat complicated,
  because the MIs are not in SSA form and we need to preserve the
  LiveIntervals analysis.

  As a simple example, if we have:

    %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
    %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
    ...
    %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19, %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
    ...

  We can eliminate the copy by changing from the A-type to the M-type
  instruction. This means:

    %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

  is replaced by:

    %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9, %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

  and we remove:

    %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

  llvm-svn: 204768
* [PowerPC] Correct commutable indices for VSX FMA instructions (Hal Finkel, 2014-03-25; 2 files, -0/+18)
  Although the first two operands are the ones that can be swapped, the tied
  input operand is listed before them, so we need to adjust for that.

  I have a test case for this, but it goes along with an upcoming commit (so
  it will come soon).

  llvm-svn: 204748
* [PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions (Hal Finkel, 2014-03-25; 2 files, -27/+103)
  TableGen will create a lookup table for the A-type FMA instructions
  providing their corresponding M-form opcodes. This will be used by upcoming
  commits.

  llvm-svn: 204746
* [PowerPC] Generate little-endian object files (Ulrich Weigand, 2014-03-24; 5 files, -24/+55)
  As a first step towards real little-endian code generation, this patch
  changes the PowerPC MC layer to actually generate little-endian object
  files. This involves passing the little-endian flag through the various
  layers, including down to createELFObjectWriter, so we actually get basic
  little-endian ELF objects, emitting instructions in little-endian order,
  and handling fixups and relocations as appropriate for little-endian.

  The bulk of the patch is to update most test cases in test/MC/PowerPC to
  verify both big- and little-endian encodings. (The only test cases *not*
  updated are those that create actual big-endian ABI code, like the TLS
  tests.)

  Note that while the object files are now little-endian, the generated code
  itself is not yet updated; in particular, it still does not adhere to the
  ELFv2 ABI.

  llvm-svn: 204634
* [PPC64LE] ELFv2 ABI updates for the .opd section (Will Schmidt, 2014-03-24; 1 file, -0/+5)
  The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
  such, does not have a ".opd" section. This is keyed off a _CALL_ELF=2 macro
  check.

  The _CALL_ELF check is not clearly documented at this time. The basis for
  usage in this patch is from the gcc thread here:
  http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html

  > Adding comment from Uli: Looks good to me. I think the old-style JIT
  > doesn't really work anyway for 64-bit, but at least with this patch LLVM
  > will compile and link again on a ppc64le host ...

  llvm-svn: 204614
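  The guard in question follows the usual preprocessor pattern (illustrative
  snippet, not the committed hunk):

    // ELFv2, as used by PPC64LE, defines _CALL_ELF to 2; it has no .opd
    // section and no function descriptors.
    #if defined(_CALL_ELF) && _CALL_ELF == 2
      // ELFv2: function symbols point directly at code.
    #else
      // ELFv1: function symbols point at descriptors in the .opd section.
    #endif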
* [PowerPC] Mark many instructions as commutative (Hal Finkel, 2014-03-24; 4 files, -4/+63)
  I'm under the impression that we used to infer the isCommutable flag from
  the instruction-associated pattern. Regardless, we don't seem to do this
  (at least by default) any more. I've gone through all of our instruction
  definitions, and marked as commutative all of those that should be trivial
  to commute (by exchanging the first two operands). There has been special
  code for the RL* instructions, and that's not changed.

  Before this change, we had the following commutative instructions:

    RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo XSADDDP XSMULDP XVADDDP
    XVADDSP XVMULDP XVMULSP

  After:

    ADD4 ADD4o ADD8 ADD8o ADDC ADDC8 ADDC8o ADDCo ADDE ADDE8 ADDE8o ADDEo
    AND AND8 AND8o ANDo CRAND CREQV CRNAND CRNOR CROR CRXOR EQV EQV8 EQV8o
    EQVo FADD FADDS FADDSo FADDo FMADD FMADDS FMADDSo FMADDo FMSUB FMSUBS
    FMSUBSo FMSUBo FMUL FMULS FMULSo FMULo FNMADD FNMADDS FNMADDSo FNMADDo
    FNMSUB FNMSUBS FNMSUBSo FNMSUBo MULHD MULHDU MULHDUo MULHDo MULHW MULHWU
    MULHWUo MULHWo MULLD MULLDo MULLW MULLWo NAND NAND8 NAND8o NANDo NOR
    NOR8 NOR8o NORo OR OR8 OR8o ORo RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o
    RLWIMIo VADDCUW VADDFP VADDSBS VADDSHS VADDSWS VADDUBM VADDUBS VADDUHM
    VADDUHS VADDUWM VADDUWS VAND VAVGSB VAVGSH VAVGSW VAVGUB VAVGUH VAVGUW
    VMADDFP VMAXFP VMAXSB VMAXSH VMAXSW VMAXUB VMAXUH VMAXUW VMHADDSHS
    VMHRADDSHS VMINFP VMINSB VMINSH VMINSW VMINUB VMINUH VMINUW VMLADDUHM
    VMULESB VMULESH VMULEUB VMULEUH VMULOSB VMULOSH VMULOUB VMULOUH VNMSUBFP
    VOR VXOR XOR XOR8 XOR8o XORo XSADDDP XSMADDADP XSMAXDP XSMINDP XSMSUBADP
    XSMULDP XSNMADDADP XSNMSUBADP XVADDDP XVADDSP XVMADDADP XVMADDASP
    XVMAXDP XVMAXSP XVMINDP XVMINSP XVMSUBADP XVMSUBASP XVMULDP XVMULSP
    XVNMADDADP XVNMADDASP XVNMSUBADP XVNMSUBASP XXLAND XXLNOR XXLOR XXLXOR

  This is a by-inspection change, and I'm not sure how to write a reliable
  test case. I would like advice on this, however.

  llvm-svn: 204609
* [PowerPC] Don't schedule VSX copy legalization unless VSX is enabled (Hal Finkel, 2014-03-24; 1 file, -1/+2)
  There is no need to schedule this extra pass if it will have nothing to do.

  llvm-svn: 204594
* [PowerPC] Update comment re: VSX copy-instruction selection (Hal Finkel, 2014-03-24; 1 file, -2/+4)
  I've done some experimentation with this, and it looks like using the
  lower-latency (but lower throughput) copy instruction is essentially always
  the right thing to do.

  My assumption is that, in order to be relatively sure that the
  higher-latency copy will increase throughput, we'd want to have it unlikely
  to be in-flight with its use. On the P7, the global completion table (GCT)
  can hold a maximum of 120 instructions, shared among all active threads (up
  to 4), giving 30 instructions per thread. So specifically, I'd require at
  least that many instructions between the copy and the use before the
  high-latency variant is used.

  Trying this, however, over the entire test suite resulted in zero cases
  where the high-latency form would be preferable. This may be a consequence
  of the fact that the scheduler views copies as free, and so they tend to
  end up close to their uses.

  For this experiment I created a function:

    unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator I,
                           unsigned DestReg, unsigned SrcReg,
                           unsigned StartDist = 1,
                           unsigned Depth = 3) const;

  with an implementation like:

    if (!Depth)
      return PPC::XXLOR;

    const unsigned MaxDist = 30;
    unsigned Dist = StartDist;
    for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
      if (J->isTransient() && !J->isCopy())
        continue;

      if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
        return PPC::XXLOR;

      ++Dist;
    }

    // We've exceeded the required distance for the high-latency form, use it.
    if (Dist > MaxDist)
      return PPC::XVCPSGNDP;

    // If this is only an exit block, use the low-latency form.
    if (MBB.succ_empty())
      return PPC::XXLOR;

    // We've reached the end of the block, check the successor blocks (up to
    // some depth), and use the high-latency form if that is okay with all
    // successors.
    for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
      if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                        Dist, --Depth) == PPC::XXLOR)
        return PPC::XXLOR;
    }

    // All of our successor blocks seem okay with the high-latency variant,
    // so we'll use it.
    return PPC::XVCPSGNDP;

  and then changed the copy opcode selection from:

    Opc = PPC::XXLOR;

  to:

    Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);

  In conclusion, I'm removing the FIXME from the comment, because I believe
  that there is, at least absent other examples, nothing to fix.

  llvm-svn: 204591
* remove a bunch of unused private methods (Nuno Lopes, 2014-03-23; 1 file, -1/+0)
  Found with a smarter version of -Wunused-member-function that I'm playing
  with. Apologies in advance if I removed someone's WIP code.

    include/llvm/CodeGen/MachineSSAUpdater.h             |  1
    include/llvm/IR/DebugInfo.h                          |  3
    lib/CodeGen/MachineSSAUpdater.cpp                    | 10 --
    lib/CodeGen/PostRASchedulerList.cpp                  |  1
    lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp     | 10 --
    lib/IR/DebugInfo.cpp                                 | 12 --
    lib/MC/MCAsmStreamer.cpp                             |  2
    lib/Support/YAMLParser.cpp                           | 39 ---------
    lib/TableGen/TGParser.cpp                            | 16 ---
    lib/TableGen/TGParser.h                              |  1
    lib/Target/AArch64/AArch64TargetTransformInfo.cpp    |  9 --
    lib/Target/ARM/ARMCodeEmitter.cpp                    | 12 --
    lib/Target/ARM/ARMFastISel.cpp                       | 84 --------------------
    lib/Target/Mips/MipsCodeEmitter.cpp                  | 11 --
    lib/Target/Mips/MipsConstantIslandPass.cpp           | 12 --
    lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp               | 21 -----
    lib/Target/NVPTX/NVPTXISelDAGToDAG.h                 |  2
    lib/Target/PowerPC/PPCFastISel.cpp                   |  1
    lib/Transforms/Instrumentation/AddressSanitizer.cpp  |  2
    lib/Transforms/Instrumentation/BoundsChecking.cpp    |  2
    lib/Transforms/Instrumentation/MemorySanitizer.cpp   |  1
    lib/Transforms/Scalar/LoopIdiomRecognize.cpp         |  8 -
    lib/Transforms/Scalar/SCCP.cpp                       |  1
    utils/TableGen/CodeEmitterGen.cpp                    |  2
    24 files changed, 2 insertions(+), 261 deletions(-)

  llvm-svn: 204560
* [PowerPC] Make use of VSX f64 <-> i64 conversion instructions (Hal Finkel, 2014-03-23; 1 file, -6/+12)
  When VSX is available, these instructions should be used in preference to
  the older variants that only have access to the scalar floating-point
  registers.

  llvm-svn: 204559
* [PowerPC] Fix the VSX v2f64 return register (Hal Finkel, 2014-03-22; 1 file, -4/+4)
  v2f64 values, like other 128-bit values, are returned under VSX in register
  vs34 (Altivec register v2).

  llvm-svn: 204543