path: root/llvm/test/CodeGen/PowerPC
* Reduce verbiage of lit.local.cfg files (Alp Toker, 2014-06-09; 1 file, -2/+1)
  We can just split targets_to_build in one place and make it immutable.
  llvm-svn: 210496
* [PPC64LE] Generate correct code for unaligned little-endian vector loads (Bill Schmidt, 2014-06-09; 1 file, -1/+9)
  The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned
  Altivec vector loads generates an lvsl followed by a vperm. As we've seen in
  numerous other places, the vperm instruction has a big-endian bias, and this
  is fixed for little endian by complementing the permute control vector and
  swapping the input operands. In this case the lvsl is providing the permute
  control vector. Rather than generating an lvsl and a complement operation, it
  is sufficient to generate an lvsr instruction instead. Thus for LE code
  generation we will generate an lvsr rather than an lvsl, and swap the other
  input arguments on the vperm.

  The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code
  generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5
  testing.

  llvm-svn: 210493
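  As a rough illustration (a minimal sketch in the IR syntax of the time, not
  the actual contents of vec_misalign.ll), the combine fires on an
  under-aligned vector load such as:

    define <4 x i32> @load_unaligned(<4 x i32>* %p) {
    entry:
      ; align 1 forces the unaligned Altivec path: two lvx loads merged
      ; with a vperm whose control vector comes from lvsl (lvsr on LE)
      %v = load <4 x i32>* %p, align 1
      ret <4 x i32> %v
    }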
* [PPC64LE] Generate correct little-endian code for v16i8 multiply (Bill Schmidt, 2014-06-09; 1 file, -0/+17)
  The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8
  values assumes that vector elements are numbered in big-endian order. For
  little-endian targets, the vector element numbering is reversed, but the
  vmuleub, vmuloub, and vperm instructions still assume big-endian numbering.
  To account for this, we must adjust the permute control vector and reverse
  the order of the input registers on the vperm instruction.

  The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on
  powerpc64 and powerpc64le targets as well as the original powerpc (32-bit)
  target.

  llvm-svn: 210474
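  For reference, a sketch of the kind of IR this lowering handles
  (hypothetical, not the vec_mul.ll source):

    define <16 x i8> @mul_v16i8(<16 x i8> %a, <16 x i8> %b) {
    entry:
      ; lowered via vmuleub/vmuloub plus a vperm that interleaves the
      ; even and odd products
      %prod = mul <16 x i8> %a, %b
      ret <16 x i8> %prod
    }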
* [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian (Bill Schmidt, 2014-06-06; 1 file, -0/+66)
  This patch fixes a couple of lowering issues for little endian PowerPC. The
  code for lowering BUILD_VECTOR contains a number of optimizations that are
  only valid for big endian. For now, we disable those optimizations for
  correctness. In the future, we will add analogous optimizations that are
  correct for little endian.

  When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make
  the now-familiar transformation of swapping the input operands and
  complementing the permute control vector. Correctness of this transformation
  is tested by the accompanying test case.

  llvm-svn: 210336
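  A hedged sketch (hypothetical, not the accompanying test) of a two-input
  shuffle that goes through the VPERM path:

    define <16 x i8> @shuffle_bytes(<16 x i8> %a, <16 x i8> %b) {
    entry:
      ; an irregular byte shuffle that matches no single vmrg*/vpku*
      ; pattern, so it is lowered to a vperm
      %s = shufflevector <16 x i8> %a, <16 x i8> %b,
             <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21,
                         i32 6, i32 23, i32 8, i32 25, i32 10, i32 27,
                         i32 12, i32 29, i32 14, i32 31>
      ret <16 x i8> %s
    }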
* [PPC64LE] Add test case for r210282 commit (Bill Schmidt, 2014-06-05; 1 file, -0/+17)
  Chandler correctly pointed out that I need an LLVM IR test for r210282, which
  modified the vperm -> shuffle transform for little endian PowerPC. This patch
  provides that test.
  llvm-svn: 210297
* [PPC] Use alias symbols in address computation. (Rafael Espindola, 2014-05-29; 1 file, -0/+31)
  This seems to match what gcc does for ppc and what every other llvm backend
  does. This is a fixed version of r209638. The difference is to avoid any
  change in behavior for functions. The logic for using constant pools for
  function addresses is spread over a few places and we have to keep them in
  sync.
  llvm-svn: 209821
* Add a test showing the ppc code sequence for getting a function pointer. (Rafael Espindola, 2014-05-29; 1 file, -0/+21)
  This would have found the miscompile in r209638.
  llvm-svn: 209820
* Revert "[PPC] Use alias symbols in address computation." (Hal Finkel, 2014-05-28; 1 file, -31/+0)
  This reverts commit r209638 because it broke self-hosting on ppc64/Linux: the
  Clang-compiled TableGen would segfault because it jumped to an invalid
  address from within
  _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E, which is
  within the command-line parameter registration process.
  llvm-svn: 209745
* [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC (Bill Schmidt, 2014-05-27; 2 files, -1/+18)
  In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an
  optimization for certain patterns to generate one or two vector splats
  followed by a vector add or subtract. This operation is represented by a
  VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for the
  VADD_SPLAT to be assigned the wrong data type, causing incorrect code
  generation. This patch corrects the problem.

  Specifically, the code previously assigned the value type of the BUILD_VECTOR
  node to the newly generated VADD_SPLAT node. This is correct much of the
  time, but not always. The problem is that the call to isConstantSplat() may
  return a SplatBitSize that is not the same as the number of bits in the
  original vector element type. The correct type to assign is a vector type
  with the same element bit size as SplatBitSize.

  The included test case shows an example of this, where the BUILD_VECTOR node
  has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16,
  0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a
  splat of 16 for type v8i16, which is the type we must assign to the
  VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm,
  which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16,
  16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8
  and a vadduhm.

  This patch also corrected code generation for
  CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an
  XFAIL, so we can remove the XFAIL from the test case.

  llvm-svn: 209662
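  In IR terms, the problematic pattern can be sketched like this (a
  hypothetical reduction, not the committed test):

    define <16 x i8> @vadd_splat() {
    entry:
      ; as bytes this is {0,16} repeated; viewed as v8i16 it is a splat
      ; of 16, so the correct lowering is vspltish 8 followed by vadduhm
      ret <16 x i8> <i8 0, i8 16, i8 0, i8 16, i8 0, i8 16, i8 0, i8 16,
                     i8 0, i8 16, i8 0, i8 16, i8 0, i8 16, i8 0, i8 16>
    }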
* [PPC] Use alias symbols in address computation. (Rafael Espindola, 2014-05-26; 1 file, -0/+31)
  This seems to match what gcc does for ppc and what every other llvm backend
  does.
  llvm-svn: 209638
* [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate (Adam Nemet, 2014-05-20; 1 file, -0/+22)
  The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
  backend. We matched an immediate offset with STWX8 even though it only
  supports register offset. The culprit is the complex-pattern predicate,
  SelectAddrIdx, which decides that if the offset is not ISD::Constant it must
  be a register.

  Many thanks to Bill Schmidt for testing this.

  llvm-svn: 209219
* DebugInfo: Shore up subprogram variable list handling with more assertions and fewer conditionals (David Blaikie, 2014-05-14; 1 file, -2/+1)
  Many old tests using prior schemas still had some brokenness here (both
  indirect arrays and arrays with single bogus elements). Fixed those up so
  they don't hit the new assertions. Also reduced nesting in some places, etc.
  llvm-svn: 208817
* [PowerPC] Add global named register support (Hal Finkel, 2014-05-11; 7 files, -0/+124)
  Support for the intrinsics that read from and write to global named registers
  is added for r1, r2 and r13 (depending on the subtarget).
  llvm-svn: 208509
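  A hedged sketch of reading one such register via the named-register
  intrinsics (using the metadata syntax of the time; the exact test contents
  may differ):

    define i64 @get_r13() {
    entry:
      ; reads the named register "r13" (the thread pointer on ppc64)
      %reg = call i64 @llvm.read_register.i64(metadata !0)
      ret i64 %reg
    }

    declare i64 @llvm.read_register.i64(metadata)

    !0 = metadata !{metadata !"r13"}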
* [PowerPC] On PPC32, 128-bit shifts might be runtime calls (Hal Finkel, 2014-05-11; 1 file, -0/+72)
  The counter-loops formation pass needs to know what operations might be
  function calls (because they can't appear in counter-based loops). On PPC32,
  128-bit shifts might be runtime calls (even though you can't use __int128 on
  PPC32, it seems that SROA might form them).

  Fixes PR19709.

  llvm-svn: 208501
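  The kind of operation involved, as a sketch (on PPC32 a variable i128 shift
  like this is expected to become a runtime call such as __ashlti3):

    define i128 @shl_i128(i128 %a, i128 %amt) {
    entry:
      ; no inline expansion on PPC32; legalization emits a libcall
      %r = shl i128 %a, %amt
      ret i128 %r
    }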
* [PowerPC] Fix rlwimi isel when mask is not constant (Hal Finkel, 2014-04-13; 1 file, -0/+48)
  We had been using the known-zero values of the operand of the or to construct
  the mask for an rlwimi; this is not quite correct, but fine when the mask is
  constant. When the mask is constant, then the known zeros of the operand must
  be a superset of the zeros in the mask. However, when the mask is not a
  constant, then there might be bits in the operand that are not known to be
  zero that, at runtime, might be zero in the mask. Therefore, we check that
  any bits not known to be zero *are* known to be one in the mask. Otherwise,
  we can't fold the mask with the or and shift.

  This was revealed as a miscompile of
  MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
  constant hoisting.

  llvm-svn: 206136
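  A hypothetical pattern in the spirit of this fix, where the mask is a
  runtime value rather than a constant:

    define i32 @insert_bits(i32 %a, i32 %b, i32 %mask) {
    entry:
      ; (a << 8) & mask merged with b & ~mask: an rlwimi-shaped or whose
      ; mask operand is not a compile-time constant, so the fold must
      ; prove the mask bits rather than assume them
      %sh = shl i32 %a, 8
      %lhs = and i32 %sh, %mask
      %notmask = xor i32 %mask, -1
      %rhs = and i32 %b, %notmask
      %r = or i32 %lhs, %rhs
      ret i32 %r
    }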
* [PowerPC] Implement some additional TLI callbacks (Hal Finkel, 2014-04-12; 5 files, -4/+12)
  Add implementations of:

    bool isLegalICmpImmediate(int64_t Imm) const
    bool isLegalAddImmediate(int64_t Imm) const
    bool isTruncateFree(Type *Ty1, Type *Ty2) const
    bool isTruncateFree(EVT VT1, EVT VT2) const
    bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

  Unfortunately, this regresses counter-register-based loop formation because
  some of the loops now end up in forms where SE cannot compute loop counts.
  However, nevertheless, the test-suite results favor committing:

    SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
    MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup
    MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

  llvm-svn: 206120
* Reenable use of TBAA during CodeGen (Hal Finkel, 2014-04-12; 1 file, -1/+1)
  We had disabled use of TBAA during CodeGen (even when otherwise using AA)
  because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA
  to miss basic type punning that it should catch (and, thus, we'd fail to
  override TBAA when we should).

  However, when AA is in use during CodeGen, CGP now uses normal GEPs and
  bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
  result, BasicAA should be able to make us do the right thing in the face of
  type-punning, and it seems safe to enable use of TBAA again. Self-hosting
  seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.

  Note: We still don't update TBAA when merging stack slots, although because
  BasicAA should now catch all such cases, this is no longer a blocking issue.
  Nevertheless, I plan to commit code to deal with this properly in the near
  future.

  llvm-svn: 206093
* Add the ability to use GEPs for address sinking in CGP (Hal Finkel, 2014-04-12; 1 file, -0/+1)
  The current memory-instruction optimization logic in CGP, which sinks parts
  of the address computation that can be absorbed by the addressing mode, does
  this by explicitly converting the relevant part of the address computation
  into IR-level integer operations (making use of ptrtoint and inttoptr). For
  most targets this is currently not a problem, but for targets wishing to
  make use of IR-level aliasing analysis during CodeGen, the use of
  ptrtoint/inttoptr is a problem for two reasons:

    1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
    2. In cases where type-punning was used, and BasicAA was used to override
       TBAA, BasicAA may no longer do so. (This had forced us to disable all
       use of TBAA in CodeGen; something which we can now enable again.)

  A sketch of the difference appears below this entry. This (use of GEPs
  instead of ptrtoint/inttoptr) is not currently enabled by default (except
  for those targets that use AA during CodeGen), and so aside from some
  PowerPC subtargets and SystemZ, there should be no change in behavior. We
  may be able to switch completely away from the ptrtoint/inttoptr sinking on
  all targets, but further testing is required.

  I've doubled-up on a number of existing tests that are sensitive to the
  address sinking behavior (including some store-merging tests that are
  sensitive to the order of the resulting ADD operations at the SDAG level).

  llvm-svn: 206092
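  To illustrate the difference (a sketch in the IR syntax of the time; the
  function and names are hypothetical):

    ; Previously CGP sank a computed address in integer form:
    ;   %pi  = ptrtoint i32* %base to i64
    ;   %sum = add i64 %pi, 16
    ;   %pp  = inttoptr i64 %sum to i32*
    ; With this change (when AA is used during CodeGen) it keeps a GEP,
    ; which BasicAA can still reason about:
    define i32 @sunk_addr(i32* %base) {
    entry:
      %p = getelementptr i32* %base, i64 4
      %v = load i32* %p, align 4
      ret i32 %v
    }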
* [PowerPC] Add a full condition code register to make the "cc" clobber work (Hal Finkel, 2014-04-04; 1 file, -0/+70)
  gcc inline asm supports specifying "cc" as a clobber of all condition
  registers. Add just enough modeling of the full register to make this work.
  Fixed PR19326.
  llvm-svn: 205630
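  A minimal sketch of IR carrying such a clobber (hypothetical; the committed
  test may differ):

    define i32 @asm_cc(i32 %a, i32 %b) {
    entry:
      ; "add." sets CR0; ~{cc} must be modeled as clobbering all CR fields
      %r = call i32 asm sideeffect "add. $0, $1, $2", "=r,r,r,~{cc}"(i32 %a, i32 %b)
      ret i32 %r
    }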
* [PowerPC] Add some missing VSX bitcast patterns (Hal Finkel, 2014-04-01; 1 file, -0/+8)
  llvm-svn: 205352
* [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles (Hal Finkel, 2014-03-31; 1 file, -5/+6)
  If we have two unique values for a v2i64 build vector, this will always
  result in two vector loads if we expand using shuffles. Only one is
  necessary.
  llvm-svn: 205231
* Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT (Hal Finkel, 2014-03-31; 1 file, -0/+16)
  When the loop vectorizer vectorizes code that uses the loop induction
  variable, we often end up with IR like this:

    %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
    %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
    %i = add <2 x i32> %b2, <i32 2, i32 3>

  If the add in this example is not legal (as is the case on PPC with VSX), it
  will be scalarized, and we'll end up with a number of extract_vector_elt
  nodes with the vector shuffle as the input operand, and that vector shuffle
  is fed by one or more build_vector nodes. By the time that vector operations
  are expanded, visitEXTRACT_VECTOR_ELT will not create a new
  extract_vector_elt by looking through the vector shuffle (to make sure that
  no illegal operations are created), and so the extract_vector_elt -> vector
  shuffle -> build_vector chain is never simplified to an operand of the build
  vector.

  By looking at build_vectors through a shuffle we fix this particular
  situation, preventing a vector from being built, only to be deconstructed
  again (for the scalarized add) -- an expensive proposition when this all
  needs to be done via the stack. We probably want a more comprehensive fix
  here where we look back recursively through any shuffles to any
  build_vectors or scalar_to_vectors, etc. but that can come later.

  llvm-svn: 205179
* Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack (Hal Finkel, 2014-03-30; 1 file, -0/+6)
  When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
  SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the
  entire vector and then load the piece we want. This is fine in isolation,
  but generating a new store (and corresponding stack slot) for each
  extraction ends up producing code of poor quality. When we scalarize a
  vector operation (using SelectionDAG::UnrollVectorOp for example) we
  generate one EXTRACT_VECTOR_ELT for each element in the vector. This used to
  generate one stored copy of the vector for each element in the vector. Now
  we search the uses of the vector for a suitable store before generating a
  new one, which results in much more efficient scalarization code.
  llvm-svn: 205153
* [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG (Hal Finkel, 2014-03-30; 1 file, -0/+38)
  sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
  (and similarly for v2i16 and v2i8). Even though there is no sign extension
  (or algebraic shift) for v2i64 types, we can handle v2i32 sign extensions by
  converting to and from v2i64. The small trick necessary here is to shift the
  i32 elements into the right lanes before the i32 -> f64 step. Because of the
  big-endian nature of the system, we need the i32 portion in the high word of
  the i64 elements.

  For v2i16 and v2i8 we can do the same, but we first use the default Altivec
  shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
  then apply the above procedure.

  llvm-svn: 205146
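  The entry point of the pattern, as a sketch:

    define <2 x double> @sitofp_v2i32(<2 x i32> %a) {
    entry:
      ; legalization turns this into a SIGN_EXTEND_INREG v2i64 followed
      ; by the i64 -> f64 conversion
      %r = sitofp <2 x i32> %a to <2 x double>
      ret <2 x double> %r
    }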
* [PowerPC] Handle v2i64 comparisons (Hal Finkel, 2014-03-29; 1 file, -0/+33)
  v2i64 is a legal type under VSX, however we don't have native vector
  comparisons. We can handle eq/ne by casting it to an Altivec type, but
  everything else must be expanded.
  llvm-svn: 205106
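  For example (a hypothetical sketch of the eq case):

    define <2 x i64> @cmp_eq(<2 x i64> %a, <2 x i64> %b) {
    entry:
      ; eq/ne can be handled with Altivec word compares on a v4i32 view
      ; of the operands; other predicates must be expanded
      %c = icmp eq <2 x i64> %a, %b
      %r = sext <2 x i1> %c to <2 x i64>
      ret <2 x i64> %r
    }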
* [PowerPC] Add subregister classes for f64 VSX values (Hal Finkel, 2014-03-29; 2 files, -3/+166)
  We had stored both f64 values and v2f64, etc. values in the VSX registers.
  This worked, but was suboptimal because we would always spill 16-byte values
  even though we almost always had scalar 8-byte values. This resulted in an
  increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
  added 64-bit subregisters of the Altivec registers, and combined those with
  the existing scalar floating-point registers to form a class of VSX scalar
  floating-point registers. The ABI code has also been enhanced to use this
  register class and some other necessary improvements have been made.
  llvm-svn: 205075
* [PowerPC] Fix VSX permutation isel (Hal Finkel, 2014-03-28; 1 file, -1/+1)
  Not only did I invert the indices when I wrote the code, but I also did the
  same thing when I wrote the regression test. Oops.
  llvm-svn: 205046
* [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers (Hal Finkel, 2014-03-28; 1 file, -0/+26)
  v2[fi]64 values need to be explicitly passed in VSX registers. This is
  because the code in TRI that finds the minimal register class given a
  register and a value type will assert if given an Altivec register and a
  non-Altivec type.
  llvm-svn: 205041
* [PowerPC] Use a small cleanup pass to remove VSX self copies (Hal Finkel, 2014-03-27; 1 file, -0/+27)
  As explained in r204976, because of how the allocation of VSX registers
  interacts with the call-lowering code, we sometimes end up generating self
  VSX copies. Specifically, things like this:

    %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>

  (where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

  This adds a small cleanup pass to remove these prior to post-RA scheduling.

  llvm-svn: 204980
* [PowerPC] Fix v2f64 vector extract and related patterns (Hal Finkel, 2014-03-27; 1 file, -0/+18)
  First, v2f64 vector extract had not been declared legal (and so the existing
  patterns were not being used). Second, the patterns for that, and for
  scalar_to_vector, should really be a regclass copy, not a subregister
  operation, because the VSX registers directly hold both the vector and
  scalar data.
  llvm-svn: 204971
* [PowerPC] Expand v2i64 shifts (Hal Finkel, 2014-03-27; 1 file, -0/+42)
  These operations need to be expanded during legalization so that isel does
  not crash. In theory, we might be able to custom lower some of these. That,
  however, would need to be follow-up work.
  llvm-svn: 204963
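  A sketch of one such shift (hypothetical):

    define <2 x i64> @shl_v2i64(<2 x i64> %a, <2 x i64> %b) {
    entry:
      ; no native v2i64 shift instruction, so legalization expands this
      ; element by element
      %r = shl <2 x i64> %a, %b
      ret <2 x i64> %r
    }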
* [PowerPC] Generate VSX permutations for v2[fi]64 vectors (Hal Finkel, 2014-03-26; 1 file, -0/+65)
  llvm-svn: 204873
* [PowerPC] VSX loads and stores support unaligned access (Hal Finkel, 2014-03-26; 1 file, -0/+18)
  I've not yet updated PPCTTI because I'm not sure what the actual relative
  cost is compared to the aligned uses.
  llvm-svn: 204848
* [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions (Hal Finkel, 2014-03-26; 1 file, -0/+72)
  llvm-svn: 204843
* [PowerPC] Use VSX vector loads/stores for v2[fi]64 (Hal Finkel, 2014-03-26; 1 file, -0/+36)
  These instructions have access to the complete VSX register file. In
  addition, they "swap" the order of the elements so that element 0 (the
  scalar part) comes first in memory and element 1 follows at a higher
  address.
  llvm-svn: 204838
* [PowerPC] Add v2i64 as a legal VSX type (Hal Finkel, 2014-03-26; 1 file, -1/+22)
  v2i64 needs to be a legal VSX type because it is the SetCC result type from
  v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.
  This fixes the lowering for v2f64 VSELECT.
  llvm-svn: 204828
* [PowerPC] Lower VSELECT using xxsel when VSX is available (Hal Finkel, 2014-03-26; 1 file, -0/+77)
  With VSX there is a real vector select instruction, and so we should use it.
  Note that VSELECT will still scalarize for v2f64 because the corresponding
  SetCC result type (v2i64) is not currently a legal type.
  llvm-svn: 204801
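  IR that exercises this path, as a sketch (hypothetical):

    define <4 x i32> @vsel(<4 x i32> %a, <4 x i32> %b,
                           <4 x i32> %x, <4 x i32> %y) {
    entry:
      ; the resulting VSELECT maps directly onto xxsel under VSX
      %c = icmp slt <4 x i32> %a, %b
      %r = select <4 x i1> %c, <4 x i32> %x, <4 x i32> %y
      ret <4 x i32> %r
    }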
* [PowerPC] Generate logical vector VSX instructions (Hal Finkel, 2014-03-26; 1 file, -0/+156)
  These instructions are essentially the same as their Altivec counterparts,
  but have access to the larger VSX register file.
  llvm-svn: 204782
* [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA (Hal Finkel, 2014-03-25; 1 file, -0/+124)
  The VSX instruction set has two types of FMA instructions: A-type (where the
  addend is taken from the output register) and M-type (where one of the
  product operands is taken from the output register).

  This adds a small pass that runs just after MI scheduling (and, thus, just
  before register allocation) that mutates A-type instructions (that are
  created during isel) into M-type instructions when:

    1. This will eliminate an otherwise-necessary copy of the addend
    2. One of the product operands is killed by the instruction

  The "right" moment to make this decision is in between scheduling and
  register allocation, because only there do we know whether or not one of the
  product operands is killed by any particular instruction. Unfortunately,
  this also makes the implementation somewhat complicated, because the MIs are
  not in SSA form and we need to preserve the LiveIntervals analysis.

  As a simple example, if we have:

    %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
    %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
    ...
    %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19, %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
    ...

  We can eliminate the copy by changing from the A-type to the M-type
  instruction. This means:

    %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

  is replaced by:

    %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9, %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

  and we remove:

    %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

  llvm-svn: 204768
* [PowerPC] Make use of VSX f64 <-> i64 conversion instructions (Hal Finkel, 2014-03-23; 3 files, -0/+99)
  When VSX is available, these instructions should be used in preference to
  the older variants that only have access to the scalar floating-point
  registers.
  llvm-svn: 204559
* [PowerPC] Fix the VSX v2f64 return register (Hal Finkel, 2014-03-22; 1 file, -3/+1)
  v2f64 values, like other 128-bit values, are returned under VSX in register
  vs34 (Altivec register v2).
  llvm-svn: 204543
* Remove redundant test. (Rafael Espindola, 2014-03-21; 1 file, -9/+0)
  This is tested from MC already.
  llvm-svn: 204491
* Fix PR19144: Incorrect offset generated for int-to-fp conversion at -O0. (Bill Schmidt, 2014-03-18; 1 file, -0/+153)
  When converting a signed 32-bit integer to double-precision floating point
  on hardware without an lfiwax instruction, we have to instead use an lfd
  followed by fcfid. We were erroneously offsetting the address by 4 bytes in
  preparation for either an lfiwax or lfiwzx when generating the lfd. This
  fixes that silly error.

  This was not caught in the test suite since the conversion tests were run
  with -mcpu=pwr7, which implies availability of lfiwax. I've added another
  test case for older hardware that checks the code we expect in the absence
  of lfiwax and other flavors of fcfid. There are fewer tests in this test
  case because we punt to DAG selection in more cases on older hardware. (We
  must generate complex fiddly sequences in those cases, and there is marginal
  benefit in duplicating that logic in fast-isel.)

  llvm-svn: 204155
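  The conversion in question, as a sketch:

    define double @i32_to_fp(i32 %a) {
    entry:
      ; without lfiwax, fast-isel stores %a to the stack, reloads it with
      ; an lfd at the correct (unoffset) address, and converts with fcfid
      %r = sitofp i32 %a to double
      ret double %r
    }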
* [ppc64] Avoid copy relocs in named rodata sections (Ulrich Weigand, 2014-03-14; 1 file, -0/+11)
  Commit r181723 introduced code to avoid placing initialized variables
  needing relocations into the .rodata section, which avoids copy relocs that
  do not work as expected on ppc64 function references. The same treatment is
  also needed for *named* .rodata.XXX sections. This patch changes
  PPC64LinuxTargetObjectFile::SelectSectionForGlobal to modify "Kind" *before*
  calling the default SelectSectionForGlobal routine, instead of first calling
  the default routine and then just checking for the (main) .rodata section
  afterwards.
  llvm-svn: 203921
* Remove the linker_private and linker_private_weak linkages. (Rafael Espindola, 2014-03-13; 1 file, -8/+0)
  These linkages were introduced some time ago, but it was never very clear
  what exactly their semantics were or what they should be used for. Some
  investigation found these uses:

    * utf-16 strings in clang.
    * non-unnamed_addr strings produced by the sanitizers.

  It turns out they were just working around a more fundamental problem. For
  some sections a MachO linker needs a symbol in order to split the section
  into atoms, and llvm had no idea that was the case. I fixed that in r201700
  and it is now safe to use the private linkage. When the object ends up in a
  section that requires symbols, llvm will use a 'l' prefix instead of a 'L'
  prefix and things just work.

  With that, these linkages were already dead, but there was a potential
  future user in the objc metadata information. I am still looking at
  CGObjCMac.cpp, but at this point I am convinced that linker_private and
  linker_private_weak are not what they need.

  The objc uses are currently split in:

    * Regular symbols (no '\01' prefix). LLVM already directly provides
      whatever semantics they need.
    * Uses of a private name (start with "\01L" or "\01l") and private
      linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
      agrees with clang on L being ok or not for a given section. I have two
      patches in code review for this.
    * Uses of private name and weak linkage.

  The last case is the one that one could think would fit one of these
  linkages. That is not the case. The semantics are:

    * the linker will merge these symbols by *name*.
    * the linker will hide them in the final DSO.

  Given that the merging is done by name, any of the private (or internal)
  linkages would be a bad match. They allow llvm to rename the symbols, and
  that is really not what we want. From the llvm point of view, these objects
  should really be (linkonce|weak)(_odr)?.

  For now, just keeping the "\01l" prefix is probably the best for these
  symbols. If we one day want to have a more direct support in llvm, IMHO what
  we should add is not a linkage, it is just a hidden_symbol attribute. It
  would be applicable to multiple linkages. For example, on weak it would
  produce the current behavior we have for objc metadata. On internal, it
  would be equivalent to private (and we should then remove private).

  llvm-svn: 203866
* [PowerPC] Initial support for the VSX instruction set (Hal Finkel, 2014-03-13; 1 file, -0/+46)
  VSX is an ISA extension supported on the POWER7 and later cores that
  enhances floating-point vector and scalar capabilities. Among other things,
  this adds <2 x double> support and generally helps to reduce register
  pressure.

  The interesting part of this ISA feature is the register configuration:
  there are 64 new 128-bit vector registers, the first 32 of which are
  super-registers of the existing 32 scalar floating-point registers, and the
  second 32 of which overlap with the 32 Altivec vector registers. This makes
  things like vector insertion and extraction tricky: this can be free but
  only if we force a restriction to the right register subclass when needed.
  A new "minipass" PPCVSXCopy takes care of this (although it could do a
  more-optimal job of it; see the comment about unnecessary copies below).

  Please note that, currently, VSX is not enabled by default when targeting
  anything because it is not yet ready for that. The assembler and
  disassembler are fully implemented and tested. However:

    - CodeGen support causes miscompiles; test-suite runtime failures:
        MultiSource/Benchmarks/FreeBench/distray/distray
        MultiSource/Benchmarks/McCat/08-main/main
        MultiSource/Benchmarks/Olden/voronoi/voronoi
        MultiSource/Benchmarks/mafft/pairlocalalign
        MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
        SingleSource/Benchmarks/CoyoteBench/almabench
        SingleSource/Benchmarks/Misc/matmul_f64_4x4
    - The lowering currently falls back to using Altivec instructions far
      more than it should. Worse, there are some things that are scalarized
      through the stack that shouldn't be.
    - A lot of unnecessary copies make it past the optimizers, and this needs
      to be fixed.
    - Many more regression tests are needed.

  Normally, I'd fix these things prior to committing, but there are some
  students and other contributors who would like to work on this, and so it
  makes sense to move this development process upstream where it can be
  subject to the regular code-review procedures.

  llvm-svn: 203768
* IR: add a second ordering operand to cmpxchg for failure (Tim Northover, 2014-03-11; 4 files, -34/+34)
  The syntax for "cmpxchg" should now look something like:

    cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

  where the second ordering argument gives the required semantics in the case
  that no exchange takes place. It should be no stronger than the first
  ordering constraint and cannot be either "release" or "acq_rel" (since no
  store will have taken place).

  rdar://problem/15996804

  llvm-svn: 203559
* Fixup PPC Darwin i1 argument handling (Hal Finkel, 2014-03-06; 1 file, -0/+5)
  Like on other targets, we need to zero_extend/truncate i1 args before
  copying them to GPRs.
  llvm-svn: 203045
* When using CR bit registers on PPC32, handle the i1 vaarg case (Hal Finkel, 2014-03-06; 1 file, -0/+15)
  When copying an i1 value into a GPR for a vaarg call, we need to explicitly
  zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
  generated).
  llvm-svn: 203041
* With PPC CR bit registers, handle int_to_fp on older cores (Hal Finkel, 2014-03-05; 1 file, -0/+21)
  On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
  because there is nothing to promote them to. The most straightforward
  implementation of this uses a select to choose between the two possible
  resulting floating-point values (and that's what is done here).
  llvm-svn: 203015
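  A sketch of the i1 case and its select-based lowering:

    define double @i1_to_fp(i1 %b) {
    entry:
      ; without fpcvt there is no wider type to promote i1 to, so this
      ; is lowered as: select i1 %b, double 1.0, double 0.0
      %r = uitofp i1 %b to double
      ret double %r
    }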