path: root/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
Commit history (most recent first; each entry shows subject, author, date, files changed, and lines removed/added):
...
* Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only (David Blaikie, 2015-05-18; 1 file, -6/+4)
  llvm-svn: 237624
* [PPC64] Add vector pack/unpack support from ISA 2.07 (Bill Schmidt, 2015-05-16; 1 file, -2/+47)
  This patch adds support for the following new instructions in the Power ISA 2.07: vpksdss, vpksdus, vpkudus, vpkudum, vupkhsw, and vupklsw. These instructions are available through the vec_packs, vec_packsu, vec_unpackh, and vec_unpackl built-in interfaces (illustrated below). These are lane-sensitive instructions, so the built-ins have different implementations for big- and little-endian, and the instructions must be marked as killing the vector swap optimization for now.
  The first three instructions perform saturating pack operations. The fourth performs a modulo pack operation, which means it can be represented with a vector shuffle; conversely, the appropriate vector shuffles may cause this instruction to be generated. The other instructions are only generated via built-in support for now.
  Appropriate tests have been added. There is a companion patch to clang for the rest of this support.
  llvm-svn: 237499
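  For illustration, a minimal user-level sketch of the built-in interfaces named above (hypothetical example code, assuming -mcpu=pwr8 -maltivec and the companion clang patch):

    #include <altivec.h>

    // Saturating pack: two v2i64 inputs -> one v4i32 result (vpksdss).
    vector signed int pack_sat(vector signed long long a,
                               vector signed long long b) {
      return vec_packs(a, b);
    }

    // Unpack high words to doublewords with sign extension (vupkhsw;
    // lane-sensitive, so element selection differs on little-endian).
    vector signed long long unpack_hi(vector signed int v) {
      return vec_unpackh(v);
    }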
* Fix compile error (Arnold Schwaighofer, 2015-05-09; 1 file, -1/+1)
  llvm-svn: 236921
* ScheduleDAGInstrs: In functions with tail calls, PseudoSourceValues are not non-aliasing distinct objects (Arnold Schwaighofer, 2015-05-08; 1 file, -0/+1)
  The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail-calling function, two FixedStackObjects might refer to the same location. Worse, "immutable" fixed stack objects like function arguments are not immutable and will be clobbered.
  Change this so that a load from a FixedStackObject is not invariant in a tail-calling function, and don't return a PseudoSourceValue for an instruction in tail-calling functions when building the dependence graph, so that we handle function arguments conservatively.
  Fix for PR23459.
  rdar://20740035
  llvm-svn: 236916
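  A hypothetical C++ sketch of the hazard being fixed (not code from the patch): under tail-call optimization the callee's outgoing argument area overlaps the caller's incoming one, so two FixedStackObjects can refer to the same slot and an "immutable" argument slot can be clobbered.

    struct Payload { long Vals[8]; };   // large enough to pass on the stack

    long sink(Payload P);

    long relay(Payload P) {
      P.Vals[0] += 1;   // writes the caller's incoming argument slot
      return sink(P);   // tail call: outgoing args reuse those same slots
    }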
* Change getTargetNodeName() to produce compiler warnings for missing cases, fix them (Matthias Braun, 2015-05-07; 1 file, -2/+10)
  llvm-svn: 236775
* Add VSX Scalar loads and stores to the PPC back end (Nemanja Ivanovic, 2015-05-07; 1 file, -3/+18)
  This patch corresponds to review: http://reviews.llvm.org/D9440
  It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers.
  llvm-svn: 236755
* This patch adds ABI support for v1i128 data type. (Kit Barton, 2015-05-05; 1 file, -7/+22)
  It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype.
  Phabricator review: http://reviews.llvm.org/D9475
  llvm-svn: 236503
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" (Sergey Dmitrouk, 2015-04-28; 1 file, -156/+162)
  This adds debug locations to constant nodes of the Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases the choice is obvious, so most of them should be. At least all tests pass.
  Tests for these changes do not cover everything; instead they just check it for SDNodes, ARM, and AArch64, where it's easy to get incorrect locations on constants. This is not a complete fix, as FastISel contains a workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there, as llvm::Constant doesn't cache it (yet). That is a somewhat different issue, though, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235989
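  A minimal sketch of the resulting API (per D9084): constant creation now takes the debug location, so callers thread their SDLoc through.

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    SDValue getZero(SelectionDAG &DAG, SDLoc DL) {
      // Previously DAG.getConstant(0, MVT::i32): no location attached.
      return DAG.getConstant(0, DL, MVT::i32);
    }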
* Revert "[DebugInfo] Add debug locations to constant SD nodes" (Daniel Jasper, 2015-04-28; 1 file, -162/+156)
  This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870
  llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodes (Sergey Dmitrouk, 2015-04-28; 1 file, -156/+162)
  This adds debug locations to constant nodes of the Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases the choice is obvious, so most of them should be. At least all tests pass.
  Tests for these changes do not cover everything; instead they just check it for SDNodes, ARM, and AArch64, where it's easy to get incorrect locations on constants. This is not a complete fix, as FastISel contains a workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there, as llvm::Constant doesn't cache it (yet). That is a somewhat different issue, though, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235977
* Allow memory intrinsics to be tail calls (Krzysztof Parzyszek, 2015-04-13; 1 file, -2/+2)
  llvm-svn: 234764
* Add direct moves to/from VSR and exploit them for FP/INT conversions (Nemanja Ivanovic, 2015-04-11; 1 file, -0/+78)
  This patch corresponds to review: http://reviews.llvm.org/D8928
  It adds direct move instructions to/from VSX registers to GPRs. These are exploited for FP <-> INT conversions.
  llvm-svn: 234682
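  A behavioral illustration (hypothetical code, not from the patch): a conversion like this can now keep the value in registers, moving the result out of a VSX register with mfvsrd rather than a store/reload through memory.

    long long trunc_to_ll(double D) {
      return static_cast<long long>(D);   // e.g. fctidz + mfvsrd on POWER8
    }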
* [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores (Hal Finkel, 2015-04-10; 1 file, -0/+1)
  When we have an instruction for this (and, thus, don't generate a runtime call), we need to custom type-legalize this (in a trivial way, just as we do for fp_to_sint).
  Fixes PR23173.
  llvm-svn: 234561
* [PowerPC] Enable splat generation for BUILD_VECTOR with little endian (Bill Schmidt, 2015-04-03; 1 file, -33/+2)
  When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR nodes for little endian because wrong results were produced. I've subsequently investigated and found this is due to a call to BuildVectorSDNode::isConstantSplat that was always specifying big-endian. With this changed to correctly identify the target endianness (sketched below), the optimizations work as expected.
  I found another case of a call to the same method with big-endian hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was an orphaned method with no callers, so I've just removed it.
  The existing test/CodeGen/PowerPC/vec_constants.ll checks these optimizations, so for testing I've just added a variant for little endian.
  llvm-svn: 234011
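  The crux of the fix, as a sketch: the final parameter of isConstantSplat selects the endianness used to interpret the elements, and had been hardcoded to big-endian.

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    static bool isSplatBV(BuildVectorSDNode *BVN, bool IsLittleEndian) {
      APInt SplatBits, SplatUndef;
      unsigned SplatBitSize;
      bool HasAnyUndefs;
      return BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize,
                                  HasAnyUndefs, 0,
                                  /*isBigEndian=*/!IsLittleEndian);
    }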
* [PowerPC] Don't use a vector preferred memory type at -O0 (Hal Finkel, 2015-03-31; 1 file, -15/+17)
  Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic is a memset/memcpy/etc., we might normally use vector types. At -O0, this is probably not a good idea (because, if there is a bug in the lowering code, there would be no good way to turn it off). At -O0, only use scalar preferred types.
  Related to PR22754.
  llvm-svn: 233755
* Add Hardware Transactional Memory (HTM) Support (Kit Barton, 2015-03-25; 1 file, -0/+6)
  This patch adds Hardware Transactional Memory (HTM) support as specified by ISA 2.07 (POWER8). The intrinsic support is based on the GCC one [1], but currently only the 'PowerPC HTM Low Level Built-in Functions' are implemented.
  The HTM instructions follow the record-form (RC) convention, and the transaction initiation result is set in CR0 (with the exception of tcheck). The current approach is to copy CR0 into a GPR and compare there. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates correct code both for test-and-branch and for uses that just return the value. A possible future optimization could eliminate the MFCR instruction and branch directly.
  HTM usage requires a recent kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le.
  This is sent along with a clang patch to enable the builtins and option switch.
  [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
  Phabricator Review: http://reviews.llvm.org/D8247
  llvm-svn: 233204
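  A user-level sketch of the low-level built-ins this enables (hypothetical code; assumes -mcpu=pwr8 -mhtm with the companion clang patch, and the return-value convention documented at [1]):

    #include <htmintrin.h>

    int bump_in_transaction(int *Counter) {
      if (__builtin_tbegin(0)) {   // nonzero: transaction started
        ++*Counter;
        __builtin_tend(0);         // commit
        return 1;
      }
      return 0;                    // failed to start (or aborted)
    }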
* [APInt] Add an isSplat helper and use it in some places. (Benjamin Kramer, 2015-03-25; 1 file, -11/+4)
  To complement getSplat. This is more general than the binary decomposition method, as it also handles non-pow2 splat sizes.
  llvm-svn: 233195
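  The new helper in use, as a sketch: a value is a splat when it is a repetition of an N-bit pattern, and N no longer needs to be a power of two.

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    bool isByteSplat(const APInt &V) {
      // True for e.g. 0x7f7f7f7f; the width must divide evenly by 8.
      return V.getBitWidth() % 8 == 0 && V.isSplat(8);
    }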
* Have getCallPreservedMask and getThisCallPreservedMask take a MachineFunction argument so that we can grab subtarget-specific features off of it. (Eric Christopher, 2015-03-11; 1 file, -1/+2)
  llvm-svn: 231979
* Add support for part-word atomics for PPC (Nemanja Ivanovic, 2015-03-10; 1 file, -27/+83)
  http://reviews.llvm.org/D8090#inline-67337
  llvm-svn: 231843
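  A behavioral illustration (hypothetical code): 8- and 16-bit atomic operations can now lower to reservation sequences on the narrow type rather than being emulated with word-sized operations.

    #include <atomic>

    short fetch_add16(std::atomic<short> &A) {
      return A.fetch_add(1);   // candidate for a part-word larx/stcx. loop
    }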
* Change the generation of the vmuluwm instruction to be based on the MUL opcode. (Kit Barton, 2015-03-10; 1 file, -1/+6)
  Phabricator review: http://reviews.llvm.org/D8185
  llvm-svn: 231827
* While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr), not calls to the intrinsics. (Kit Barton, 2015-03-05; 1 file, -4/+8)
  This patch changes the implementation of the vsld, vsrd, and vsrad instructions from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definitions of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics.
  llvm-svn: 231378
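  For illustration, the builtin-to-opcode path described above (hypothetical user code, assuming -mcpu=pwr8): vec_sl on doubleword elements emits a plain IR shl, which instruction selection now turns into vsld with no intrinsic involved.

    #include <altivec.h>

    vector signed long long shl64(vector signed long long A,
                                  vector unsigned long long B) {
      return vec_sl(A, B);   // IR: shl <2 x i64>; selected as vsld
    }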
* Add the following 64-bit vector integer arithmetic instructions added in POWER8: (Kit Barton, 2015-03-03; 1 file, -8/+70)
  add/subtract: vaddudm, vsubudm
  multiply: vmulesw, vmulosw, vmuleuw, vmulouw, vmuluwm
  min/max: vmaxsd, vmaxud, vminsd, vminud
  compare: vcmpequd, vcmpequd., vcmpgtsd, vcmpgtsd., vcmpgtud, vcmpgtud.
  rotate/shift: vrld, vsld, vsrd, vsrad
  Phabricator review: http://reviews.llvm.org/D7959
  llvm-svn: 231115
* Make some non-constant static variables non-static or fully const. (Benjamin Kramer, 2015-03-01; 1 file, -34/+9)
  Otherwise we have to emit thread-safe initialization for them. NFC.
  llvm-svn: 230894
* [PowerPC] Use vector types for memcpy and friends (sometimes) (Hal Finkel, 2015-02-27; 1 file, -2/+25)
  When using Altivec, we can use vector loads and stores for aligned memcpy and friends. Starting with the P7 and VSX, we have reasonable unaligned vector stores. Starting with the P8, we have fast unaligned loads too.
  For QPX, we use vector loads and stores, but only for aligned memory accesses.
  llvm-svn: 230788
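  A behavioral illustration (hypothetical code): a small fixed-size copy like this becomes a candidate for vector load/store expansion under the rules above.

    #include <cstring>

    void copy32(void *Dst, const void *Src) {
      std::memcpy(Dst, Src, 32);   // may expand to 16-byte vector ld/st pairs
    }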
* getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup; pass that in rather than use a naked call to getSubtargetImpl. (Eric Christopher, 2015-02-26; 1 file, -7/+6)
  This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG.
  llvm-svn: 230699
* Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. (Eric Christopher, 2015-02-26; 1 file, -1/+1)
  This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass, which uses it for register class iteration. This required passing a subtarget into a few target-specific initializations of TargetLowering.
  llvm-svn: 230583
* [PowerPC] Make LDtocL and friends invariant loads (Hal Finkel, 2015-02-25; 1 file, -16/+20)
  LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node, represent loads from the TOC, which is invariant. As a result, these loads can be hoisted out of loops, etc.
  In order to do this, we need to generate GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory intrinsic node type. Once this is done, the MMO transfer is automatically handled for TableGen-driven instruction selection, and for nodes generated directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.
  Also, we were not transferring MMOs associated with pre-increment loads, so do that too.
  Lastly, this fixes an exposed bug where R30 was not added as a defined operand of UpdateGBR. This problem was highlighted by an example (used to generate the test case) posted to llvmdev by Francois Pichet.
  llvm-svn: 230553
* [PowerPC] Cleanup unused target-specific SDAG nodes (Hal Finkel, 2015-02-25; 1 file, -3/+0)
  We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64 TOC access that were referenced only in TableGen patterns. The associated (pseudo-)instructions are used, but are being generated directly. NFC.
  llvm-svn: 230518
* Silencing a "result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)" warning in MSVC; NFC. (Aaron Ballman, 2015-02-25; 1 file, -1/+1)
  llvm-svn: 230489
* Silencing a -Wsign-compare warning triggered in MSVC; NFC. (Aaron Ballman, 2015-02-25; 1 file, -1/+1)
  llvm-svn: 230488
* [PowerPC] Add support for the QPX vector instruction set (Hal Finkel, 2015-02-25; 1 file, -48/+1097)
  This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bits wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1>, are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirror those supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations).
  I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form).
  The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work.
  llvm-svn: 230413
* CodeGen: convert CCState interface to using ArrayRefs (Tim Northover, 2015-02-21; 1 file, -6/+4)
  Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that.
  There should be no functional change, but out-of-tree targets may have to tweak their calls as with these examples.
  llvm-svn: 230118
* AArch64: Safely handle the incoming sret call argument. (Andrew Trick, 2015-02-16; 1 file, -6/+12)
  This adds a safe interface to the machine-independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular, this crashed in the AArch64 backend, where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument.
  Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
  llvm-svn: 229413
* PowerPC: Canonicalize access to function attributes, NFC (Duncan P. N. Exon Smith, 2015-02-14; 1 file, -4/+2)
  Canonicalize access to function attributes to use the simpler API:
    getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind)
    getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind)
  llvm-svn: 229224
* Stash the TargetMachine on the subtarget so we can access it later. (Eric Christopher, 2015-02-13; 1 file, -1/+1)
  Clean up a subtarget function that has it passed in while we're at it.
  llvm-svn: 229164
* PPC LinkageSize can be computed at initialization time, do so. (Eric Christopher, 2015-02-13; 1 file, -12/+6)
  llvm-svn: 229163
* PPCFrameLowering's FramePointerOffset can be computed at initialization time. Do so. (Eric Christopher, 2015-02-13; 1 file, -10/+5)
  llvm-svn: 228998
* The TOC save offset can be computed at compile time, do so and propagate changes. (Eric Christopher, 2015-02-13; 1 file, -3/+2)
  llvm-svn: 228997
* The return save offset can be computed at initialization time - do so and save the value. (Eric Christopher, 2015-02-13; 1 file, -8/+7)
  llvm-svn: 228996
* [PowerPC] Mark jumps as expensive (using CR bits) (Hal Finkel, 2015-02-12; 1 file, -1/+3)
  On PowerPC, which has a full set of logical operations on (its multiple sets of) condition-register bits, it is not profitable to break up complex conditions feeding a jump into multiple jumps. We can turn off this feature of CGP/SDAGBuilder by marking jumps as "expensive".
  P7 test-suite speedups (no regressions):
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: -0.626647% +/- 0.323583%
    MultiSource/Benchmarks/Olden/power/power: -18.2821% +/- 8.06481%
  llvm-svn: 228895
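  A behavioral sketch of what this buys (hypothetical code): with jumps marked expensive, a compound condition like the one below can be evaluated with CR-bit logic (e.g. crand) and a single conditional branch, instead of being split into two branches.

    int in_range(int X, int Lo, int Hi) {
      if (X >= Lo && X <= Hi)   // one branch on the combined condition
        return 1;
      return 0;
    }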
* [PowerPC] Fix reverted patch r227976 to avoid register assignment issues (Bill Schmidt, 2015-02-10; 1 file, -59/+13)
  See full discussion in http://reviews.llvm.org/D7491.
  We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass prior to RA now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions, and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and fixes these up after the replacement sequences are introduced.
  Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed.
  llvm-svn: 228725
* Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups (Hal Finkel, 2015-02-06; 1 file, -12/+57)
  Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native). Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully), so I'm reverting this now.
  llvm-svn: 228460
* [PowerPC] Generate pre-increment floating-point ld/st instructions (Hal Finkel, 2015-02-05; 1 file, -0/+4)
  PowerPC supports pre-increment floating-point load/store instructions, both r+r and r+i, and we had patterns for them, but they were not marked as legal. Mark them as legal (and add a test case).
  llvm-svn: 228327
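  The gist of the change, sketched as a constructor-body excerpt from PPCISelLowering.cpp (the exact set of types marked is an assumption): the TableGen patterns already existed; they just were not marked legal.

    // Inside the PPCTargetLowering constructor:
    setIndexedLoadAction(ISD::PRE_INC, MVT::f32, Legal);
    setIndexedLoadAction(ISD::PRE_INC, MVT::f64, Legal);
    setIndexedStoreAction(ISD::PRE_INC, MVT::f32, Legal);
    setIndexedStoreAction(ISD::PRE_INC, MVT::f64, Legal);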
* [PowerPC] Implement the vclz instructions for PWR8 (Bill Schmidt, 2015-02-05; 1 file, -4/+7)
  Patch by Kit Barton.
  Add the vector count leading zeros instructions for byte, halfword, word, and doubleword sizes. This is a fairly straightforward addition after the changes made for vpopcnt:
  1. Add the correct definitions for the various instructions in PPCInstrAltivec.td.
  2. Make the CTLZ operation legal on vector types when using P8Altivec in PPCISelLowering.cpp (see the sketch after this entry).
  Test plan: Created a new test case in test/CodeGen/PowerPC/vec_clz.ll to check that the instructions are being generated when the CTLZ operation is used in LLVM. Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s and test/Disassembler/PowerPC/ppc_encoding_vmx.txt, respectively.
  llvm-svn: 228301
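  Step 2 above, sketched as a constructor-body excerpt (the guard's exact spelling is an assumption):

    // Inside the PPCTargetLowering constructor:
    if (Subtarget.hasP8Altivec()) {
      setOperationAction(ISD::CTLZ, MVT::v16i8, Legal);
      setOperationAction(ISD::CTLZ, MVT::v8i16, Legal);
      setOperationAction(ISD::CTLZ, MVT::v4i32, Legal);
      setOperationAction(ISD::CTLZ, MVT::v2i64, Legal);
    }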
* [PowerPC] Implement the vpopcnt instructions for POWER8 (Bill Schmidt, 2015-02-03; 1 file, -1/+9)
  Patch by Kit Barton.
  Add the vector population count instructions for byte, halfword, word, and doubleword sizes. There are two major changes here:
  - PPCISelLowering.cpp: Make CTPOP legal for vector types.
  - PPCRegisterInfo.td: Added v2i64 to the VRRC register definition. This is needed for the doubleword variations of the integer ops that were added in P8.
  Test plan: Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s. Test the generation of the vpopcnt instructions for various vector data types. When adding the v2i64 type to the Vector Register set, I also needed to add the appropriate bit conversion patterns between v2i64 and the existing vector types. Testing for these conversions was also added in the test case by passing a different vector type as a parameter into the test functions. There is also a run step that will ensure the vpopcnt instructions are generated when the vsx feature is disabled.
  llvm-svn: 228046
* [PowerPC] Yet another approach to __tls_get_addr (Bill Schmidt, 2015-02-03; 1 file, -57/+12)
  This patch is a third attempt to properly handle the local-dynamic and global-dynamic TLS models.
  In my original implementation, calls to __tls_get_addr were hidden from view until the asm-printer phase, at which point the underlying branch-and-link instruction was created with proper relocations. This mostly worked well, but I used some repellent techniques to ensure that the TLS_GET_ADDR nodes at the SD and MI levels correctly received input from GPR3 and produced output into GPR3. This proved to work badly in the presence of multiple TLS variable accesses, with the copies to and from GPR3 being scheduled incorrectly and generally creating havoc.
  In r221703, I addressed that problem by representing the calls to __tls_get_addr as true calls during instruction lowering. This had the advantage of removing all of the bad hacks and relying on the existing call machinery to properly glue the copies in place. It looked like this was going to be the right way to go.
  However, as a side effect of the recent discovery of problems with linker optimizations for TLS, we discovered cases of suboptimal code generation with this strategy. The problem comes when tls_get_addr is called for the same address, and there is a resulting CSE opportunity. It turns out that in such cases MachineCSE will common the addis/addi instructions that set up the input value to tls_get_addr, but will not common the calls themselves. MachineCSE does not have any machinery to common idempotent calls. This is perfectly sensible, since presumably this would be done at the IR level, and introducing calls in the back end isn't commonplace. In any case, we end up with two calls to __tls_get_addr when one would suffice, and that isn't good.
  I presumed that the original design would have allowed commoning of the machine-specific nodes that hid the __tls_get_addr calls, so as suggested by Ulrich Weigand, I went back to that design and cleaned it up so that the copies were properly held together by glue nodes. However, it turned out that this didn't work either... the presence of copies to physical registers kept the machine-specific nodes from being commoned also.
  All of which leads to the design presented here. This is a return to the original design, except that no attempt is made to introduce copies to and from GPR3 during instruction lowering. Virtual registers are used until prior to register allocation. At that point, a special pass is run that identifies the machine-specific nodes that hide the tls_get_addr calls and introduces the copies to and from GPR3 around them. The register allocator then coalesces these copies away. With this design, MachineCSE succeeds in commoning tls_get_addr calls where possible, and we get nice optimal code generation (better than GCC at the moment, which does not common these calls).
  One additional problem must be dealt with: after introducing the mentions of the physical register GPR3, the aggressive anti-dependence breaker sees opportunities to improve scheduling by selecting a different register instead. Flags must be used on the instruction descriptions to tell the anti-dependence breaker to keep its hands in its pockets.
  One thing missing from the original design was recording a definition of the link register on the GET_TLS_ADDR nodes. Doing this was found to be insufficient to force a stack frame to be created, which led to looping behavior because two different LR values were stored at the same address. This appears to have been an oversight in PPCFrameLowering::determineFrameLayout(), which is repaired here.
  Because MustSaveLR() returns true for calls to builtin_return_address, this changed the expected behavior of test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but formerly did not. I've fixed the test case to reflect this.
  There are existing TLS tests to catch regressions; the checks in test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the face of instruction scheduling with these changes, so I fixed that up. I've added a new test case based on the PrettyStackTrace module that demonstrated the original problem. This checks that we get correct code generation and that CSE of the calls to __tls_get_addr has taken place.
  llvm-svn: 227976
* [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions (Hal Finkel, 2015-02-01; 1 file, -2/+25)
  The TOC base pointer is passed in r2, and we normally reserve this register so that we can depend on it being there. However, for leaf functions, and specifically those leaf functions that don't do any TOC access of their own (which is generally due to accessing the constant pool, using TLS, etc.), we can treat r2 as an ordinary callee-saved register (it must be callee-saved because, for local direct calls, the linker will not insert any save/restore code).
  The allocation order has been changed slightly for PPC64/ELF systems to put r2 at the end of the list (while leaving it near the beginning for Darwin systems to prevent unnecessary output changes). While r2 is allocatable, using it still requires spill/restore traffic, and thus comes at the end of the list.
  llvm-svn: 227745
* Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl without a Function argument. (Eric Christopher, 2015-01-30; 1 file, -131/+120)
  llvm-svn: 227622
* Move DataLayout back to the TargetMachine from TargetSubtargetInfo derived classes. (Eric Christopher, 2015-01-26; 1 file, -7/+6)
  Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be laid out and mangled consistently if the subtarget changes on a per-function basis. Prior to this, all targets (*) have had subtarget-dependent code moved out and onto the TargetMachine.
  (*) One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers, and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine or fix the DataLayout to avoid subtarget-dependent features.
  llvm-svn: 227113
* [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2 (Hal Finkel, 2015-01-19; 1 file, -3/+15)
  Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call instructions in order to represent the dependency on the TOC base pointer value. Restricting this to ELF V2, however, does not seem to make sense: calls under ELF V1 have the same dependence, and indirect calls have an r2 dependence just as direct ones do. Make sure the dependence is noted for all calls under both ELF V1 and ELF V2.
  llvm-svn: 226432