bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Pass a MCSymbol to needsRelocateWithSymbol.	Rafael Espindola	2015-05-29	1	-3/+3
\| \| \| \|	llvm-svn: 238589
*	Add support for VSX FMA single-precision instructions to the PPC back end	Nemanja Ivanovic	2015-05-29	2	-9/+94
\| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D9941 It adds the various FMA instructions introduced in the version 2.07 of the ISA along with the testing for them. These are operations on single precision scalar values in VSX registers. llvm-svn: 238578
*	Remove a trivial forwarding function. NFC.	Rafael Espindola	2015-05-28	2	-3/+3
\| \| \| \|	llvm-svn: 238506
*	Use operator<< instead of print in a few more places.	Rafael Espindola	2015-05-27	1	-2/+2
\| \| \| \|	llvm-svn: 238315
*	Use std::bitset for SubtargetFeatures.	Michael Kuperstein	2015-05-26	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures. Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables. This should now be fixed. llvm-svn: 238192
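The motivation for the commit above can be sketched in a few lines: a `uint64_t` bitfield holds at most 64 features, while a `std::bitset` can be sized past that limit. This is only an illustration under assumptions, not the actual LLVM change; `kMaxFeatures`, `FeatureBits`, `fitsInUint64`, and `makeFeatures` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical capacity, chosen only so that it exceeds 64 features.
constexpr std::size_t kMaxFeatures = 128;
using FeatureBits = std::bitset<kMaxFeatures>;

// With one bit per feature in a uint64_t, only indices 0..63 are usable.
inline bool fitsInUint64(std::size_t featureIndex) {
  return featureIndex < 64;
}

// Build a feature set from a list of feature indices.
// std::bitset::set throws std::out_of_range for indices >= kMaxFeatures.
inline FeatureBits makeFeatures(std::initializer_list<std::size_t> indices) {
  FeatureBits bits;
  for (std::size_t i : indices)
    bits.set(i);
  return bits;
}
```

For example, `makeFeatures({3, 70})` yields a set in which bit 70 is queryable with `test(70)`, even though index 70 would not fit in a 64-bit field.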
*	Stop using MCSectionData in MCMachObjectWriter.h.	Rafael Espindola	2015-05-26	1	-5/+3
\| \| \| \|	llvm-svn: 238165
*	Stop using MCSectionData in MCExpr.h.	Rafael Espindola	2015-05-26	1	-8/+5
\| \| \| \|	llvm-svn: 238163
*	Return a MCSection from MCFragment::getParent().	Rafael Espindola	2015-05-26	1	-7/+11
\| \| \| \| \| \|	Another step in merging MCSectionData and MCSection. llvm-svn: 238162
*	This patch adds support for the vector quadword add/sub instructions introduced	Kit Barton	2015-05-25	2	-9/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in POWER8: vadduqm vaddeuqm vaddcuq vaddecuq vsubuqm vsubeuqm vsubcuq vsubecuq In addition to adding the instructions themselves, it also adds support for the v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and IntrinsicEmitter.cpp). http://reviews.llvm.org/D9081 llvm-svn: 238144
*	Stop forwarding getOrdinal and setOrdinal.	Rafael Espindola	2015-05-25	1	-2/+3
\| \| \| \|	llvm-svn: 238139
*	[PowerPC] Fix fast-isel when compare is split from branch	Hal Finkel	2015-05-23	1	-19/+32
\| \| \| \| \| \| \| \| \| \| \|	When the compare feeding a branch was in a different BB from the branch, we'd try to "regenerate" the compare in the block with the branch, possibly trying to make use of values not available there. Copy a page from AArch64's play book here to fix the problem (at least in terms of correctness). Fixes PR23640. llvm-svn: 238097
*	[PPC64] Add support for clrbhrb, mfbhrbe, rfebb.	Bill Schmidt	2015-05-22	7	-0/+82
\| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the ISA 2.07 additions involving the branch history rolling buffer and event-based branching. These will not be used by typical applications, so built-in support is not required. They will only be available via inline assembly. Assembly/disassembly tests are included in the patch. llvm-svn: 238032
*	[PPC] Correct iterator bug in PPCTLSDynamicCall	Hal Finkel	2015-05-21	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately, I can't reduce a small test case for this (although compiling mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the problem is fairly clear (at least once you know you're looking for one). If the TLS instruction being replaced was at the end of the block, we'd increment the iterator past it (so it would then point to MBB.end()), and then we'd increment it again as part of the for statement, thus overrunning the end of the list. Don't do that. llvm-svn: 237974
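The iterator problem described above (advancing past a replaced element inside the loop body and then again in the `for` statement) can be sketched on a plain `std::list`. This is a simplified model, not the PPCTLSDynamicCall code; `replaceAll` is a hypothetical helper. The loop header has no increment, and the successor is captured before the element is erased, so the iterator is advanced exactly once per element.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Replace every element equal to `target` with `replacement`, in the style
// of a pass that erases one instruction and inserts another in its place.
// Returns the number of replacements made.
inline int replaceAll(std::list<int> &block, int target, int replacement) {
  int changed = 0;
  // No increment in the for header: each path below advances `it` once.
  for (auto it = block.begin(), end = block.end(); it != end;) {
    if (*it != target) {
      ++it;
      continue;
    }
    auto next = std::next(it);      // may equal end() if `it` is the last element
    block.insert(it, replacement);  // insert the replacement before the old element
    block.erase(it);                // invalidates only `it`, not `next` or `end`
    it = next;
    ++changed;
  }
  return changed;
}
```

Calling `replaceAll` on a list whose last element matches `target` exercises the case from the commit message: `next` is `end()`, and the loop stops instead of stepping past it.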
*	[PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available	Bill Schmidt	2015-05-21	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My recent patch to add support for ISA 2.07 vector pack/unpack instructions didn't properly check for availability of the vpkudum instruction when recognizing it as a special vector shuffle case. This causes us to leave the vector shuffle in place (rather than converting it to a vector permute) so that it can be recognized later as a vpkudum, but that pattern is invalid for processors prior to POWER8. Thus LLVM crashes with an "unable to select" message. We observed this since one of our buildbots is configured to generate code for a POWER7. This patch fixes the problem by checking for availability of the vpkudum instruction during custom lowering of vector shuffles. I've added a test case variant for the vpkudum pattern when the instruction isn't available. llvm-svn: 237952
*	[PPC/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the ↵	Hal Finkel	2015-05-21	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	PPC/A2 On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count. llvm-svn: 237947
*	Add support for VSX scalar single-precision arithmetic in the PPC target	Nemanja Ivanovic	2015-05-21	1	-46/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	http://reviews.llvm.org/D9891 Following up on the VSX single precision loads and stores added earlier, this adds support for elementary arithmetic operations on single precision values in VSX registers. These instructions utilize the new VSSRC register class. Instructions added: xsaddsp xsdivsp xsmulsp xsresp xsrsqrtesp xssqrtsp xssubsp llvm-svn: 237937
*	Move alignment from MCSectionData to MCSection.	Rafael Espindola	2015-05-21	4	-21/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This starts merging MCSection and MCSectionData. There are a few issues with the current split between MCSection and MCSectionData. * It optimizes the less important case. We want the production of .o files to be really fast, but the split puts the information used for .o emission in a separate data structure. * The ELF/COFF/MachO hierarchy is not represented in MCSectionData, leading to some ad-hoc ways to represent the various flags. * It makes it harder to remember where each item is. The attached patch starts merging the two by moving the alignment from MCSectionData to MCSection. Most of the patch is actually just dropping 'const', since MCSectionData is mutable, but MCSection was not. llvm-svn: 237936
*	MC: Use MCSymbol in MachObjectWriter, NFC	Duncan P. N. Exon Smith	2015-05-20	1	-10/+9
\| \| \| \| \| \| \|	Replace uses of `MCSymbolData` with `MCSymbol` where both are needed, so we can remove the backpointer. llvm-svn: 237799
*	MC: Take MCSymbol in MachObjectWriter::getSymbolAddress(), NFC	Duncan P. N. Exon Smith	2015-05-20	1	-2/+2
\| \| \| \| \| \| \|	Pass through an `MCSymbol` instead of an `MCSymbolData` so we can get rid of the back pointer. llvm-svn: 237750
*	MC: Use MCSymbol in MCAsmLayout::getSymbolOffset(), NFC	Duncan P. N. Exon Smith	2015-05-19	1	-1/+1
\| \| \| \| \| \| \|	Continue to canonicalize on MCSymbol instead of MCSymbolData when both are needed. llvm-svn: 237749
*	Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced ↵	David Blaikie	2015-05-18	3	-10/+10
\| \| \| \| \| \|	init only llvm-svn: 237624
*	MC: Clean up method names in MCContext.	Jim Grosbach	2015-05-18	5	-23/+23
\| \| \| \| \| \| \|	The naming was a mish-mash of old and new style. Update to be consistent with the new. NFC. llvm-svn: 237594
*	[PowerPC] Add extra r2 read deps on @toc@l relocations	Hal Finkel	2015-05-18	4	-0/+165
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If some commits are happy, and some commits are sad, this is a sad commit. It is sad because it restricts instruction scheduling to work around a binutils linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was updated not to produce code triggering this bug, and now we'll do the same... When resolving an address using the ELF ABI TOC pointer, two relocations are generally required: one for the high part and one for the low part. Only the high part generally explicitly depends on r2 (the TOC pointer). And, so, we might produce code like this: .Ltmp526: addis 3, 2, .LC12@toc@ha .Ltmp1628: std 2, 40(1) ld 5, 0(27) ld 2, 8(27) ld 11, 16(27) ld 3, .LC12@toc@l(3) rldicl 4, 4, 0, 32 mtctr 5 bctrl ld 2, 40(1) And there is nothing wrong with this code, as such, but there is a linker bug in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will misoptimize this code sequence to this: nop std r2,40(r1) ld r5,0(r27) ld r2,8(r27) ld r11,16(r27) ld r3,-32472(r2) clrldi r4,r4,32 mtctr r5 bctrl ld r2,40(r1) because the linker does not know (and does not check) that the value in r2 changed in between the instruction using the .LC12@toc@ha (TOC-relative) relocation and the instruction using the .LC12@toc@l(3) relocation. Because it finds these instructions using the relocations (and not by scanning the instructions), it has been asserted that there is no good way to detect the change of r2 in between. As a result, this bug may never be fixed (i.e. it may become part of the definition of the ABI). GCC was updated to add extra dependencies on r2 to instructions using the @toc@l relocations to avoid this problem, and we'll do the same here. This is done as a separate pass because: 1. 
These extra r2 dependencies are not really properties of the instructions, but rather due to a linker bug, and maybe one day we'll be able to get rid of them when targeting linkers without this bug (and, thus, keeping the logic centralized here will make that straightforward). 2. There are ISel-level peephole optimizations that propagate the @toc@l relocations to some user instructions, and so the extra dependencies do not apply only to a fixed set of instructions (without undesirable definition replication). The test case was reduced with the help of bugpoint, with minimal cleaning. I'm looking forward to our upcoming MI serialization support, and with that, much better tests can be created. llvm-svn: 237556
*	MC: Use MCSymbol in RelAndSymbol, NFC	Duncan P. N. Exon Smith	2015-05-16	1	-2/+2
\| \| \| \| \| \|	Switch from `MCSymbolData` to `MCSymbol`. llvm-svn: 237502
*	[PPC64] Add vector pack/unpack support from ISA 2.07	Bill Schmidt	2015-05-16	4	-2/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the following new instructions in the Power ISA 2.07: vpksdss vpksdus vpkudus vpkudum vupkhsw vupklsw These instructions are available through the vec_packs, vec_packsu, vec_unpackh, and vec_unpackl built-in interfaces. These are lane-sensitive instructions, so the built-ins have different implementations for big- and little-endian, and the instructions must be marked as killing the vector swap optimization for now. The first three instructions perform saturating pack operations. The fourth performs a modulo pack operation, which means it can be represented with a vector shuffle, and conversely the appropriate vector shuffles may cause this instruction to be generated. The other instructions are only generated via built-in support for now. Appropriate tests have been added. There is a companion patch to clang for the rest of this support. llvm-svn: 237499
*	Remove 3 includes from MCInstrDesc.h and explicitly include them where needed	Pete Cooper	2015-05-15	3	-0/+4
\| \| \| \|	llvm-svn: 237481
*	MC: MCCodeGenInfo naming update. NFC.	Jim Grosbach	2015-05-15	1	-1/+1
\| \| \| \| \| \|	s/InitMCCodeGenInfo/initMCCodeGenInfo/ llvm-svn: 237471
*	MC: Update MCCodeEmitter naming. NFC.	Jim Grosbach	2015-05-15	1	-1/+1
\| \| \| \| \| \|	s/EncodeInstruction/encodeInstruction/ llvm-svn: 237469
*	MC: Update MCFixup naming. NFC.	Jim Grosbach	2015-05-15	1	-9/+9
\| \| \| \| \| \|	s/MCFixup::Create/MCFixup::create/ llvm-svn: 237468
*	MC: Modernize MCOperand API naming. NFC.	Jim Grosbach	2015-05-13	4	-105/+105
\| \| \| \| \| \|	MCOperand::Create() methods renamed to MCOperand::create(). llvm-svn: 237275
*	Reverting r237234, "Use std::bitset for SubtargetFeatures"	Michael Kuperstein	2015-05-13	3	-3/+3
\| \| \| \| \| \| \|	The buildbots are still not satisfied. MIPS and ARM are failing (even though at least MIPS was expected to pass). llvm-svn: 237245
*	Use std::bitset for SubtargetFeatures	Michael Kuperstein	2015-05-13	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. The first two times this was committed (r229831, r233055), it caused several buildbot failures. At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed. llvm-svn: 237234
*	Strip trailing whitespace. NFC	Douglas Katzman	2015-05-12	1	-1/+1
\| \| \| \|	llvm-svn: 237165
*	Fix compile error	Arnold Schwaighofer	2015-05-09	1	-1/+1
\| \| \| \|	llvm-svn: 236921
*	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not ↵	Arnold Schwaighofer	2015-05-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916
*	Change getTargetNodeName() to produce compiler warnings for missing cases, ↵	Matthias Braun	2015-05-07	2	-3/+11
\| \| \| \| \| \|	fix them llvm-svn: 236775
*	Add VSX Scalar loads and stores to the PPC back end	Nemanja Ivanovic	2015-05-07	8	-8/+150
\| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755
*	[X86] Disable loop unrolling in loop vectorization pass when VF is 1.	Wei Mi	2015-05-06	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	The patch disables unrolling in the loop vectorization pass when VF==1 on the x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in the loop vectorization pass may introduce the cost of an overflow check, a memory boundary check, and extra prologue/epilogue code when the regular unroller unrolls the loop another time. Disabling it when VF==1 removes the unnecessary cost on x86. The same can be done for other platforms after verifying that interleaving/memory bound checking is not performance-critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613
*	[PPC64LE] Adjust vector splats during VSX swap optimization	Bill Schmidt	2015-05-06	1	-7/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606
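The index adjustment stated above, replacing I by (I + N/2) % N, can be written out as a small function. This is a minimal sketch of the arithmetic only, not the code from the patch; `adjustSplatIndex` is a hypothetical name, and the sketch assumes an even, non-zero element count.

```cpp
#include <cassert>

// After the doublewords of a quadword vector are swapped, the element that
// was at `index` sits in the opposite half. For a vector of `numElts`
// elements the new position is (index + numElts/2) % numElts.
inline unsigned adjustSplatIndex(unsigned index, unsigned numElts) {
  assert(numElts != 0 && numElts % 2 == 0 && "expected an even element count");
  assert(index < numElts && "splat index out of range");
  return (index + numElts / 2) % numElts;
}
```

For a 4-element vector, indices 0, 1, 2, 3 map to 2, 3, 0, 1; applying the function twice returns the original index, which matches the idea of a swap being its own inverse.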
*	[ShrinkWrap] Add (a simplified version) of shrink-wrapping.	Quentin Colombet	2015-05-05	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! 
mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that performs the shrink-wrapping analysis (see the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore points into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need a hack to properly avoid frequently executed points. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overridden on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX methods to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save points and restore points; the impacted components would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore points instead of one pointer. - PEI: Should loop over the save points and restore points. 
Anyhow, at least for this first iteration, I do not believe it is worthwhile to support the complex cases. We should revisit that when we have motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507
*	This patch adds ABI support for v1i128 data type.	Kit Barton	2015-05-05	5	-13/+46
\| \| \| \| \| \| \| \| \| \| \| \|	It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype. Phabricator review: http://reviews.llvm.org/D9475 llvm-svn: 236503
*	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"	Sergey Dmitrouk	2015-04-28	6	-270/+304
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989
*	Revert "[DebugInfo] Add debug locations to constant SD nodes"	Daniel Jasper	2015-04-28	6	-304/+270
\| \| \| \| \| \| \|	This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
*	[DebugInfo] Add debug locations to constant SD nodes	Sergey Dmitrouk	2015-04-28	6	-270/+304
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
*	Silence unused variable errors for no-asserts builds	Bill Schmidt	2015-04-27	1	-0/+4
\| \| \| \|	llvm-svn: 235913
*	[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations	Bill Schmidt	2015-04-27	6	-0/+824
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and removes the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910
*	[AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.	Lang Hames	2015-04-24	1	-150/+150
\| \| \| \| \| \| \|	AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752
*	[PowerPC] Support register name prefixes for vector registers	Hal Finkel	2015-04-23	1	-0/+8
\| \| \| \| \| \| \|	Match binutils by supporting the optional register name prefix for new vector registers ("vs" for VSX registers and "q" for QPX registers). llvm-svn: 235665
*	[PowerPC] Use sync inst alias when printing	Hal Finkel	2015-04-23	1	-1/+1
\| \| \| \| \| \| \|	So long as the choice between printing msync and sync is not ambiguous, we can print 'sync 0' as just 'sync'. llvm-svn: 235663
*	[PowerPC] Add asm/disasm support for dcbt with hint	Hal Finkel	2015-04-23	4	-8/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint field specified (non-zero). Unfortunately, the syntax for this instruction is special in that it differs for server vs. embedded cores: dcbt ra, rb, th [server] dcbt th, ra, rb [embedded] where th can be omitted when it is 0. dcbtst is the same. Thus we need to play games in the parser and the printer to flip the operands around on the embedded cores. We'll use the server syntax as the default (binutils currently uses the embedded form by default, but IBM is changing that). We also stop marking dcbtst as having unmodeled side effects (this is not necessary, it is just a hint like dcbt; noticed by inspection, so no separate test case). llvm-svn: 235657