bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[PowerPC] Implement the areMemAccessesTriviallyDisjoint hook	QingShan Zhang	2019-07-02	1	-10/+10
\| \| \| \| \| \| \| \| \|	After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise, and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition Differential Revision: https://reviews.llvm.org/D63804 llvm-svn: 364886
*	RegAllocFast: Improve hinting heuristic	Matt Arsenault	2019-05-16	1	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887
*	Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out ↵	Matt Arsenault	2019-05-03	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \|	of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919
*	Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out ↵	Nico Weber	2019-05-03	1	-0/+12
\| \| \| \| \| \| \| \|	of a block" Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail. llvm-svn: 359912
*	RegAllocFast: Add heuristic to detect values not live-out of a block	Matt Arsenault	2019-05-03	1	-12/+0
\| \| \| \| \| \| \| \| \|	Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun llvm-svn: 359906
*	RegAllocFast: Remove early selection loop, the spill calculation will report ↵	Matt Arsenault	2019-03-19	1	-633/+1971
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499
*	[PowerPC] Complete the custom legalization of vector int to fp conversion	Nemanja Ivanovic	2018-12-29	1	-22/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155
*	[Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics	Zaara Syeda	2018-11-05	1	-48/+0
\| \| \| \| \| \| \| \| \| \| \| \|	On Power9, we don't have patterns to select the following intrinsics: llvm.ppc.vsx.stxvw4x.be llvm.ppc.vsx.stxvd2x.be This patch adds support for these. Differential Revision: https://reviews.llvm.org/D53581 llvm-svn: 346148
*	[PowerPC] Revert commit r339779	Nemanja Ivanovic	2018-08-27	1	-2/+2
\| \| \| \| \| \| \|	This commit has caused failures in some internal benchmarks. Temporarily reverting this patch until the issue can be diagnosed and fixed. llvm-svn: 340740
*	[PowerPC] Change Test Options [NFC]	Stefan Pintilie	2018-08-24	1	-299/+309
\| \| \| \| \| \|	Patch by amyk llvm-svn: 340639
*	[PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler	Stefan Pintilie	2018-07-04	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	We want to run the Machine Scheduler instead of the List Scheduler after RA. Checked with a performance run on a Power 9 machine with SPEC 2006 and while some benchmarks improved and others degraded the geomean was slightly improved with the Machine Scheduler. Differential Revision: https://reviews.llvm.org/D45265 llvm-svn: 336295
*	[PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex ↵	Lei Huang	2017-10-11	1	-10/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	elimination to only use `lis/addi` if necessary. Currently we produce a bunch of unnecessary code when emitting the prologue/epilogue for spills/restores. Namely, if the load from stack slot/store to stack slot instruction is an X-Form instruction, we will always produce an LIS/ORI sequence for the stack offset. Furthermore, we have not exploited the P9 vector D-Form loads/stores for this purpose. This patch address both issues. Specifying the D-Form load as the instruction to use for stack spills/reloads should be safe because: 1. The stack should be aligned according to the ABI 2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will check for the offset being a multiple of 16 and will convert it to an X-Form instruction if it isn't. Differential Revision : https://reviews.llvm.org/D38758 llvm-svn: 315500
*	RegisterScavenging: Followup to r305625	Matthias Braun	2017-06-20	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \|	This does some improvements/cleanup to the recently introduced scavengeRegisterBackwards() functionality: - Rewrite findSurvivorBackwards algorithm to use the existing LiveRegUnit::accumulateBackward() code. This also avoids the Available and Candidates bitset and just need 1 LiveRegUnit instance (= 1 bitset). - Pick registers in allocation order instead of register number order. llvm-svn: 305817
*	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE	Nemanja Ivanovic	2017-05-02	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes PR30730. This is a re-commit of a pulled commit. The commit was pulled because some software projects contained uses of Altivec vectors that violated alignment requirements. Known issues have now been fixed. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D26861 llvm-svn: 301892
*	Use PIC relocation model as default for PowerPC64 ELF.	Joerg Sonnenberger	2016-12-15	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the PowerPC64 code generation for the ELF ABI is already PIC. There are four main exceptions: (1) Constant pointer arrays etc. should in writeable sections. (2) The TOC restoration NOP after a call is needed for all global symbols. While GNU ld has a workaround for questionable GCC self-calls, we trigger the checks for calls from COMDAT sections as they cross input sections and are therefore not considered self-calls. The current decision is questionable and suboptimal, but outside the scope of the change. (3) TLS access can not use the initial-exec model. (4) Jump tables should use relative addresses. Note that the current encoding doesn't work for the large code model, but it is more compact than the default for any non-trivial jump table. Improving this is again beyond the scope of this change. At least (1) and (3) are assumptions made in target-independent code and introducing additional hooks is a bit messy. Testing with clang shows that a -fPIC binary is 600KB smaller than the corresponding -fno-pic build. Separate testing from improved jump table encodings would explain only about 100KB or so. The rest is expected to be a result of more aggressive immediate forming for -fno-pic, where the -fPIC binary just uses TOC entries. This change brings the LLVM output in line with the GCC output, other PPC64 compilers like XLC on AIX are known to produce PIC by default as well. The relocation model can still be provided explicitly, i.e. when using MCJIT. One test case for case (1) is included, other test cases with relocation mode sensitive behavior are wired to static for now. They will be reviewed and adjusted separately. Differential Revision: https://reviews.llvm.org/D26566 llvm-svn: 289743
*	Revert https://reviews.llvm.org/rL287679	Nemanja Ivanovic	2016-11-29	1	-24/+15
\| \| \| \| \| \| \|	This commit caused some miscompiles that did not show up on any of the bots. Reverting until we can investigate the cause of those failures. llvm-svn: 288214
*	[PowerPC] Improvements for BUILD_VECTOR Vol. 1	Nemanja Ivanovic	2016-11-29	1	-6/+2
\| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: https://reviews.llvm.org/D25912 This is the first patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. llvm-svn: 288152
*	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE	Nemanja Ivanovic	2016-11-22	1	-15/+24
\| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: https://reviews.llvm.org/D26861 It also fixes PR30730. Committing on behalf of Lei Huang. llvm-svn: 287679
*	[PowerPC] Implement BE VSX load/store builtins - llvm portion.	Tony Jiang	2016-11-15	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \|	This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behaves exactly the same with vec_xl and vec_xst, therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after call the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967
*	[Power9] Part-word VSX integer scalar loads/stores and sign extend instructions	Nemanja Ivanovic	2016-10-04	1	-133/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: https://reviews.llvm.org/D23155 This patch removes the VSHRC register class (based on D20310) and adds exploitation of the Power9 sub-word integer loads into VSX registers as well as vector sign extensions. The new instructions are useful for a few purposes: Int to Fp conversions of 1 or 2-byte values loaded from memory Building vectors of 1 or 2-byte integers with values loaded from memory Storing individual 1 or 2-byte elements from integer vectors This patch implements all of those uses. llvm-svn: 283190
*	[Power9] Builtins for ELF v.2 API conformance - back end portion	Nemanja Ivanovic	2016-09-27	1	-19/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: https://reviews.llvm.org/D24396 This patch adds support for the "vector count trailing zeroes", "vector compare not equal" and "vector compare not equal or zero instructions" as well as "scalar count trailing zeroes" instructions. It also changes the vector negation to use XXLNOR (when VSX is enabled) so as not to increase register pressure (previously this was done with a splat immediate of all ones followed by an XXLXOR). This was done because the altivec.h builtins (patch to follow) use vector negation and the use of an additional register for the splat immediate is not optimal. llvm-svn: 282478
*	Adding -verify-machineinstrs option to PowerPC tests	Ehsan Amiri	2016-08-03	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	Currently we have a number of tests that fail with -verify-machineinstrs. To detect this cases earlier we add the option to the testcases with the exception of tests that will currently fail with this option. PR 27456 keeps track of this failures. No code review, as discussed with Hal Finkel. llvm-svn: 277624
*	[PowerPC] - Legalize vector types by widening instead of integer promotion	Nemanja Ivanovic	2016-07-05	1	-41/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D20443 It changes the legalization strategy for illegal vector types from integer promotion to widening. This only applies for vectors with elements of width that is a multiple of a byte since we have hardware support for vectors with 1, 2, 3, 8 and 16 byte elements. Integer promotion for vectors is quite expensive on PPC due to the sequence of breaking apart the vector, extending the elements and reconstituting the vector. Two of these operations are expensive. This patch causes between minor and major improvements in performance on most benchmarks. There are very few benchmarks whose performance regresses. These regressions can be handled in a subsequent patch with a DAG combine (similar to how this patch handles int -> fp conversions of illegal vector types). llvm-svn: 274535
*	[PowerPC] Add an MI SSA peephole pass.	Bill Schmidt	2015-11-10	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a pass for doing PowerPC peephole optimizations at the MI level while the code is still in SSA form. This allows for easy modifications to the instructions while depending on a subsequent pass of DCE. Both passes are very fast due to the characteristics of SSA. At this time, the only peepholes added are for cleaning up various redundancies involving the XXPERMDI instruction. However, I would expect this will be a useful place to add more peepholes for inefficiencies generated during instruction selection. The pass is placed after VSX swap optimization, as it is best to let that pass remove unnecessary swaps before performing any remaining clean-ups. The utility of these clean-ups are demonstrated by changes to four existing test cases, all of which now have tighter expected code generation. I've also added Eric Schweiz's bugpoint-reduced test from PR25157, for which we now generate tight code. One other test started failing for me, and I've fixed it (test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not related to my changes, and I'm not sure why it works before and not after. The problem is that the CHECK-NOT: of "statepoint" from test1 fails because of the "statepoint" in test2, and so forth. Adding a CHECK-LABEL in between keeps the different occurrences of that string properly scoped. llvm-svn: 252651
*	Fix for bootstrap bug introduced in r244921	Nemanja Ivanovic	2015-11-02	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. It turns out that the new code path taken due to legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a micro optimization to change a load followed by a scalar_to_vector into a load and splat instruction on PPC. llvm-svn: 251798
*	Temporary fix for the self-host failures introduced by rL244921.	Nemanja Ivanovic	2015-08-19	1	-8/+8
\| \| \| \| \| \| \| \| \|	This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed. llvm-svn: 245481
*	Scalar to vector conversions using direct moves	Nemanja Ivanovic	2015-08-13	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D11471 It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements. llvm-svn: 244921
*	[PowerPC] v4i32 is a VSRCRegClass	Bill Schmidt	2015-07-16	1	-35/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was looking at some vector code generation and kept seeing unnecessary vector copies into the Altivec half of the VSX registers. I discovered that we overlooked v4i32 when adding the register classes for VSX; we only added v4f32 and v2f64. This means that anything that canonicalizes into v4i32 (which is a LOT of stuff) ends up being forced into VRRC on its way to VSRC. The fix is one line. The rest of the patch is fixing up some test cases whose code generation has changed as a result. This seems like it would be a good candidate for backport to 3.7. llvm-svn: 242442
*	[PPC64LE] Enable missing lxvdsx optimization, and related swap optimization	Bill Schmidt	2015-07-01	1	-6/+288
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation. There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch. llvm-svn: 241183
*	[PowerPC] Enable printing instructions using aliases	Hal Finkel	2015-04-23	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \|	TableGen had been nicely generating code to print a number of instructions using shorter aliases (and PowerPC has plenty of short mnemonics), but we were not calling it. For some of the aliases we support in the parser, TableGen can't infer the "inverse" alias relationship, so there is still more to do. Thus, after some hours of updating test cases... llvm-svn: 235616
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-02-27	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
*	[PowerPC]Update Power VSX test cases to also test fast-isel	Bill Seurer	2014-12-05	1	-117/+384
\| \| \| \| \| \| \| \|	Update of some of the VSX test cases for Power to check fast-isel codegen as well as the regular codegen. http://reviews.llvm.org/D6357 llvm-svn: 223509
*	[PATCH] Support select-cc for VSFRC when VSX is enabled	Bill Schmidt	2014-10-22	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to handle <2 x double> cases. This patch adds SELECT_VSFRC and SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the f64 type when VSX is enabled. The changes are analogous to those in the previous patch. I've added a new variant to vsx.ll to test the code generation. (I also cleaned up a little formatting in PPCInstrVSX.td from the previous patch.) llvm-svn: 220395
*	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation	Bill Schmidt	2014-10-17	1	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4r32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047
*	[PowerPC] Fix FrameIndex handling in SelectAddressRegImm	Ulrich Weigand	2014-07-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by tranlating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482
*	[PowerPC] Add some missing VSX bitcast patterns	Hal Finkel	2014-04-01	1	-0/+8
\| \| \| \|	llvm-svn: 205352
*	[PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles	Hal Finkel	2014-03-31	1	-5/+6
\| \| \| \| \| \| \|	If we have two unique values for a v2i64 build vector, this will always result in two vector loads if we expand using shuffles. Only one is necessary. llvm-svn: 205231
*	Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT	Hal Finkel	2014-03-31	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the loop vectorizer vectorizes code that uses the loop induction variable, we often end up with IR like this: %b1 = insertelement <2 x i32> undef, i32 %v, i32 0 %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer %i = add <2 x i32> %b2, <i32 2, i32 3> If the add in this example is not legal (as is the case on PPC with VSX), it will be scalarized, and we'll end up with a number of extract_vector_elt nodes with the vector shuffle as the input operand, and that vector shuffle is fed by one or more build_vector nodes. By the time that vector operations are expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by looking through the vector shuffle (to make sure that no illegal operations are created), and so the extract_vector_elt -> vector shuffle -> build_vector is never simplified to an operand of the build vector. By looking at build_vectors through a shuffle we fix this particular situation, preventing a vector from being built, only to be deconstructed again (for the scalarized add) -- an expensive proposition when this all needs to be done via the stack. We probably want a more comprehensive fix here where we look back recursively through any shuffles to any build_vectors or scalar_to_vectors, etc. but that can come later. llvm-svn: 205179
*	Make use of previously generated stores in ↵	Hal Finkel	2014-03-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAGLegalize::ExpandExtractFromVectorThroughStack When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire vector and then load the piece we want. This is fine in isolation, but generating a new store (and corresponding stack slot) for each extraction ends up producing code of poor quality. When we scalarize a vector operation (using SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT for each element in the vector. This used to generate one stored copy of the vector for each element in the vector. Now we search the uses of the vector for a suitable store before generating a new one, which results in much more efficient scalarization code. llvm-svn: 205153
*	[PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG	Hal Finkel	2014-03-30	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node (and similarly for v2i16 and v2i8). Even though there are no sign-extension (or algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by converting two and from v2i64. The small trick necessary here is to shift the i32 elements into the right lanes before the i32 -> f64 step. This is because of the big Endian nature of the system, we need the i32 portion in the high word of the i64 elements. For v2i16 and v2i8 we can do the same, but we first use the default Altivec shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and then apply the above procedure. llvm-svn: 205146
*	[PowerPC] Handle v2i64 comparisons	Hal Finkel	2014-03-29	1	-0/+33
\| \| \| \| \| \| \| \|	v2i64 is a legal type under VSX, however we don't have native vector comparisons. We can handle eq/ne by casting it to an Altivec type, but everything else must be expanded. llvm-svn: 205106
*	[PowerPC] Fix VSX permutation isel	Hal Finkel	2014-03-28	1	-1/+1
\| \| \| \| \| \| \|	Not only did I invert the indices when I wrote the code, but I also did the same thing when I wrote the regression test. Oops. llvm-svn: 205046
*	[PowerPC] Fix v2f64 vector extract and related patterns	Hal Finkel	2014-03-27	1	-0/+18
\| \| \| \| \| \| \| \| \|	First, v2f64 vector extract had not been declared legal (and so the existing patterns were not being used). Second, the patterns for that, and for scalar_to_vector, should really be a regclass copy, not a subregister operation, because the VSX registers directly hold both the vector and scalar data. llvm-svn: 204971
*	[PowerPC] Expand v2i64 shifts	Hal Finkel	2014-03-27	1	-0/+42
\| \| \| \| \| \| \| \|	These operations need to be expanded during legalization so that isel does not crash. In theory, we might be able to custom lower some of these. That, however, would need to be follow-up work. llvm-svn: 204963
*	[PowerPC] Generate VSX permutations for v2[fi]64 vectors	Hal Finkel	2014-03-26	1	-0/+65
\| \| \| \|	llvm-svn: 204873
*	[PowerPC] VSX loads and stores support unaligned access	Hal Finkel	2014-03-26	1	-0/+18
\| \| \| \| \| \| \|	I've not yet updated PPCTTI because I'm not sure what the actual relative cost is compared to the aligned uses. llvm-svn: 204848
*	[PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions	Hal Finkel	2014-03-26	1	-0/+72
\| \| \| \|	llvm-svn: 204843
*	[PowerPC] Use VSX vector load/stores for v2[fi]64	Hal Finkel	2014-03-26	1	-0/+36
\| \| \| \| \| \| \| \|	These instructions have access to the complete VSX register file. In addition, they "swap" the order of the elements so that element 0 (the scalar part) comes first in memory and element 1 follows at a higher address. llvm-svn: 204838
*	[PowerPC] Add v2i64 as a legal VSX type	Hal Finkel	2014-03-26	1	-1/+22
\| \| \| \| \| \| \| \| \|	v2i64 needs to be a legal VSX type because it is the SetCC result type from v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations. This fixes the lowering for v2f64 VSELECT. llvm-svn: 204828
*	[PowerPC] Lower VSELECT using xxsel when VSX is available	Hal Finkel	2014-03-26	1	-0/+77
\| \| \| \| \| \| \| \|	With VSX there is a real vector select instruction, and so we should use it. Note that VSELECT will still scalarize for v2f64 because the corresponding SetCC result type (v2i64) is not currently a legal type. llvm-svn: 204801