path: root/llvm/test/CodeGen/PowerPC
Commit log (newest first). Each entry shows the subject, then (author, date; files changed, lines -deleted/+added).
* Fix the PPC CTR Loop pass to look for calls to the intrinsics that read CTR and count them as reading the CTR (Eric Christopher, 2015-09-08; 1 file, -0/+349)
  llvm-svn: 247083

* [PowerPC] Don't commute trivial rlwimi instructions (Hal Finkel, 2015-09-06; 1 file, -0/+92)
  To commute a trivial rlwimi instruction (meaning one with a full mask and zero shift), we'd need the ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try. The code-quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant-zero LHS) will be fixed as a follow-up. Fixes PR24719.
  llvm-svn: 246937

* [PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation (Hal Finkel, 2015-09-05; 1 file, -0/+27)
  PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from an input pattern that looks like this: and(or(x, c1), c2). But the associated logic does not work if there are bits that are 1 in c1 but 0 in c2 (these are normally canonicalized away, but that can't happen if the 'or' has other users). Make sure we abort the transformation if such bits are discovered. Fixes PR24704.
  llvm-svn: 246900
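  For illustration, a hypothetical reduction (not the committed test) of the problematic shape, where a second user of the 'or' keeps the disagreeing constant bits from being canonicalized away before ISel:

    ; c1 = 0xFF00 has bits 12-15 set that are clear in c2 = 0x0F0F, and
    ; the store keeps the 'or' alive, so the mismatch survives to ISel.
    define i32 @f(i32 %x, i32* %p) {
      %o = or i32 %x, 65280       ; c1 = 0xFF00
      store i32 %o, i32* %p       ; second user of the 'or'
      %a = and i32 %o, 3855       ; c2 = 0x0F0F
      ret i32 %a
    }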
* [PowerPC] Try harder to find a base+offset when looking for consecutive accesses (Hal Finkel, 2015-09-03; 1 file, -0/+217)
  When forming permutation-based unaligned vector loads, we need to know whether it is valid to read ahead of the requested address by a full vector length. Doing so is more efficient (and allows for more CSE with later loads), but could trigger a page fault if invalid. To determine validity, we look for other loads in the same block that access the relevant address range.
  The relevant point here is that we need to do this as part of the process of forming permutation-based vector loads, and this happens quite early in the SDAG pipeline, specifically before many of the address calculations are fully canonicalized. As a result, we need to try harder to recognize base+offset address computations, because they still might appear as a chain of adds (base+offset+offset, for example). To account for this, we'll look through chains of adds, accumulating the constant offsets.
  llvm-svn: 246813
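  As a sketch (illustrative only, not the committed test), this is the kind of un-canonicalized addressing the pass must now see through; the second address is base+16+16 rather than a collapsed base+32:

    define <4 x i32> @f(i64 %base) {
      %a1 = add i64 %base, 16
      %a2 = add i64 %a1, 16       ; base+offset+offset, accumulates to base+32
      %p1 = inttoptr i64 %a1 to <4 x i32>*
      %p2 = inttoptr i64 %a2 to <4 x i32>*
      %v1 = load <4 x i32>, <4 x i32>* %p1, align 1
      %v2 = load <4 x i32>, <4 x i32>* %p2, align 1
      %s = add <4 x i32> %v1, %v2
      ret <4 x i32> %s
    }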
* [PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic (Hal Finkel, 2015-09-03; 1 file, -0/+17)
  If you compute the MMO offset using unsigned arithmetic, you end up with a large positive offset instead of a small negative one. In theory, this could cause bad instruction-scheduling decisions later. I noticed this by inspection from the debug output, and using that for the regression test is the best I can do right now.
  llvm-svn: 246805

* [PowerPC] Cleanup cost model for unaligned vector loads/stores (Hal Finkel, 2015-09-02; 1 file, -0/+580)
  I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (where we were not previously accounting for the permutation-based loads), and the cost-model implementation is cleaner.
  llvm-svn: 246712

* [PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE (Hal Finkel, 2015-09-02; 1 file, -0/+28)
  LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the TableGen-generated matching code, and it does this by testing the same predicates used by the TableGen files. Unfortunately, when we added new P8Altivec-only predicates, we started universally testing them in LowerVECTOR_SHUFFLE, and if they then matched when targeting a system prior to the P8, we'd end up with a selection failure.
  llvm-svn: 246675

* [DAGCombine] Fixup SETCC legality checking (Hal Finkel, 2015-08-31; 1 file, -0/+41)
  SETCC is one of those special node types for which operation actions (legality, etc.) are keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType.
  The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway.
  The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify:
    (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
  (which is possible for some combinations of (op1, op2)). If the operands of the and|or node are actual setcc nodes, then this is not an issue (because the and|or must share the same type), but the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. Thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type.
  To be explicit: there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636.
  llvm-svn: 246507
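  In IR shorthand (the combine itself runs on SDAG nodes), the and/or-of-compares shape being simplified looks like this:

    ; (icmp slt) or'ed with (icmp eq) on the same operands can fold to a
    ; single (icmp sle); the SDAG version must take the new SETCC's value
    ; type from TLI.getSetCCResultType rather than picking one arbitrarily.
    define i1 @sle_via_or(i32 %a, i32 %b) {
      %lt = icmp slt i32 %a, %b
      %eq = icmp eq i32 %a, %b
      %r = or i1 %lt, %eq
      ret i1 %r
    }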
* [AggressiveAntiDepBreaker] Check for EarlyClobber on defining instruction (Hal Finkel, 2015-08-31; 1 file, -0/+117)
  AggressiveAntiDepBreaker was doing some EarlyClobber checking, but was not checking that the register being potentially renamed was defined by an early-clobber def where there was also a use, in that instruction, of the register being considered as the target of the rename. Fixes PR24014.
  llvm-svn: 246423

* [PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands (Hal Finkel, 2015-08-30; 1 file, -0/+1685)
  There were really two problems here. The first was that we had the truth tables for signed i1 comparisons backward. I imagine these are not very common, but if you have:
    setcc i1 x, y, LT
  this has the '0 1' and the '1 0' results flipped compared to:
    setcc i1 x, y, ULT
  because, in the signed case, '1 0' is really '-1 0', and the answer is not the same as in the unsigned case.
  The second problem was that we did not have patterns (at all) for the unsigned-comparison select_cc nodes for i1 comparison operands. This was the specific cause of PR24552. These had to be added (and a missing Altivec promotion added as well) to make sure these function for all types. I've added a bunch more test cases for these patterns, and there are a few FIXMEs in the test case regarding code quality. Fixes PR24552.
  llvm-svn: 246400
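  A worked example (not from the commit): with x = 1 and y = 0, the signed i1 compare reads x as -1, so the two predicates give different answers:

    ; icmp slt i1 1, 0  ->  true   (1 is -1 when signed, and -1 < 0)
    ; icmp ult i1 1, 0  ->  false  (1 < 0 is false unsigned)
    define i1 @slt_i1(i1 %x, i1 %y) {
      %c = icmp slt i1 %x, %y
      ret i1 %c
    }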
* [PowerPC/MIR Serialization] Target flags serialization support (Hal Finkel, 2015-08-30; 1 file, -0/+80)
  Add support for MIR serialization of PowerPC-specific operand target flags (based on the generic infrastructure added in r244185 and r245383). I won't even pretend that this is good test coverage, but this includes the regression test associated with r246372. Adding an MIR test for that fix is far superior to adding an IR-level test, because particular instruction-scheduling decisions are necessary in order to expose the bug, and using an MIR test we can start the pipeline post-scheduling.
  llvm-svn: 246373

* DI: Require subprogram definitions to be distinct (Duncan P. N. Exon Smith, 2015-08-28; 4 files, -32/+32)
  As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode-upgrading logic to combat testcase bitrot after the `DIBuilder` change.
  While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility.
  I updated almost all the IR with the following script:
    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' | grep -v test/Bitcode | xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'
  Likely some variant of this would work for out-of-tree testcases.
  llvm-svn: 246327
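  The rewrite amounts to this change in the textual IR (a minimal sketch; real records carry scope, file, and type fields omitted here):

    ; before: uniqued subprogram definition (now rejected by the verifier)
    !9 = !DISubprogram(name: "foo", isDefinition: true)
    ; after: definitions must be distinct
    !9 = distinct !DISubprogram(name: "foo", isDefinition: true)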
* Make MergeConsecutiveStores look at other stores on same chain (Matt Arsenault, 2015-08-28; 1 file, -2/+5)
  When combiner AA is enabled, look at stores on the same chain. Non-aliasing stores are moved to the same chain, so the existing code fails because it expects to find an adjacent store on a consecutive chain.
  Because of how DAGCombiner tries these store combines, MergeConsecutiveStores doesn't see the correct set of stores on the chain when it visits the other stores. Each store individually has its chain fixed before trying to merge consecutive stores, and then tries to merge stores from that point before the other stores have been processed to have their chains fixed. To fix this, attempt to use FindBetterChain on any possibly neighboring stores in visitSTORE.
  Suppose you have 4 32-bit stores that should be merged into 1 vector store. One store would be visited first, fixing the chain. What happens is, because not all of the store chains have yet been fixed, 2 of the stores are merged. The other 2 stores later have their chains fixed, but because the other stores were already merged, they have different memory types, and merging the two different-sized stores is not supported and would be more difficult to handle.
  llvm-svn: 246307
* [PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends (Hal Finkel, 2015-08-24; 1 file, -0/+38)
  We might end up with a trivial copy as the addend, and if so, we should ignore the corresponding FMA instruction. The trivial copy can be coalesced away later, so there's nothing to do here. We should not, however, assert. Fixes PR24544.
  llvm-svn: 245907

* [PPC64LE] Fix PR24546 - Swap optimization and debug values (Bill Schmidt, 2015-08-24; 1 file, -0/+116)
  This patch fixes PR24546, which demonstrates a segfault during the VSX swap removal pass. The problem is that debug value instructions were not excluded from the list of instructions to be analyzed for webs of related computation. I've added the test case from the PR as a crash test in test/CodeGen/PowerPC.
  llvm-svn: 245862

* [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers (Hal Finkel, 2015-08-21; 1 file, -0/+33)
  When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced).
  llvm-svn: 245741

* [PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons (Hal Finkel, 2015-08-20; 1 file, -0/+38)
  XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs to be v2i64 (as that's the corresponding SETCC type). Fixes PR24225.
  llvm-svn: 245535

* [PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128 (Hal Finkel, 2015-08-20; 1 file, -0/+16)
  This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128 operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)), but shouldn't (it should only apply to f32/f64 types). The result was a crash.
  llvm-svn: 245530
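  In IR terms, the shape that must no longer trigger the combine is an fp-to-int-to-fp round trip whose inner type is ppc_fp128 (illustrative; the combine matches the corresponding SDAG nodes):

    define double @f(ppc_fp128 %x) {
      %i = fptosi ppc_fp128 %x to i64
      %d = sitofp i64 %i to double    ; outer f64, inner ppc_fp128: skip
      ret double %d
    }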
* Temporary fix for the self-host failures introduced by rL244921 (Nemanja Ivanovic, 2015-08-19; 2 files, -11/+11)
  This revision has introduced an issue that only affects the bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of the scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed.
  llvm-svn: 245481

* Scalar to vector conversions using direct moves (Nemanja Ivanovic, 2015-08-13; 4 files, -23/+92)
  This patch corresponds to review: http://reviews.llvm.org/D11471
  It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements.
  llvm-svn: 244921
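  The conversion being improved is, at the IR level, a plain scalar insert (a sketch; the committed tests cover more element types):

    ; Previously this went through a store/reload on the stack; with
    ; direct moves (mtvsrd and friends), the GPR value moves straight
    ; into a VSR.
    define <2 x i64> @scalar_to_vec(i64 %x) {
      %v = insertelement <2 x i64> undef, i64 %x, i32 0
      ret <2 x i64> %v
    }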
* StackMap: FastISel: Add an appropriate number of immediate operands to the frame setup instruction (Alex Lorenz, 2015-08-10; 1 file, -0/+20)
  This commit ensures that the stack map lowering code in FastISel adds an appropriate number of immediate operands to the frame setup instruction. The previous code added just one immediate operand, which was fine for a target like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit operands. This caused the machine verifier to report an error when the old code added just one.
  Reviewers: Juergen Ributzka
  Differential Revision: http://reviews.llvm.org/D11853
  llvm-svn: 244508

* Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI (Jonathan Roelofs, 2015-08-10; 4 files, -11/+11)
  I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines.
  llvm-svn: 244481

* [MachineCombiner] Don't use the opcode-only form of computeInstrLatency (Hal Finkel, 2015-08-05; 1 file, -0/+25)
  In r242277, I updated the MachineCombiner to work with itineraries, but I missed a call that is scheduling-model-only (the opcode-only form of computeInstrLatency). Using the form that takes an MI* allows this to work with itineraries (and should be NFC for subtargets with scheduling models).
  llvm-svn: 244020

* DI: Disallow uniquable DICompileUnits (Duncan P. N. Exon Smith, 2015-08-03; 3 files, -3/+3)
  Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.
  Almost all the testcases were updated with this script:
    git grep -e '= !DICompileUnit' -l -- test | grep -v test/Bitcode | xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'
  I imagine something similar should work for out-of-tree testcases.
  llvm-svn: 243885
* DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable (Duncan P. N. Exon Smith, 2015-07-31; 2 files, -155/+155)
  Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using `DW_TAG_variable` in their place. Stop exposing the `tag:` field at all in the assembly format for `DILocalVariable`.
  Most of the testcase updates were generated by the following sed script:
    find test/ -name "*.ll" -o -name "*.mir" | xargs grep -l 'DILocalVariable' | xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'
  There were only a handful of tests in `test/Assembly` that I needed to update by hand.
  (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as appropriate), instead of having that logic magically in the backend in `DbgVariable`. I've added a FIXME to that effect.)
  llvm-svn: 243774
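  In the assembly format, the change looks like this (illustrative fields only):

    ; before: the fake tag was spelled out explicitly
    !12 = !DILocalVariable(tag: DW_TAG_arg_variable, name: "a", arg: 1, scope: !4)
    ; after: the tag: field is gone from the textual form
    !12 = !DILocalVariable(name: "a", arg: 1, scope: !4)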
* [PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask (Bill Schmidt, 2015-07-29; 1 file, -0/+14)
  Given certain shuffle-vector masks, LLVM emits splat instructions which splat the wrong bytes from the source register. The issue is that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp does not ensure that the splat pattern found is requesting bytes that are aligned on an EltSize boundary. This patch detects this situation as not a valid splat mask, resulting in a permute being generated instead of a splat. Patch and test case by Tyler Kenney, cleaned up a bit by me. This is a simple bug fix that would be good to incorporate into 3.7.
  llvm-svn: 243519
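  A hypothetical example of the offending mask (not the committed test): a repeating 4-byte pattern starting at byte 1 is not aligned on a 4-byte EltSize boundary, so it must be lowered as a permute, not a word splat:

    define <16 x i8> @not_a_splat(<16 x i8> %v) {
      %s = shufflevector <16 x i8> %v, <16 x i8> undef,
           <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4,
                       i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
      ret <16 x i8> %s
    }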
* Fix typo (Chih-Hung Hsieh, 2015-07-28; 1 file, -1/+1)
  llvm-svn: 243475

* Limit this test only on linux (Chih-Hung Hsieh, 2015-07-28; 1 file, -2/+2)
  Differential Revision: http://reviews.llvm.org/D10522
  llvm-svn: 243474

* Move unit tests to target specific directories (Chih-Hung Hsieh, 2015-07-28; 1 file, -0/+41)
  Differential Revision: http://reviews.llvm.org/D10522
  llvm-svn: 243454

* Fix PPCMaterializeInt to check the size of the integer based on the extension property we're requesting - zero or sign extended (Eric Christopher, 2015-07-25; 1 file, -0/+10)
  This fixes cases where we want to return a zero-extended 32-bit -1 and not be sign extended for the entire register. Also updated the already out-of-date comment with the current behavior.
  llvm-svn: 243192

* Clean up function attributes on PPC fast-isel tests (Eric Christopher, 2015-07-24; 16 files, -145/+145)
  llvm-svn: 243079
* [PPC64LE] More vector swap optimization TLC (Bill Schmidt, 2015-07-21; 1 file, -0/+44)
  This makes one substantive change and a few stylistic changes to the VSX swap optimization pass.
  The substantive change is to permit LXSDX and LXSSPX instructions to participate in swap optimization computations. The previous change to insert a swap following a SUBREG_TO_REG widening operation makes this almost trivial.
  I experimented with also permitting STXSDX and STXSSPX instructions. This can be done using similar techniques: we could insert a swap prior to a narrowing COPY operation, and then permit these stores to participate. I prototyped this, but discovered that the pattern of a narrowing COPY followed by an STXSDX does not occur in any of our test-suite code. So instead, I added commentary indicating that this could be done.
  Other TLC:
  - I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate the direction of the copy.
  - I factored the insertion of swap instructions into a separate function.
  Finally, I added a new test case to check that the scalar-to-vector loads are working properly with swap optimization.
  llvm-svn: 242838

* Add missing test for r242296 (vec_sld) (Bill Schmidt, 2015-07-20; 1 file, -1/+1)
  llvm-svn: 242680

* [PowerPC] v4i32 is a VSRCRegClass (Bill Schmidt, 2015-07-16; 2 files, -39/+38)
  I was looking at some vector code generation and kept seeing unnecessary vector copies into the Altivec half of the VSX registers. I discovered that we overlooked v4i32 when adding the register classes for VSX; we only added v4f32 and v2f64. This means that anything that canonicalizes into v4i32 (which is a LOT of stuff) ends up being forced into VRRC on its way to VSRC.
  The fix is one line. The rest of the patch is fixing up some test cases whose code generation has changed as a result. This seems like it would be a good candidate for backport to 3.7.
  llvm-svn: 242442

* [PowerPC] Use the MachineCombiner to reassociate fadd/fmul (Hal Finkel, 2015-07-15; 1 file, -0/+188)
  This is a direct port of the code from the X86 backend (r239486/r240361), which uses the MachineCombiner to reassociate (floating-point) adds/muls to increase ILP, to the PowerPC backend. The rationale is the same.
  There is a lot of copy-and-paste here between the X86 code and the PowerPC code, and we should extract at least some of this into CodeGen somewhere. However, I don't want to do that until this code is enhanced to handle FMAs as well. After that, we'll be in a better position to extract the common parts.
  llvm-svn: 242279

* [PowerPC] Support symbolic targets in patchpoints (Hal Finkel, 2015-07-14; 1 file, -0/+15)
  Follow-up to r235483, with the corresponding support in PPC. We use a regular call for symbolic targets (because they're much cheaper than indirect calls).
  llvm-svn: 242239
* [PowerPC] Use the ABI indirect-call protocol for patchpoints (Hal Finkel, 2015-07-14; 3 files, -19/+31)
  We used to take the address specified as the direct target of the patchpoint and did no TOC-pointer handling. This, however, was not all that useful, because MCJIT tends to create a lot of modules, and they have their own TOC sections. Thus, to call from the generated code to other generated code, you really need to switch TOC pointers. Make this work as expected, and under ELFv1, treat the address as the function-descriptor address so that the correct TOC pointer can be loaded.
  llvm-svn: 242217

* [PowerPC] Fix the PPCInstrInfo::getInstrLatency implementation (Hal Finkel, 2015-07-14; 7 files, -34/+41)
  PowerPC uses itineraries to describe processor pipelines (and dispatch-group restrictions for P7/P8 cores). Unfortunately, the target-independent implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that looks for the largest cycle count in the pipeline for any given instruction. This, however, yields the wrong answer for the PPC itineraries, because we don't encode the full pipeline. Because the functional units are fully pipelined, we only model the initial stages (there are no relevant hazards in the later stages to model), and so the technique employed by getStageLatency does not really work. Instead, we should take the maximum output operand latency, and that's what PPCInstrInfo::getInstrLatency now does.
  This caused some test-case churn, including two unfortunate side effects. First, the new arrangement of copies we get from function parameters now sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the test cases), and we have one significant test-suite regression:
    SingleSource/Benchmarks/BenchmarkGame/spectral-norm
    56.4185% +/- 18.9398%
  In this benchmark we have a loop with a vectorized FP divide, and with the new scheduling both divides end up in the same dispatch group (which in this case seems to cause a problem, although why is not exactly clear). The grouping structure is hard to predict from the bottom of the loop, and there may not be much we can do to fix this.
  Very few other test-suite performance effects were really significant, but almost all weakly favor this change. However, in light of the issues highlighted above, I've left the old behavior available via a command-line flag.
  llvm-svn: 242188
* Add missing builtins to the PPC back end for ABI compliance (vol. 4) (Nemanja Ivanovic, 2015-07-14; 1 file, -0/+30)
  This patch corresponds to review: http://reviews.llvm.org/D11183
  Back end portion of the fourth round of additions to altivec.h.
  llvm-svn: 242167

* [PPC64LE] More improvements to VSX swap optimization (Bill Schmidt, 2015-07-13; 2 files, -2/+72)
  This patch allows VSX swap optimization to succeed more frequently. Specifically, it is concerned with common code sequences that occur when copying a scalar floating-point value to a vector register. This patch currently handles cases where the floating-point value is already in a register, but does not yet handle loads (such as via an LXSDX scalar floating-point VSX load). That will be dealt with later.
  A typical case is when a scalar value comes in as a floating-point parameter. The value is copied into a virtual VSFRC register, and then a sequence of SUBREG_TO_REG and/or COPY operations will convert it to a full vector register of the class required by the context. If this vector register is then used as part of a lane-permuted computation, the original scalar value will be in the wrong lane. We can fix this by adding a swap operation following any widening SUBREG_TO_REG operation. Additional COPY operations may be needed around the swap operation in order to keep register assignment happy, but these are pro forma operations that will be removed by coalescing.
  If a scalar value is otherwise directly referenced in a computation (such as by one of the many XS* vector-scalar operations), we currently disable swap optimization. These operations are lane-sensitive by definition. A MentionsPartialVR flag is added for use in each swap table entry that mentions a scalar floating-point register without having special handling defined.
  A common idiom for PPC64LE is to convert a double-precision scalar to a vector by performing a splat operation. This ensures that the value can be referenced as V[0], as it would be for big endian, whereas just converting the scalar to a vector with a SUBREG_TO_REG operation leaves this value only in V[1]. A doubleword splat operation is one form of an XXPERMDI instruction, which takes one doubleword from a first operand and another doubleword from a second operand, with a two-bit selector operand indicating which doublewords are chosen. In the general case, an XXPERMDI can be permitted in a lane-swapped region provided that it is properly transformed to select the corresponding swapped values. This transformation is to reverse the order of the two input operands, and to reverse and complement the bits of the selector operand (derivation left as an exercise to the reader ;).
  A new test case that exercises the scalar-to-vector and generalized XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll. The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to use CHECK-DAG instead of CHECK for two independent instructions that now appear in reverse order.
  There are two small unrelated changes that are added with this patch. First, the XXSLDWI instruction was incorrectly omitted from the list of lane-sensitive instructions; this is now fixed. Second, I observed that the same webs were being rejected over and over again for different reasons. Since it's sufficient to reject a web only once, I added a check for this to speed up the compilation time slightly.
  llvm-svn: 242081

* [PowerPC] Make use of the TargetRecip system (Hal Finkel, 2015-07-12; 1 file, -0/+15)
  r238842 added the TargetRecip system for controlling use of reciprocal estimates for sqrt and division using a set of parameters that can be set by the frontend. Clang now supports a sophisticated -mrecip option, and this will allow that option to effectively control the relevant code-generation functionality of the PPC backend.
  llvm-svn: 241985

* [PowerPC] Support the nest parameter attribute (Hal Finkel, 2015-07-12; 2 files, -0/+68)
  This adds support for the 'nest' attribute, which allows the static chain register to be set for function calls under non-Darwin PPC/PPC64 targets. r11 is the chain register (which the PPC64 ELF ABI calls the "environment pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded from the function descriptor, but providing an explicit 'nest' parameter will override that process and use the value provided. This allows __builtin_call_with_static_chain to work as expected on PowerPC.
  llvm-svn: 241984
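  In IR, the attribute simply marks the parameter that travels in the static-chain register (a minimal sketch; on these targets the lowering places it in r11):

    declare i8* @callee(i8* nest, i32)

    define i8* @caller(i8* %env) {
      %r = call i8* @callee(i8* nest %env, i32 42)  ; %env goes in r11
      ret i8* %r
    }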
* Add missing builtins to the PPC back end for ABI compliance (vol. 2) (Nemanja Ivanovic, 2015-07-05; 1 file, -0/+31)
  This patch corresponds to review: http://reviews.llvm.org/D10874
  Back end portion of the second round of additions to altivec.h.
  llvm-svn: 241398

* [PPC64LE] Remove implicit-subreg restriction from VSX swap removal (Bill Schmidt, 2015-07-02; 1 file, -0/+27)
  In r241285, I removed the SUBREG_TO_REG restriction from VSX swap removal, determining that this was overly conservative. We have another form of the same restriction in that we check for the presence of implicit subregs in vector operations. As with SUBREG_TO_REG for partial register conversions, an implicit subreg is safe in and of itself, provided no other operation makes a lane-sensitive assumption about the result. This patch removes that restriction, by removing the HasImplicitSubreg flag and all code that relies on it.
  I've added a test case that fails to optimize before this patch is applied, and optimizes properly with the patch. Test based on a report from Anton Blanchard.
  llvm-svn: 241290

* [PPC64LE] Teach swap optimization about the doubleword splat idiom (Bill Schmidt, 2015-07-02; 1 file, -0/+24)
  With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom.
  As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code.
  The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat.
  A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand.
  llvm-svn: 241285
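  At the IR level, the register-source doubleword splat is a self-shuffle (illustrative; the pass itself recognizes the XXPERMDI machine instruction):

    ; Splat of element 0 of a <2 x double>: selectable as xxspltd, an
    ; XXPERMDI whose two source registers are identical.
    define <2 x double> @splat_reg(<2 x double> %v) {
      %s = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
      ret <2 x double> %s
    }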
* [PPC64LE] Enable missing lxvdsx optimization, and related swap optimization (Bill Schmidt, 2015-07-01; 1 file, -6/+288)
  When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation.
  There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch.
  llvm-svn: 241183

* Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systems (Nemanja Ivanovic, 2015-06-30; 1 file, -0/+25)
  llvm-svn: 241108

* Add missing builtins to the PPC back end for ABI compliance (vol. 1) (Nemanja Ivanovic, 2015-06-26; 1 file, -0/+165)
  This patch corresponds to review: http://reviews.llvm.org/D10638
  This is the back end portion of patch http://reviews.llvm.org/D10637
  It just adds the code gen and intrinsic functions necessary to support that patch to the back end.
  llvm-svn: 240820

* [PPC] Implement vmrgew and vmrgow instructions (Kit Barton, 2015-06-25; 1 file, -0/+101)
  This patch adds support for the vector merge even word and vector merge odd word instructions introduced in POWER8.
  Phabricator review: http://reviews.llvm.org/D10704
  llvm-svn: 240650
* Improve the --expand-relocs handling of MachO (Rafael Espindola, 2015-06-18; 1 file, -29/+9)
  In a relocation, the target can take 3 basic forms:
  - An r_value in scattered relocations.
  - A symbol in external relocations.
  - A section in non-external relocations.
  Have the dump reflect that. With this change we go from
    CHECK-NEXT: Extern: 0
    CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5)
    CHECK-NEXT: Symbol: 0x2
    CHECK-NEXT: Scattered: 0
  to just
    // CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5)
    // CHECK-NEXT: Section: __data (2)
  Since the relocation is with a section, we print the section name and don't need to say that it is not scattered or external.
  Someone motivated can add further special cases for things like ARM64_RELOC_ADDEND and ARM_RELOC_PAIR.
  llvm-svn: 240073