path: root/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
Commit log, newest first (each entry shows author, date, files changed, and -deleted/+added lines):
...
* [PPC64LE] Generate correct code for unaligned little-endian vector loads (Bill Schmidt, 2014-06-09, 1 file, -21/+39)

  The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned Altivec vector loads generates a lvsl followed by a vperm. As we've seen in numerous other places, the vperm instruction has a big-endian bias, and this is fixed for little endian by complementing the permute control vector and swapping the input operands. In this case the lvsl is providing the permute control vector.

  Rather than generating an lvsl and a complement operation, it is sufficient to generate an lvsr instruction instead. Thus for LE code generation we will generate an lvsr rather than an lvsl, and swap the other input arguments on the vperm.

  The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5 testing.

  llvm-svn: 210493
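A standalone sketch of why lvsr can stand in for the complemented lvsl result: the snippet below models the two instructions' byte sequences for a misaligned address in plain C++ (the shift amount sh is a made-up example; this is illustrative arithmetic, not code from the patch).

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // For an address A with misalignment sh = A & 15, lvsl yields the byte
  // sequence {sh, sh+1, ..., sh+15} and lvsr yields {16-sh, ..., 31-sh}.
  unsigned sh = 5;  // example misalignment; any value 0..15 works
  uint8_t lvsl[16], lvsr[16];
  for (int i = 0; i < 16; ++i) {
    lvsl[i] = sh + i;
    lvsr[i] = 16 - sh + i;
  }
  // Complementing a 5-bit permute-control element means 31 - x.  Reading
  // lvsr's elements in reversed (little-endian) element order reproduces
  // the complement of lvsl exactly, so no separate complement is needed.
  for (int i = 0; i < 16; ++i)
    if (lvsr[15 - i] != 31 - lvsl[i])
      printf("mismatch at element %d\n", i);
  printf("lvsr matches the complemented lvsl control vector\n");
  return 0;
}
```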
* [PPC64LE] Generate correct little-endian code for v16i8 multiply (Bill Schmidt, 2014-06-09, 1 file, -4/+16)

  The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8 values assumes that vector elements are numbered in big-endian order. For little-endian targets, the vector element numbering is reversed, but the vmuleub, vmuloub, and vperm instructions still assume big-endian numbering. To account for this, we must adjust the permute control vector and reverse the order of the input registers on the vperm instruction.

  The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on powerpc64 and powerpc64le targets as well as the original powerpc (32-bit) target.

  llvm-svn: 210474
* [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian (Bill Schmidt, 2014-06-06, 1 file, -3/+34)

  This patch fixes a couple of lowering issues for little endian PowerPC. The code for lowering BUILD_VECTOR contains a number of optimizations that are only valid for big endian. For now, we disable those optimizations for correctness. In the future, we will add analogous optimizations that are correct for little endian.

  When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make the now-familiar transformation of swapping the input operands and complementing the permute control vector. Correctness of this transformation is tested by the accompanying test case.

  llvm-svn: 210336
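The swap-and-complement rewrite is mechanical enough to check with a scalar model. The sketch below models vperm as observed from big- and little-endian programs and verifies the identity; it is illustrative arithmetic only (the committed lowering builds the permute control in the SelectionDAG instead).

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

using V = std::array<uint8_t, 16>;

// vperm on a big-endian target: result element i is byte c[i] (mod 32)
// of the 32-byte concatenation a||b.
static V vperm_be(const V &a, const V &b, const V &c) {
  V r{};
  for (int i = 0; i < 16; ++i) {
    unsigned p = c[i] & 31;
    r[i] = p < 16 ? a[p] : b[p - 16];
  }
  return r;
}

// The same hardware operation observed from a little-endian program:
// byte numbering within each source register appears reversed.
static V vperm_le(const V &x, const V &y, const V &c) {
  V r{};
  for (int i = 0; i < 16; ++i) {
    unsigned p = c[i] & 31;
    r[i] = p < 16 ? x[15 - p] : y[31 - p];
  }
  return r;
}

int main() {
  V v1{}, v2{}, mask{};
  for (int i = 0; i < 16; ++i) {
    v1[i] = i;                   // bytes 0..15
    v2[i] = 100 + i;             // bytes 100..115
    mask[i] = (7 * i + 3) & 31;  // an arbitrary shuffle mask
  }
  // The little-endian rewrite: swap the inputs, complement the control.
  V cmask{};
  for (int i = 0; i < 16; ++i)
    cmask[i] = 31 - mask[i];
  V want = vperm_be(v1, v2, mask);  // the shuffle we want
  V got = vperm_le(v2, v1, cmask);  // what the LE lowering emits
  puts(want == got ? "identity holds" : "identity broken");
  return 0;
}
```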
* Omit else branch after return. (Eric Christopher, 2014-06-02, 1 file, -2/+4)

  llvm-svn: 210034
* Have the TLOF creation take a Triple rather than needing a subtarget. (Eric Christopher, 2014-05-31, 1 file, -3/+5)

  llvm-svn: 209937
* isSVR4ABI() returned !isDarwin() so just move that to the else block and remove the unreachable code. (Eric Christopher, 2014-05-30, 1 file, -4/+1)

  llvm-svn: 209927
* Rename CreateTLOF->createTLOF to match the rest of the file and the rest of the targets with a similar function name. (Eric Christopher, 2014-05-30, 1 file, -4/+4)

  llvm-svn: 209926
* [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC (Bill Schmidt, 2014-05-27, 1 file, -4/+8)

  In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an optimization for certain patterns to generate one or two vector splats followed by a vector add or subtract. This operation is represented by a VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for the VADD_SPLAT to be assigned the wrong data type, causing incorrect code generation. This patch corrects the problem.

  Specifically, the code previously assigned the value type of the BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct much of the time, but not always. The problem is that the call to isConstantSplat() may return a SplatBitSize that is not the same as the number of bits in the original element vector type. The correct type to assign is a vector type with the same element bit size as SplatBitSize.

  The included test case shows an example of this, where the BUILD_VECTOR node has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a splat of 16 for type v8i16, which is the type we must assign to the VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm, which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8 and a vadduhm.

  This patch also corrected code generation for CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an XFAIL, so we can remove the XFAIL from the test case.

  llvm-svn: 209662
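The wrong-width failure mode is easy to reproduce with ordinary integer arithmetic; the snippet below is a standalone model of the two instruction sequences described above, not LLVM code.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint8_t wrong[16], right[16];
  // vspltisb 8 + vaddubm: every byte becomes 8 + 8 = 16 (the miscompile).
  for (int i = 0; i < 16; ++i)
    wrong[i] = 8 + 8;
  // vspltish 8 + vadduhm: every halfword becomes 16, whose big-endian
  // byte pattern is {0, 16} -- the intended v16i8 result.
  for (int i = 0; i < 16; i += 2) {
    uint16_t h = 8 + 8;
    right[i] = h >> 8;        // high byte: 0
    right[i + 1] = h & 0xff;  // low byte: 16
  }
  for (int i = 0; i < 16; ++i)
    printf("%d/%d ", wrong[i], right[i]);  // "16/0 16/16 16/0 ..."
  printf("\n");
  return 0;
}
```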
* [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate (Adam Nemet, 2014-05-20, 1 file, -1/+1)

  The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64 backend. We matched an immediate offset with STWX8 even though it only supports register offset. The culprit is the complex-pattern predicate, SelectAddrIdx, which decides that if the offset is not ISD::Constant it must be a register.

  Many thanks to Bill Schmidt for testing this.

  llvm-svn: 209219
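For context, a minimal sketch of the predicate and the one-line change it describes; the body below is reconstructed from the commit message and the coding style of the period, so treat the exact surrounding code as an assumption rather than the committed patch.

```cpp
// Returns true if N is a signed 16-bit constant, placing the value in Imm.
// The fix is the added ISD::TargetConstant match: after index splitting,
// the offset can reach this predicate already wrapped as a TargetConstant.
static bool isIntS16Immediate(SDNode *N, short &Imm) {
  if (N->getOpcode() != ISD::Constant &&
      N->getOpcode() != ISD::TargetConstant)  // <-- the PR19796 fix
    return false;
  Imm = (short)cast<ConstantSDNode>(N)->getZExtValue();
  if (N->getValueType(0) == MVT::i32)
    return Imm == (int)cast<ConstantSDNode>(N)->getZExtValue();
  return Imm == (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
}
```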
* SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. (Benjamin Kramer, 2014-05-19, 1 file, -0/+1)

  - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
  - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
  - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

  llvm-svn: 209123
* Target: remove old constructors for CallLoweringInfo (Saleem Abdulrasool, 2014-05-17, 1 file, -10/+5)

  This is mostly a mechanical change, converting all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors is intended.

  llvm-svn: 209082
* Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. (Jay Foad, 2014-05-14, 1 file, -12/+12)

  llvm-svn: 208811
* [PowerPC] Add global named register support (Hal Finkel, 2014-05-11, 1 file, -0/+25)

  Support for the intrinsics that read from and write to global named registers is added for r1, r2 and r13 (depending on the subtarget).

  llvm-svn: 208509
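As a usage sketch: these intrinsics are typically reached from C/C++ through the GNU global register variable extension. The example below is illustrative only; whether a given frontend accepts this particular register binding is an assumption, not something the commit states.

```cpp
#include <cstdio>

// GNU extension: bind a global variable to a named register.  On PPC,
// r1 is the stack pointer (r2 and r13 serve as TOC/thread pointers on
// typical subtargets).  Reads of this variable lower to the
// llvm.read_register intrinsic.
register unsigned long current_sp asm("r1");

int main() {
  printf("stack pointer: %#lx\n", current_sp);
  return 0;
}
```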
* Use makeArrayRef instead of calling ArrayRef<T> constructor directly. I introduced most of these recently. (Craig Topper, 2014-04-30, 1 file, -5/+4)

  llvm-svn: 207616
* Convert more SelectionDAG functions to use ArrayRef. (Craig Topper, 2014-04-28, 1 file, -1/+1)

  llvm-svn: 207397
* [C++] Use 'nullptr'. (Craig Topper, 2014-04-28, 1 file, -2/+2)

  llvm-svn: 207394
* Convert SelectionDAG::getMergeValues to use ArrayRef. (Craig Topper, 2014-04-27, 1 file, -4/+4)

  llvm-svn: 207374
* Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. (Craig Topper, 2014-04-26, 1 file, -7/+5)

  llvm-svn: 207329
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. (Craig Topper, 2014-04-26, 1 file, -35/+24)

  llvm-svn: 207327
* [C++] Use 'nullptr'. Target edition. (Craig Topper, 2014-04-25, 1 file, -26/+27)

  llvm-svn: 207197
* Add 'musttail' marker to call instructions (Reid Kleckner, 2014-04-24, 1 file, -0/+4)

  This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with conservative IR verification rules that ensure that tail call optimization is possible.

  Reviewers: nicholas

  Differential Revision: http://llvm-reviews.chandlerc.com/D3240

  llvm-svn: 207143
* Break PseudoSourceValue out of the Value hierarchy. (Nick Lewycky, 2014-04-15, 1 file, -2/+2)

  It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* or Value* is strongly encouraged to use a MachinePointerInfo instead.

  llvm-svn: 206255
* [PowerPC] Implement some additional TLI callbacks (Hal Finkel, 2014-04-12, 1 file, -0/+36)

  Add implementations of:

    bool isLegalICmpImmediate(int64_t Imm) const
    bool isLegalAddImmediate(int64_t Imm) const
    bool isTruncateFree(Type *Ty1, Type *Ty2) const
    bool isTruncateFree(EVT VT1, EVT VT2) const
    bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

  Unfortunately, this regresses counter-register-based loop formation because some of the loops now end up in forms where SE cannot compute loop counts. Nevertheless, the test-suite results favor committing:

    SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
    MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
    SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup
    MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

  llvm-svn: 206120
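For orientation, a hedged sketch of what such callbacks can look like for PPC, inferred from the 16-bit immediate fields of cmpwi/cmplwi/addi and the free i64-to-i32 truncation on PPC64; the committed bodies may well differ in detail.

```cpp
// Sketch only; see the note above.  isInt<N>/isUInt<N> are LLVM's
// MathExtras helpers for signed/unsigned N-bit range checks.
bool PPCTargetLowering::isLegalICmpImmediate(int64_t Imm) const {
  return isInt<16>(Imm) || isUInt<16>(Imm);  // cmpwi / cmplwi immediates
}

bool PPCTargetLowering::isLegalAddImmediate(int64_t Imm) const {
  return isInt<16>(Imm);                     // addi's signed 16-bit field
}

bool PPCTargetLowering::isTruncateFree(Type *Ty1, Type *Ty2) const {
  // On PPC64, truncating i64 to i32 just uses the low half of the GPR.
  if (!Ty1->isIntegerTy() || !Ty2->isIntegerTy())
    return false;
  return Ty1->getPrimitiveSizeInBits() == 64 &&
         Ty2->getPrimitiveSizeInBits() == 32;
}
```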
* [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask (Hal Finkel, 2014-04-08, 1 file, -1/+1)

  PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle predicate should be false. Noticed by inspection; no test case (yet).

  llvm-svn: 205787
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04, 1 file, -24/+24)

  llvm-svn: 205610
* [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles (Hal Finkel, 2014-03-31, 1 file, -0/+9)

  If we have two unique values for a v2i64 build vector, this will always result in two vector loads if we expand using shuffles. Only one is necessary.

  llvm-svn: 205231
* [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG (Hal Finkel, 2014-03-30, 1 file, -0/+32)

  sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node (and similarly for v2i16 and v2i8). Even though there are no sign-extension (or algebraic shift) instructions for v2i64 types, we can handle v2i32 sign extensions by converting to and from v2i64.

  The small trick necessary here is to shift the i32 elements into the right lanes before the i32 -> f64 step. Because of the big-endian nature of the system, we need the i32 portion in the high word of the i64 elements.

  For v2i16 and v2i8 we can do the same, but we first use the default Altivec shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and then apply the above procedure.

  llvm-svn: 205146
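For reference, the per-element semantics that SIGN_EXTEND_INREG must preserve, as a minimal standalone scalar model (this is not the vector lowering itself):

```cpp
#include <cstdint>
#include <cstdio>

// Sign-extend the low 32 bits of v in place within a 64-bit value: move
// the payload into the high word, then arithmetic-shift it back down.
// This mirrors, one element at a time, what the v2i64 lowering achieves.
int64_t sext_inreg_i32(int64_t v) {
  return (int64_t)((uint64_t)v << 32) >> 32;
}

int main() {
  printf("%lld\n", (long long)sext_inreg_i32(0x00000000fffffffeLL));  // -2
  printf("%lld\n", (long long)sext_inreg_i32(0x0000000000000007LL));  // 7
  return 0;
}
```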
* [PowerPC] Handle v2i64 comparisons (Hal Finkel, 2014-03-29, 1 file, -0/+23)

  v2i64 is a legal type under VSX; however, we don't have native vector comparisons. We can handle eq/ne by casting it to an Altivec type, but everything else must be expanded.

  llvm-svn: 205106
* [PowerPC] Add subregister classes for f64 VSX values (Hal Finkel, 2014-03-29, 1 file, -4/+11)

  We had stored both f64 values and v2f64, etc. values in the VSX registers. This worked, but was suboptimal because we would always spill 16-byte values even though we almost always had scalar 8-byte values. This resulted in an increase in stack-size use, extra memory bandwidth, etc.

  To fix this, I've added 64-bit subregisters of the Altivec registers, and combined those with the existing scalar floating-point registers to form a class of VSX scalar floating-point registers. The ABI code has also been enhanced to use this register class and some other necessary improvements have been made.

  llvm-svn: 205075
* [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers (Hal Finkel, 2014-03-28, 1 file, -4/+28)

  v2[fi]64 values need to be explicitly passed in VSX registers. This is because the code in TRI that finds the minimal register class given a register and a value type will assert if given an Altivec register and a non-Altivec type.

  llvm-svn: 205041
* [PowerPC] Fix v2f64 vector extract and related patterns (Hal Finkel, 2014-03-27, 1 file, -0/+1)

  First, v2f64 vector extract had not been declared legal (and so the existing patterns were not being used). Second, the patterns for that, and for scalar_to_vector, should really be a regclass copy, not a subregister operation, because the VSX registers directly hold both the vector and scalar data.

  llvm-svn: 204971
* [PowerPC] Expand v2i64 shifts (Hal Finkel, 2014-03-27, 1 file, -0/+4)

  These operations need to be expanded during legalization so that isel does not crash. In theory, we might be able to custom lower some of these. That, however, would need to be follow-up work.

  llvm-svn: 204963
* [PowerPC] Generate VSX permutations for v2[fi]64 vectors (Hal Finkel, 2014-03-26, 1 file, -4/+8)

  llvm-svn: 204873
* [PowerPC] VSX loads and stores support unaligned access (Hal Finkel, 2014-03-26, 1 file, -3/+8)

  I've not yet updated PPCTTI because I'm not sure what the actual relative cost is compared to the aligned uses.

  llvm-svn: 204848
* [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions (Hal Finkel, 2014-03-26, 1 file, -0/+5)

  llvm-svn: 204843
* [PowerPC] Use VSX vector load/stores for v2[fi]64 (Hal Finkel, 2014-03-26, 1 file, -0/+8)

  These instructions have access to the complete VSX register file. In addition, they "swap" the order of the elements so that element 0 (the scalar part) comes first in memory and element 1 follows at a higher address.

  llvm-svn: 204838
* [PowerPC] Add v2i64 as a legal VSX type (Hal Finkel, 2014-03-26, 1 file, -2/+11)

  v2i64 needs to be a legal VSX type because it is the SetCC result type from v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations. This fixes the lowering for v2f64 VSELECT.

  llvm-svn: 204828
* [PowerPC] Lower VSELECT using xxsel when VSX is available (Hal Finkel, 2014-03-26, 1 file, -0/+6)

  With VSX there is a real vector select instruction, and so we should use it. Note that VSELECT will still scalarize for v2f64 because the corresponding SetCC result type (v2i64) is not currently a legal type.

  llvm-svn: 204801
* [PowerPC] Initial support for the VSX instruction set (Hal Finkel, 2014-03-13, 1 file, -5/+60)

  VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure.

  The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the first 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below).

  Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However:

  - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4
  - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be.
  - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed.
  - Many more regression tests are needed.

  Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work on this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures.

  llvm-svn: 203768
* Fixup PPC Darwin i1 argument handling (Hal Finkel, 2014-03-06, 1 file, -0/+7)

  Like on other targets, we need to zero_extend/truncate i1 args before copying them to GPRs.

  llvm-svn: 203045
* When using CR bit registers on PPC32, handle the i1 vaarg case (Hal Finkel, 2014-03-06, 1 file, -0/+3)

  When copying an i1 value into a GPR for a vaarg call, we need to explicitly zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be generated).

  llvm-svn: 203041
* With PPC CR bit registers, handle int_to_fp on older cores (Hal Finkel, 2014-03-05, 1 file, -6/+16)

  On cores without fpcvt support, we cannot promote int_to_fp i1 operations, because there is nothing to promote them to. The most straightforward implementation of this uses a select to choose between the two possible resulting floating-point values (and that's what is done here).

  llvm-svn: 203015
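At source level, the select-based lowering amounts to choosing between the only two values an i1 can convert to; a trivial standalone model follows (the signed/unsigned split is an illustrative assumption about how the i1 is interpreted, not a detail taken from the patch):

```cpp
#include <cstdio>

// An i1 converted to floating point can only produce two values, so a
// select suffices: 0.0/1.0 for unsigned, 0.0/-1.0 for signed i1.
double uitofp_i1(bool b) { return b ? 1.0 : 0.0; }
double sitofp_i1(bool b) { return b ? -1.0 : 0.0; }

int main() {
  printf("%g %g\n", uitofp_i1(true), sitofp_i1(true));  // 1 -1
  return 0;
}
```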
* Add a PPC inline asm constraint type for single CR bits (Hal Finkel, 2014-03-02, 1 file, -0/+8)

  Now that the PowerPC backend can track individual CR bits as first-class registers, we should also have a way of allocating them for inline asm statements. Because these registers are only one bit, if an output variable is implicitly cast to a larger integer size, we'll get an any_extend to that larger type (this is part of the existing target-independent logic). As a result, regardless of the size of the output type, only the first bit is meaningful.

  The constraint identifier "wc" has been chosen for this purpose. Although gcc does not currently support allocating individual CR bits, this identifier choice has been coordinated with the gcc PowerPC team, and will be marked as reserved for this purpose in the gcc constraints.md file.

  llvm-svn: 202657
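A hypothetical usage sketch: the constraint asks the allocator for a single CR bit for an asm operand. Both the exact source form a frontend accepts and the choice of crnot for the asm body are assumptions made for illustration, not taken from the commit's tests.

```cpp
// Hypothetical sketch; see the note above.  "wc" requests a single
// condition-register bit; only bit 0 of the output is meaningful.
int invert_bit(int x) {
  int out;
  asm("crnot %0, %1"  // CR-bit complement: out_bit = ~in_bit
      : "=wc"(out)
      : "wc"(x));
  return out & 1;
}
```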
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev. (Benjamin Kramer, 2014-03-02, 1 file, -11/+6)

  Remove the old functions.

  llvm-svn: 202636
* Remove extra truncs/exts around i32 bit operations on PPC64 (Hal Finkel, 2014-03-01, 1 file, -12/+82)

  This generalizes the code to eliminate extra truncs/exts around i1 bit operations to also do the same on PPC64 for i32 bit operations. This eliminates a fairly prevalent code wart:

    int foo(int a) {
      return a == 5 ? 7 : 8;
    }

  On PPC64, because of the extension implied by the ABI, this would generate:

    cmplwi 0, 3, 5
    li 12, 8
    li 4, 7
    isel 3, 4, 12, 2
    rldicl 3, 3, 0, 32
    blr

  where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At least for the single-BB case (which is all that the DAG combine mechanism can handle), this unnecessary extension is no longer generated.

  llvm-svn: 202600
* Trying to unbreak the darwin11 builder (Hal Finkel, 2014-02-28, 1 file, -0/+3)

  The CR bit tracking code broke PPC/Darwin; trying to get it working again... (the darwin11 builder, which defaults to the darwin ABI when running PPC tests, asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll)

  llvm-svn: 202459
* Add CR-bit tracking to the PowerPC backend for i1 values (Hal Finkel, 2014-02-28, 1 file, -25/+663)

  This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages:

  - Reduction in register pressure (because we no longer need GPRs to store boolean values).
  - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for).
  - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit.

  Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations).

  This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger.

  It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in:

    trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)

  and:

    zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)

  In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive).

  POWER7 test-suite performance results (from 10 runs in each configuration):

    SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
    MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
    MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
    SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
    SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
    SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup
    SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
    MultiSource/Applications/lemon/lemon: 8% slowdown

  llvm-svn: 202451
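A small source-level example of the kind of code that benefits, as described above: the booleans are produced by comparisons and consumed by a select, so the whole i1 computation can stay in CR bits (the exact instruction sequence chosen is, of course, up to the compiler).

```cpp
// Both compares produce CR bits; the xor can be a single crxor and the
// select an isel, with no boolean ever materialized in a GPR.
int pick(int a, int b) {
  bool x = a < 0;
  bool y = b < 0;
  return (x ^ y) ? a : b;
}
```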
* Add address space argument to allowsUnalignedMemoryAccess. (Matt Arsenault, 2014-02-05, 1 file, -0/+1)

  On R600, some address spaces have more strict alignment requirements than others.

  llvm-svn: 200887
* Fix known typos (Alp Toker, 2014-01-24, 1 file, -1/+1)

  Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt.

  llvm-svn: 200018
* Fix pointer info on PPC byval stores (Hal Finkel, 2014-01-21, 1 file, -6/+5)

  For PPC64 SVR4 (and Darwin), the stores that take byval aggregate parameters from registers into the stack frame had MachinePointerInfo objects with incorrect offsets. These offsets are relative to the object itself, not to the stack frame base.

  This fixes self-hosting on PPC64 when compiling with -enable-aa-sched-mi.

  llvm-svn: 199763