summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCInstrAltivec.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computationsBill Schmidt2015-04-271-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910
* [PowerPC] Correct typo in PPCInstrAltivec.tdBill Schmidt2015-03-181-1/+1
| | | | llvm-svn: 232681
* [PowerPC] Remove canFoldAsLoad from instruction definitionsHal Finkel2015-03-111-1/+1
| | | | | | | | | | | | | | | | | The PowerPC backend had a number of loads that were marked as canFoldAsLoad (and I'm partially at fault here for copying around the relevant line of TableGen definitions without really looking at what it meant). This is not right; PPC (non-memory) instructions don't support direct memory operands, and so there is nothing a 'foldable' instruction could be folded into. Noticed by inspection, no test case. The one thing we might lose by doing this is ability to fold some loads into stackmap/patchpoint pseudo-instructions. However, this was untested, and would not obviously have worked for extending loads, and I'd rather re-add support for that once it can be tested. llvm-svn: 231982
* Change the generation of the vmuluwm instruction to be based on the MUL opcode.Kit Barton2015-03-101-2/+3
| | | | | | Phabricator review: http://reviews.llvm.org/D8185 llvm-svn: 231827
* While reviewing the changes to Clang to add builtin support for the vsld, ↵Kit Barton2015-03-051-4/+10
| | | | | | vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics. llvm-svn: 231378
* Add LLVM support for PPC cryptography builtinsNemanja Ivanovic2015-03-041-0/+40
| | | | | | Review: http://reviews.llvm.org/D7955 llvm-svn: 231285
* Add the following 64-bit vector integer arithmetic instructions added in POWER8:Kit Barton2015-03-031-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | vaddudm vsubudm vmulesw vmulosw vmuleuw vmulouw vmuluwm vmaxsd vmaxud vminsd vminud vcmpequd vcmpequd. vcmpgtsd vcmpgtsd. vcmpgtud vcmpgtud. vrld vsld vsrd vsrad Phabricator review: http://reviews.llvm.org/D7959 llvm-svn: 231115
* I incorrectly marked the VORC instruction as isCommutable when I added it. Kit Barton2015-02-201-1/+2
| | | | | | | | This fix removes the VORC instruction definition from the isCommutable block. Phabricator review: http://reviews.llvm.org/D7772 llvm-svn: 230020
* This patch adds the VSX logical instructions introduced in the Power ISA ↵Kit Barton2015-02-181-2/+0
| | | | | | | | | | 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Commiting on Nemanja's behalf. llvm-svn: 229694
* This change implements the following three logical vector operations:Kit Barton2015-02-091-0/+25
| | | | | | | | | | | | veqv (vector equivalence) vnand vorc I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions. Phabricator review: http://reviews.llvm.org/D7469 llvm-svn: 228580
* [PowerPC] Implement the vclz instructions for PWR8Bill Schmidt2015-02-051-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Add the vector count leading zeros instruction for byte, halfword, word, and doubleword sizes. This is a fairly straightforward addition after the changes made for vpopcnt: 1. Add the correct definitions for the various instructions in PPCInstrAltivec.td 2. Make the CTLZ operation legal on vector types when using P8Altivec in PPCISelLowering.cpp Test Plan Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the instructions are being generated when the CTLZ operation is used in LLVM. Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively. llvm-svn: 228301
* [PowerPC] Implement the vpopcnt instructions for POWER8Bill Schmidt2015-02-031-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Add the vector population count instructions for byte, halfword, word, and doubleword sizes. There are two major changes here: PPCISelLowering.cpp: Make CTPOP legal for vector types. PPCRegisterInfo.td: Added v2i64 to the VRRC register definition. This is needed for the doubleword variations of the integer ops that were added in P8. Test Plan Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s Test the generation of the vpopcnt instructions for various vector data types. When adding the v2i64 type to the Vector Register set, I also needed to add the appropriate bit conversion patterns between v2i64 and the existing vector types. Testing for these conversions were also added in the test case by passing a different vector type as a parameter into the test functions. There is also a run step that will ensure the vpopcnt instructions are generated when the vsx feature is disabled. llvm-svn: 228046
* [PowerPC] Swap arguments and adjust shift count for vsldoi on little endianBill Schmidt2014-08-051-7/+20
| | | | | | | | | | | | | | Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923
* [PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endianUlrich Weigand2014-08-041-8/+22
| | | | | | | | | | | | | In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718
* Don't use additional arguments for dss and friends to satisfy DSS_Form,Joerg Sonnenberger2014-08-021-61/+53
| | | | | | | | | when let can do the same thing. Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler. llvm-svn: 214636
* [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.Bill Schmidt2014-07-251-24/+56
| | | | | | | | | | | | | | | | | | | | | | Because the PowerPC vmrgh* and vmrgl* instructions have a built-in big-endian bias, it is necessary to swap their inputs in little-endian mode when using them to implement a vector shuffle. This was previously missed in the vector LE implementation. There was already logic to distinguish between unary and "normal" vmrg* vector shuffles, so this patch extends that logic to use a third option: "swapped" vmrg* vector shuffles that are used for little endian in place of the "normal" ones. I've updated the vec-shuffle-le.ll test to check for the expected register ordering on the generated instructions. This bug was discovered when testing the LE and ELFv2 patches for safety if they were backported to 3.4. A different vectorization decision was made in 3.4 than on mainline trunk, and that exposed the problem. I've verified this fix takes care of that issue. llvm-svn: 213915
* [PPC64LE] Recognize shufflevector patterns for little endianBill Schmidt2014-06-101-23/+39
| | | | | | | | | | | | | | | | | Various masks on shufflevector instructions are recognizable as specific PowerPC instructions (vector pack, vector merge, etc.). There is existing code in PPCISelLowering.cpp to recognize the correct patterns for big endian code. The masks for these instructions are different for little endian code due to the big-endian numbering employed by these instructions. This patch adds the recognition code for little endian. I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this. The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and difficult to read, so I felt it was better to add a new test rather than modify the old one. llvm-svn: 210536
* Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction.Eric Christopher2014-05-221-1/+1
| | | | | | | This required updating the generated functions and TD file accordingly to be pointers rather than const references. llvm-svn: 209375
* [PowerPC] Mark many instructions as commutativeHal Finkel2014-03-241-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm under the impression that we used to infer the isCommutable flag from the instruction-associated pattern. Regardless, we don't seem to do this (at least by default) any more. I've gone through all of our instruction definitions, and marked as commutative all of those that should be trivial to commute (by exchanging the first two operands). There has been special code for the RL* instructions, and that's not changed. Before this change, we had the following commutative instructions: RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo XSADDDP XSMULDP XVADDDP XVADDSP XVMULDP XVMULSP After: ADD4 ADD4o ADD8 ADD8o ADDC ADDC8 ADDC8o ADDCo ADDE ADDE8 ADDE8o ADDEo AND AND8 AND8o ANDo CRAND CREQV CRNAND CRNOR CROR CRXOR EQV EQV8 EQV8o EQVo FADD FADDS FADDSo FADDo FMADD FMADDS FMADDSo FMADDo FMSUB FMSUBS FMSUBSo FMSUBo FMUL FMULS FMULSo FMULo FNMADD FNMADDS FNMADDSo FNMADDo FNMSUB FNMSUBS FNMSUBSo FNMSUBo MULHD MULHDU MULHDUo MULHDo MULHW MULHWU MULHWUo MULHWo MULLD MULLDo MULLW MULLWo NAND NAND8 NAND8o NANDo NOR NOR8 NOR8o NORo OR OR8 OR8o ORo RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo VADDCUW VADDFP VADDSBS VADDSHS VADDSWS VADDUBM VADDUBS VADDUHM VADDUHS VADDUWM VADDUWS VAND VAVGSB VAVGSH VAVGSW VAVGUB VAVGUH VAVGUW VMADDFP VMAXFP VMAXSB VMAXSH VMAXSW VMAXUB VMAXUH VMAXUW VMHADDSHS VMHRADDSHS VMINFP VMINSB VMINSH VMINSW VMINUB VMINUH VMINUW VMLADDUHM VMULESB VMULESH VMULEUB VMULEUH VMULOSB VMULOSH VMULOUB VMULOUH VNMSUBFP VOR VXOR XOR XOR8 XOR8o XORo XSADDDP XSMADDADP XSMAXDP XSMINDP XSMSUBADP XSMULDP XSNMADDADP XSNMSUBADP XVADDDP XVADDSP XVMADDADP XVMADDASP XVMAXDP XVMAXSP XVMINDP XVMINSP XVMSUBADP XVMSUBASP XVMULDP XVMULSP XVNMADDADP XVNMADDASP XVNMSUBADP XVNMSUBASP XXLAND XXLNOR XXLOR XXLXOR This is a by-inspection change, and I'm not sure how to write a reliable test case. I would like advice on this, however. llvm-svn: 204609
* Add IIC_ prefix to PPC instruction-class namesHal Finkel2013-11-271-78/+80
| | | | | | | | | | | | | This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890
* Mark PPC MFTB and DST (and friends) as deprecatedHal Finkel2013-09-121-10/+20
| | | | | | | | Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03. llvm-svn: 190605
* PPC: Add some missing V_SET0 patternsHal Finkel2013-07-111-2/+15
| | | | | | | | | | We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108
* [PowerPC] Make specialized AltiVec patterns isCodeGenOnlyUlrich Weigand2013-07-031-2/+3
| | | | | | | | | | | A couple of AltiVec patterns are just specialized forms of the generic instruction pattern, and should therefore be marked isCodeGenOnly to avoid confusing the asm parser: VCFSX_0, VCTUXS_0, VCFUX_0, VCTSXS_0, and V_SETALLONES. Noticed by inspection of the generated PPCGenAsmMatcher.inc. llvm-svn: 185533
* PowerPC: Use RegisterOperand instead of RegisterClass operandsUlrich Weigand2013-04-261-72/+72
| | | | | | | | | | | | | | | | | In the default PowerPC assembler syntax, registers are specified simply by number, so they cannot be distinguished from immediate values (without looking at the opcode). This means that the default operand matching logic for the asm parser does not work, and we need to specify custom matchers. Since those can only be specified with RegisterOperand classes and not directly on the RegisterClass, all instructions patterns used by the asm parser need to use a RegisterOperand (instead of a RegisterClass) for all their register operands. This patch adds one RegisterOperand for each RegisterClass, using the same name as the class, just in lower case, and updates all instruction patterns to use RegisterOperand instead of RegisterClass operands. llvm-svn: 180611
* PowerPC: Fix encoding of vsubcuw and vsum4sbs instructionsUlrich Weigand2013-04-261-2/+2
| | | | | | | | | When testing the asm parser, I noticed wrong encodings for the above instructions (wrong sub-opcodes). Tests will be added together with the asm parser. llvm-svn: 180608
* PPC: Add a FIXME regarding the non-working fma+fneg Altivec patternHal Finkel2013-04-031-0/+2
| | | | llvm-svn: 178658
* More direct types in PowerPC AltiVec intrinsics.Ulrich Weigand2013-04-031-47/+29
| | | | | | | | | | This patch follows up on work done by Bill Schmidt in r178277, and replaces most of the remaining uses of VRRC in ISEL DAG patterns. The resulting .inc files are identical except for comments, so no change in code generation is expected. llvm-svn: 178656
* Use PPC reciprocal estimates with Newton iteration in fast-math modeHal Finkel2013-04-031-0/+3
| | | | | | | | | | | | | | | | | | | When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617
* Use direct types in most PowerPC Altivec instructions and patterns.Bill Schmidt2013-03-281-236/+333
| | | | | | | | | | | | | | | | | | | | | | | This follows up Ulrich Weigand's work in PPCInstrInfo.td and PPCInstr64Bit.td by doing the corresponding work for most of the Altivec patterns. I have not been able to do anything for the following classes of instructions: (1) Vector logicals. These don't have corresponding intrinsics and don't have a single obvious vector type. So far as I can tell I need to leave these as VRRC. Affected instructions are: VAND, VANDC, VNOR, VOR, VXOR, V_SET0. (2) Instructions that make use of vector shuffle. The selection code promotes all shuffles to v16i8, so any pattern that matches on a shuffle is constrained. I haven't found any way to make the patterns match on their natural types, so I plan to leave these as VRRC. Affected instructions are: VMRG*, VSPLTB, VSPLTH, VSPLTW, VPKUHUM, VPKUWUM. No change in behavior is anticipated. llvm-svn: 178277
* PowerPC: Mark patterns as isCodeGenOnly.Ulrich Weigand2013-03-261-0/+3
| | | | | | | | | | | There remain a number of patterns that cannot (and should not) be handled by the asm parser, in particular all the Pseudo patterns. This commit marks those patterns as isCodeGenOnly. No change in generated code. llvm-svn: 178008
* Protect PPC Altivec patterns with a predicateHal Finkel2013-03-151-0/+6
| | | | | | | | | | | | In preparation for the addition of other SIMD ISA extensions (such as QPX) we need to make sure that all Altivec patterns are properly predicated on having Altivec support. No functionality change intended (one test case needed to be updated b/c it assumed that Altivec intrinsics would be supported without enabling Altivec support). llvm-svn: 177152
* This patch fixes the Altivec addend construction for the fused multiply-addAdhemerval Zanella2012-11-301-5/+7
| | | | | | | | | | | | | | | instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero result when resulting product is -0.0. The -0.0 vector addend to vmaddfp is generated by a creating a vector with full bits sets and then shifting each elements by 31-bits to the left, resulting in a vector of 0x80000000 (or -0.0 as float). The 'buildvec_canonicalize.ll' was adjusted to reflect this change and the 'vec_mul.ll' was complemented with the float vector multiplication test. llvm-svn: 168998
* PowerPC: Lowering floor intrinsic for AltivecAdhemerval Zanella2012-11-151-0/+10
| | | | | | | | This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and llvm.nearbyint to Altivec instruction when using 4 single-precision float vectors. llvm-svn: 168086
* Add floating-point to and from integer conversionAdhemerval Zanella2012-10-081-0/+32
| | | | | | | This patch add altivec support for v4i32 to v4f32 and for v4f32 to v4i32 vector rounding conversion. llvm-svn: 165409
* Convert the PPC backend to use the new FMA infrastructure.Hal Finkel2012-06-221-7/+3
| | | | | | | The existing contraction patterns are replaced with fma/fneg. Overall functionality should be the same. llvm-svn: 158955
* Split the LdStGeneral PPC itin. class into LdStLoad and LdStStore.Hal Finkel2012-04-011-24/+24
| | | | | | | | | | | Loads and stores can have different pipeline behavior, especially on embedded chips. This change allows those differences to be expressed. Except for the 440 scheduler, there are no functionality changes. On the 440, the latency adjustment is only by one cycle, and so this probably does not affect much. Nevertheless, it will make a larger difference in the future and this removes a FIXME from the 440 itin. llvm-svn: 153821
* Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, ↵Jia Liu2012-02-181-3/+3
| | | | | | MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878
* fix up vnot matching, eliminating a dead pattern, correcting a couple ofChris Lattner2010-03-281-6/+11
| | | | | | | patterns that would never match because of bitcast, and eliminating use of vnot_conv. llvm-svn: 99753
* Fix a bunch of ambiguous patterns which tblgen happens to infer typesChris Lattner2010-03-081-7/+7
| | | | | | for, due to a bug. llvm-svn: 97953
* PR3628: Add patterns to match SHL/SRL/SRA to the corresponding Altivec Eli Friedman2009-06-071-0/+22
| | | | | | instructions. llvm-svn: 73009
* 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan.Nate Begeman2009-04-271-82/+97
| | | | | | | | | | | | | | PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. llvm-svn: 70225
* Revert 69952. Causes testsuite failures on linux x86-64.Rafael Espindola2009-04-241-97/+82
| | | | llvm-svn: 69967
* PR2957Nate Begeman2009-04-241-82/+97
| | | | | | | | | | | | | | ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next. llvm-svn: 69952
* Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning.Dan Gohman2008-12-031-1/+1
| | | | llvm-svn: 60487
* erect abstraction boundaries for accessing SDValue members, rename Val -> ↵Gabor Greif2008-08-281-3/+3
| | | | | | Node to reflect semantics llvm-svn: 55504
* Replace all target specific implicit def instructions with a target ↵Evan Cheng2008-03-151-8/+0
| | | | | | independent one: TargetInstrInfo::IMPLICIT_DEF. llvm-svn: 48380
* rename isLoad -> isSimpleLoad due to evan's desire to have such a predicate.Chris Lattner2008-01-061-1/+1
| | | | llvm-svn: 45667
* remove some isStore flags that are now inferred automatically.Chris Lattner2008-01-061-1/+1
| | | | llvm-svn: 45652
* Remove attribution from file headers, per discussion on llvmdev.Chris Lattner2007-12-291-2/+2
| | | | llvm-svn: 45418
* Add the 64-bit versions of the DS* Altivec instructions.Bill Wendling2007-09-051-14/+45
| | | | llvm-svn: 41717
OpenPOWER on IntegriCloud