path: root/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
...
* Eliminate use of getNode that takes a vector. (Chris Lattner, 2006-08-11; 1 file, -19/+22)
    llvm-svn: 29614
* Convert vectors to fixed sized arrays and smallvectors. Eliminate use of
  getNode that takes a vector. (Chris Lattner, 2006-08-11; 1 file, -37/+42)
    llvm-svn: 29609
* Fix miscompilation of float vector returns. (Chris Lattner, 2006-08-11; 1 file, -4/+4)
    Compile code to this:
        _func:
            vsldoi v2, v3, v2, 12
            vsldoi v2, v2, v2, 4
            blr
    instead of:
        _func:
            vsldoi v2, v3, v2, 12
            vsldoi v2, v2, v2, 4
            *** vor f1, v2, v2
            blr
    llvm-svn: 29607
* Fix some ppc64 issues with vector code. (Chris Lattner, 2006-07-28; 1 file, -4/+7)
    llvm-svn: 29384
* Rename RelocModel::PIC to PIC_, to avoid conflicts with -DPIC. (Chris Lattner, 2006-07-26; 1 file, -3/+3)
    llvm-svn: 29307
* Implement Regression/CodeGen/PowerPC/bswap-load-store.ll by folding bswaps
  into i16/i32 load/stores. (Chris Lattner, 2006-07-10; 1 file, -0/+59)
    llvm-svn: 29089
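The bswap-folding entry above can be illustrated with a small sketch. This is an illustrative model, not code from the commit: on big-endian PowerPC, a byte-reversed load (such as lwbrx) computes the same value as a normal load followed by a bswap, which is why the combine can delete the explicit swap.

```python
# Sketch: why bswap(load x) can fold into a single byte-reversed load.
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")

buf = bytes([0x11, 0x22, 0x33, 0x44])

# What the unfolded IR computes: a normal big-endian load, then a bswap.
unfolded = bswap32(int.from_bytes(buf, "big"))

# What a single byte-reversed (little-endian) load computes directly.
folded = int.from_bytes(buf, "little")

assert unfolded == folded == 0x44332211
```

The same identity holds for i16 loads and for the store direction, which is why the commit covers both i16/i32 loads and stores.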
* Implement 64-bit select, bswap, etc. (Chris Lattner, 2006-06-27; 1 file, -0/+4)
    llvm-svn: 28935
* PPC doesn't have bit converts to/from i64. (Chris Lattner, 2006-06-27; 1 file, -0/+2)
    llvm-svn: 28932
* Implement 64-bit undef, sub, shl/shr, srem/urem. (Chris Lattner, 2006-06-27; 1 file, -0/+2)
    llvm-svn: 28929
* Use i32 for shift amounts instead of i64. This gets bisort working. (Chris Lattner, 2006-06-27; 1 file, -0/+1)
    llvm-svn: 28927
* Implement a bunch of 64-bit cleanliness work. With this, treeadd builds
  (but doesn't work right). (Chris Lattner, 2006-06-27; 1 file, -1/+3)
    llvm-svn: 28921
* Improve PPC64 calling convention support. (Chris Lattner, 2006-06-26; 1 file, -37/+84)
    llvm-svn: 28919
* Correct returns of 64-bit values, though they seemed to work before... (Chris Lattner, 2006-06-21; 1 file, -9/+19)
    llvm-svn: 28892
* Fix some assumptions that pointers can only be 32 bits. (Chris Lattner, 2006-06-16; 1 file, -32/+35)
    With this, we can now compile:
        static unsigned long X;
        void test1() { X = 0; }
    into:
        _test1:
            lis r2, ha16(_X)
            li r3, 0
            stw r3, lo16(_X)(r2)
            blr
    Totally amazing :)
    llvm-svn: 28839
* Rename some subtarget features. A CPU now can *have* 64-bit instructions,
  and in 32-bit mode we can choose to optionally *use* 64-bit registers. (Chris Lattner, 2006-06-16; 1 file, -3/+3)
    llvm-svn: 28824
* Type of extract_element index operand should be iPTR. (Evan Cheng, 2006-06-15; 1 file, -12/+15)
    llvm-svn: 28797
* Fix a problem exposed by the local allocator. (Chris Lattner, 2006-06-10; 1 file, -14/+18)
    CALL instructions are not marked as using incoming argument registers, so
    the local allocator would clobber them between their set and use. To fix
    this, we give the call instructions a variable number of uses in the CALL
    MachineInstr itself, so live variables understands the live ranges of
    these register arguments.
    llvm-svn: 28744
* Always reserve space for 8 spilled GPRs. (Chris Lattner, 2006-05-30; 1 file, -12/+7)
    GCC apparently assumes that this space will be available, even if the
    callee isn't varargs.
    llvm-svn: 28571
* Change RET node to include signedness information of the return values,
  i.e. RET chain, value1, sign1, value2, sign2, ... (Evan Cheng, 2006-05-26; 1 file, -3/+3)
    llvm-svn: 28510
* CALL node change (arg / sign pairs instead of just arguments). (Evan Cheng, 2006-05-25; 1 file, -5/+6)
    llvm-svn: 28462
* Patches to make the LLVM sources more -pedantic clean. (Chris Lattner, 2006-05-24; 1 file, -1/+1)
    Patch provided by Anton Korobeynikov! This is a step towards closing PR786.
    llvm-svn: 28447
* Fix CodeGen/Generic/vector.ll:test_div with altivec. (Chris Lattner, 2006-05-24; 1 file, -0/+1)
    llvm-svn: 28445
* Handle SETO* like we handle SET*, restoring behavior after Evan's setcc
  change. This fixes PowerPC/fnegsel.ll. (Chris Lattner, 2006-05-24; 1 file, -0/+8)
    llvm-svn: 28443
* Make PPC call lowering more aggressive, making the isel matching code
  simple enough to be autogenerated. (Chris Lattner, 2006-05-17; 1 file, -12/+71)
    llvm-svn: 28354
* Switch PPC over to a call-selection model where the lowering code creates
  the copyto/fromregs instead of making the PPCISD::CALL selection code
  create them. This vastly simplifies the selection code, and moves the ABI
  handling parts into one place. (Chris Lattner, 2006-05-17; 1 file, -54/+105)
    llvm-svn: 28346
* 3 changes, 2 of which are cleanup, one of which changes codegen: (Chris Lattner, 2006-05-17; 1 file, -105/+111)
    1. Rearrange code a bit so that the special case doesn't require
       indenting lots of code.
    2. Add comments describing the PPC calling convention.
    3. Only round up to 56 bytes of stack space for an outgoing call if the
       callee is varargs. This saves a bit of stack space.
    llvm-svn: 28342
* Implement passing/returning vector regs to calls, at least non-varargs
  calls. (Chris Lattner, 2006-05-16; 1 file, -1/+12)
    llvm-svn: 28341
* Instead of implementing LowerCallTo directly, let the default impl produce
  an ISD::CALL node, then custom lower that. (Chris Lattner, 2006-05-16; 1 file, -211/+147)
    This means that we only have to handle LEGAL call operands/results, not
    every possible type. This allows us to simplify the call code, shrinking
    it by about 1/3.
    llvm-svn: 28339
* Simplify the argument counting logic by only incrementing the index. (Chris Lattner, 2006-05-16; 1 file, -14/+11)
    llvm-svn: 28335
* Simplify the dead argument handling code. (Chris Lattner, 2006-05-16; 1 file, -28/+11)
    llvm-svn: 28334
* Vector args passed in registers don't reserve stack space. (Chris Lattner, 2006-05-16; 1 file, -11/+26)
    llvm-svn: 28333
* Switch the PPC backend over to using FORMAL_ARGUMENTS for formal argument
  handling. (Chris Lattner, 2006-05-16; 1 file, -168/+157)
    This makes the lower argument code significantly simpler (we only need to
    handle legal argument types). Incidentally, this also implements support
    for vector argument registers, so long as they are not on the stack.
    llvm-svn: 28331
* Fit in 80 cols. (Chris Lattner, 2006-05-16; 1 file, -3/+3)
    llvm-svn: 28311
* Remove dead var, fix bad override. (Chris Lattner, 2006-05-12; 1 file, -1/+0)
    llvm-svn: 28264
* Fix CodeGen/Generic/2006-04-28-Sign-extend-bool.ll. (Chris Lattner, 2006-04-28; 1 file, -0/+4)
    llvm-svn: 28017
* JumpTable support! (Nate Begeman, 2006-04-22; 1 file, -0/+32)
    What this represents is working asm and jit support for x86 and ppc for
    100% dense switch statements when relocations are non-PIC. This support
    will be extended and enhanced in the coming days to support PIC, and less
    dense forms of jump tables.
    llvm-svn: 27947
* Fix a crash on: (Chris Lattner, 2006-04-18; 1 file, -2/+24)
        void foo2(vector float *A, vector float *B) {
            vector float C = (vector float)vec_cmpeq(*A, *B);
            if (!vec_any_eq(*A, *B))
                *B = (vector float){0,0,0,0};
            *A = C;
        }
    llvm-svn: 27808
* Pretty print node name. (Chris Lattner, 2006-04-18; 1 file, -0/+1)
    llvm-svn: 27806
* Implement an important entry from README_ALTIVEC: (Chris Lattner, 2006-04-18; 1 file, -15/+88)
    If an altivec predicate compare is used immediately by a branch, don't
    use a (serializing) MFCR instruction to read the CR6 register, which
    requires a compare to get it back to CR's. Instead, just branch on CR6
    directly. :)

    For example, for:
        void foo2(vector float *A, vector float *B) {
            if (!vec_any_eq(*A, *B))
                *B = (vector float){0,0,0,0};
        }
    We now generate:
        _foo2:
            mfspr r2, 256
            oris r5, r2, 12288
            mtspr 256, r5
            lvx v2, 0, r4
            lvx v3, 0, r3
            vcmpeqfp. v2, v3, v2
            bne cr6, LBB1_2 ; UnifiedReturnBlock
        LBB1_1: ; cond_true
            vxor v2, v2, v2
            stvx v2, 0, r4
            mtspr 256, r2
            blr
        LBB1_2: ; UnifiedReturnBlock
            mtspr 256, r2
            blr
    instead of:
        _foo2:
            mfspr r2, 256
            oris r5, r2, 12288
            mtspr 256, r5
            lvx v2, 0, r4
            lvx v3, 0, r3
            vcmpeqfp. v2, v3, v2
            mfcr r3, 2
            rlwinm r3, r3, 27, 31, 31
            cmpwi cr0, r3, 0
            beq cr0, LBB1_2 ; UnifiedReturnBlock
        LBB1_1: ; cond_true
            vxor v2, v2, v2
            stvx v2, 0, r4
            mtspr 256, r2
            blr
        LBB1_2: ; UnifiedReturnBlock
            mtspr 256, r2
            blr
    This implements CodeGen/PowerPC/vec_br_cmp.ll.
    llvm-svn: 27804
* Use vmladduhm to do v8i16 multiplies, which is faster and simpler than
  doing even/odd halves. Thanks to Nate telling me what's what. (Chris Lattner, 2006-04-18; 1 file, -18/+3)
    llvm-svn: 27793
* Implement v16i8 multiply with this code: (Chris Lattner, 2006-04-18; 1 file, -2/+25)
        vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4
    This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies
    are 6.79x faster than before. Overall, UnitTests/Vector/multiplies.c is
    now 2.45x faster with LLVM than with GCC.

    Remove the 'integer multiplies' todo from the README file.
    llvm-svn: 27792
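A quick model of why the entry above works. This is an illustrative sketch, not code from the commit: the element numbering and the effect of the vperm control (which the commit does not spell out for v16i8) are my assumptions, modeled on big-endian AltiVec semantics where vmuleub/vmuloub multiply the even/odd byte elements into 16-bit products and vperm then keeps the low byte of each product.

```python
# Sketch: simulate vmuleub + vmuloub + vperm producing an elementwise
# v16i8 multiply (results truncated to 8 bits, as byte multiply requires).
def mul_v16i8(a, b):
    even = [a[i] * b[i] for i in range(0, 16, 2)]   # vmuleub: 8 x u16 products
    odd  = [a[i] * b[i] for i in range(1, 16, 2)]   # vmuloub: 8 x u16 products
    out = []
    for e, o in zip(even, odd):                     # vperm: re-interleave,
        out += [e & 0xFF, o & 0xFF]                 # keeping each product's
    return out                                      # low byte

a = list(range(16))
b = [7] * 16
assert mul_v16i8(a, b) == [(x * y) & 0xFF for x, y in zip(a, b)]
```

The point is that two multiplies plus one permute replace 16 scalar multiplies and the attendant load/store traffic, which is where the reported 6.79x speedup comes from.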
* Lower v8i16 multiply into this code: (Chris Lattner, 2006-04-18; 1 file, -25/+51)
        li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4
    where v4 is:
        LCPI1_0: ; <16 x ubyte>
            .byte 2
            .byte 3
            .byte 18
            .byte 19
            .byte 6
            .byte 7
            .byte 22
            .byte 23
            .byte 10
            .byte 11
            .byte 26
            .byte 27
            .byte 14
            .byte 15
            .byte 30
            .byte 31
    This is 5.07x faster on the G5 (measured) than lowering to scalar code +
    loads/stores.
    llvm-svn: 27789
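The perm control in the entry above can be checked by simulation. This is an illustrative sketch under my assumptions about big-endian AltiVec semantics (vmuleuh/vmulouh multiply even/odd halfword elements into 32-bit products; vperm indexes bytes of the 32-byte concatenation of its two vector operands): the constant {2,3,18,19,...} picks out the low 16 bits of each product and re-interleaves them into the original element order.

```python
# Sketch: verify that the LCPI1_0 perm control reassembles the
# vmuleuh/vmulouh products into an elementwise v8i16 multiply.
def words_to_bytes(words, width):
    out = bytearray()
    for w in words:
        out += w.to_bytes(width, "big")     # big-endian, as on the G5
    return bytes(out)

def mul_v8i16(a, b):
    even = words_to_bytes([a[i] * b[i] for i in range(0, 8, 2)], 4)  # vmuleuh
    odd  = words_to_bytes([a[i] * b[i] for i in range(1, 8, 2)], 4)  # vmulouh
    cat = even + odd          # vperm indexes bytes 0-31 of the concatenation
    ctrl = [2, 3, 18, 19, 6, 7, 22, 23, 10, 11, 26, 27, 14, 15, 30, 31]
    perm = bytes(cat[i] for i in ctrl)
    return [int.from_bytes(perm[i:i+2], "big") for i in range(0, 16, 2)]

a = [1, 2, 3, 40000, 5, 6, 7, 8]
b = [9, 10, 11, 3, 13, 14, 15, 16]
assert mul_v8i16(a, b) == [(x * y) & 0xFFFF for x, y in zip(a, b)]
```

Bytes 2-3 of each 32-bit product are its low halfword in big-endian order, so the control alternates between the even-product register (bytes 0-15) and the odd-product register (bytes 16-31) to restore element order.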
* Custom lower v4i32 multiplies into a cute sequence, instead of having
  legalize scalarize the sequence into 4 mullw's and a bunch of load/store
  traffic. (Chris Lattner, 2006-04-18; 1 file, -10/+53)
    This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements
    PowerPC/vec_mul.ll.
    llvm-svn: 27788
* Make sure to check splats of every constant we can, handle splat(31) by
  being a bit more clever, add support for odd splats from -31 to -17. (Chris Lattner, 2006-04-17; 1 file, -5/+14)
    llvm-svn: 27764
* Teach the ppc backend to use rol and vsldoi to generate splatted constants.
  This implements vec_constants.ll:test_vsldoi and test_rol. (Chris Lattner, 2006-04-17; 1 file, -15/+49)
    llvm-svn: 27760
* Make some code more general, adding support for constant formation of
  several new patterns. (Chris Lattner, 2006-04-17; 1 file, -22/+78)
    llvm-svn: 27754
* Learn how to make odd splatted constants in range [17,29]. This implements
  PowerPC/vec_constants.ll:test_29. (Chris Lattner, 2006-04-17; 1 file, -0/+7)
    llvm-svn: 27752
* Pull some code out into a helper function. (Chris Lattner, 2006-04-17; 1 file, -16/+26)
    Efficiently codegen even splats in the range [-32,30]. This allows us to
    codegen <30,30,30,30> as:
        vspltisw v0, 15
        vadduwm v2, v0, v0
    instead of as a constant pool load.
    llvm-svn: 27750
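The range in the entry above falls out of the immediate encoding. As an illustrative check (assuming, as the AltiVec ISA specifies, that vspltisw takes a 5-bit signed immediate in [-16,15] and that vadduwm can then double the splatted value): every even value in [-32,30], and only those, is expressible as imm + imm for a legal immediate.

```python
# Sketch: why the vspltisw + vadduwm trick covers exactly the even
# splat values in [-32, 30].
def even_splat_imm(v):
    """Return the vspltisw immediate that doubles to v, or None."""
    imm = v // 2                       # exact for even v
    return imm if -16 <= imm <= 15 else None

# Every even value in [-32, 30] has a legal immediate...
for v in range(-32, 31, 2):
    imm = even_splat_imm(v)
    assert imm is not None and imm + imm == v

# ...including the <30,30,30,30> example from the commit message:
# vspltisw v0, 15 ; vadduwm v2, v0, v0
assert even_splat_imm(30) == 15
```

Values outside that range (or odd values) need the cleverer sequences added in the neighboring commits, or fall back to a constant pool load.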
* Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle,
  if it can be implemented in 3 or fewer discrete altivec instructions,
  codegen it as such. (Chris Lattner, 2006-04-17; 1 file, -2/+135)
    This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll.
    llvm-svn: 27748
* Implement a TODO: have the legalizer canonicalize a bunch of operations to
  one type (v4i32) so that we don't have to write patterns for each type, and
  so that more CSE opportunities are exposed. (Chris Lattner, 2006-04-16; 1 file, -9/+24)
    llvm-svn: 27731