Diffstat (limited to 'llvm/lib/Target/PowerPC')
-rw-r--r--   llvm/lib/Target/PowerPC/README_ALTIVEC.txt   112
1 file changed, 103 insertions, 9 deletions
diff --git a/llvm/lib/Target/PowerPC/README_ALTIVEC.txt b/llvm/lib/Target/PowerPC/README_ALTIVEC.txt
index 5144590142a..56fd2cb29f2 100644
--- a/llvm/lib/Target/PowerPC/README_ALTIVEC.txt
+++ b/llvm/lib/Target/PowerPC/README_ALTIVEC.txt
@@ -1,11 +1,5 @@
 //===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//
 
-Implement TargetConstantVec, and set up PPC to custom lower ConstantVec into
-TargetConstantVec's if it's one of the many forms that are algorithmically
-computable using the spiffy altivec instructions.
-
-//===----------------------------------------------------------------------===//
-
 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
 registers, to generate better spill code.
 
@@ -31,8 +25,6 @@ void foo(void) {
 
 Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763
-We need to codegen -0.0 vector efficiently (no constant pool load).
-
 When -ffast-math is on, we can use 0.0.
 
 //===----------------------------------------------------------------------===//
@@ -48,7 +40,109 @@ a load/store/lve*x sequence.
 //===----------------------------------------------------------------------===//
 
 There are a wide range of vector constants we can generate with combinations of
-altivec instructions.  For example, GCC does: t=vsplti*, r = t+t.
+altivec instructions.  Examples:
+ GCC does: "t=vsplti*, r = t+t"  for constants it can't generate with one vsplti
+
+ -0.0 (sign bit):  vspltisw v0,-1 / vslw v0,v0,v0
+
+//===----------------------------------------------------------------------===//
+
+Missing intrinsics:
+
+ds*
+lve*
+lvs*
+lvx*
+mf*
+st*
+vavg*
+vexptefp
+vlogefp
+vmax*
+vmhaddshs/vmhraddshs
+vmin*
+vmladduhm
+vmr*
+vmsum*
+vmul*
+vperm
+vpk*
+vr*
+vsel (some aliases only accessible using builtins)
+vsl* (except vsldoi)
+vsr*
+vsum*
+vup*
+
+//===----------------------------------------------------------------------===//
+
+FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.
 
 //===----------------------------------------------------------------------===//
 
+For functions that use altivec AND have calls, we are VRSAVE'ing all call
+clobbered regs.
+
+//===----------------------------------------------------------------------===//
+
+VSPLTW and friends are expanded by the FE into insert/extract element ops.  Make
+sure that the dag combiner puts them back together in the appropriate
+vector_shuffle node and that this gets pattern matched appropriately.
+
+//===----------------------------------------------------------------------===//
+
+Implement passing/returning vectors by value.
+
+//===----------------------------------------------------------------------===//
+
+GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
+of C1/C2/C3, then a load and vperm of Variable.
+
+//===----------------------------------------------------------------------===//
+
+We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
+aligned stack slot, followed by a lve*x/vperm.  We should probably just store it
+to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
+in memory, this is a huge win.
+
+//===----------------------------------------------------------------------===//
+
+Do not generate the MFCR/RLWINM sequence for predicate compares when the
+predicate compare is used immediately by a branch.  Just branch on the right
+cond code on CR6.
+
+//===----------------------------------------------------------------------===//
+
+SROA should turn "vector unions" into the appropriate insert/extract element
+instructions.
+
+//===----------------------------------------------------------------------===//
+
+We need an LLVM 'shuffle' instruction that corresponds to the VECTOR_SHUFFLE
+node.
+
+//===----------------------------------------------------------------------===//
+
+We need a way to teach tblgen that some operands of an intrinsic are required to
+be constants.  The verifier should enforce this constraint.
+
+//===----------------------------------------------------------------------===//
+
+We should instcombine the lvx/stvx intrinsics into loads/stores if we know that
+the loaded address is 16-byte aligned.
+
+//===----------------------------------------------------------------------===//
+
+Instead of writing a pattern for type-agnostic operations (e.g. gen-zero, load,
+store, and, ...) in every supported type, make legalize do the work.  We should
+have a canonical type that we want operations changed to (e.g. v4i32 for
+build_vector) and legalize should change non-identical types to these.  This is
+similar to what it does for operations that are only supported in some types,
+e.g. x86 cmov (not supported on bytes).
+
+This would fix two problems:
+1. Writing patterns multiple times.
+2. Identical operations in different types are not getting CSE'd (e.g.
+   { 0U, 0U, 0U, 0U } and {0.0, 0.0, 0.0, 0.0}).
+
+
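
Several of the notes added above are easier to follow with a concrete snippet.
The sketches below are hedged illustrations in C using the GCC/clang <altivec.h>
intrinsics; every function, variable, and callee name in them is made up for the
example and does not come from the patch or from the LLVM tree.

First, the "vspltisw v0,-1 / vslw v0,v0,v0" idiom for materializing a -0.0 splat
with no constant-pool load: vspltisw puts -1 (all ones) in every word, and vslw
then shifts each word left by the low five bits of that same value (31), leaving
only the sign bit set.

#include <altivec.h>

/* Illustrative sketch: build a v4f32 splat of -0.0 without a constant-pool
   load.  vec_splat_s32(-1) is vspltisw (0xFFFFFFFF in every word); vec_sl
   is vslw, which uses only the low 5 bits of each shift amount (31 here),
   so each word becomes 0x80000000, i.e. -0.0f. */
static vector float neg_zero_splat(void) {
  vector signed int all_ones = vec_splat_s32(-1);
  vector signed int sign_bit = vec_sl(all_ones, (vector unsigned int)all_ones);
  return (vector float)sign_bit;
}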
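
With that sign-bit splat in a register, the FABS/FNEG note reduces both
operations to a single bitwise instruction (xor to flip the sign, andc to clear
it).  A minimal sketch, assuming the mask is produced as above:

#include <altivec.h>

/* Illustrative sketch: vector fneg/fabs as bit operations against a
   0x80000000 splat.  No per-element floating-point work is needed. */
static vector float fneg_v4f32(vector float x, vector float sign_mask) {
  return vec_xor(x, sign_mask);   /* flip each lane's sign bit  */
}
static vector float fabs_v4f32(vector float x, vector float sign_mask) {
  return vec_andc(x, sign_mask);  /* clear each lane's sign bit */
}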
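
For the SCALAR_TO_VECTOR note, the lvsl/vperm idea when the scalar is already in
memory looks roughly like this: lvx ignores the low four address bits, so it
fetches the 16-byte block containing the scalar, and lvsl yields the permute
control that rotates that block left by (address & 15) bytes, dropping the
addressed element into lane 0.  This is only a sketch of the proposed lowering,
not the in-tree code, and it assumes the scalar does not straddle a 16-byte
boundary (true for a naturally aligned float).

#include <altivec.h>

/* Illustrative sketch: bring the float at *p into lane 0 of a vector
   without first copying it to a 16-byte aligned stack slot. */
static vector float scalar_to_lane0(const float *p) {
  vector float block       = vec_ld(0, p);    /* lvx of the enclosing 16 bytes */
  vector unsigned char rot = vec_lvsl(0, p);  /* rotate-left-by-offset control */
  return vec_perm(block, block, rot);         /* *p now sits in lane 0         */
}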
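
The MFCR/RLWINM note covers the case where an AltiVec predicate compare feeds a
branch directly.  The record form of the compare (e.g. vcmpequw.) already leaves
the all/any result in CR6, so the branch can test CR6 itself instead of copying
the bit into a GPR first.  A small reproducer, with a hypothetical callee just
to force a branch:

#include <altivec.h>

extern void take_branch(void);   /* hypothetical callee */

/* vec_all_eq lowers to vcmpequw., which sets CR6; the desired codegen
   branches on CR6 directly rather than emitting mfcr/rlwinm/cmpwi. */
void branch_on_predicate(vector signed int a, vector signed int b) {
  if (vec_all_eq(a, b))
    take_branch();
}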
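
The SROA note is about the usual "vector union" idiom for getting at one lane
from C; the union and member names here are illustrative.  The round trip
through the union should become a single extract-element rather than a 16-byte
stack store followed by a scalar reload.

#include <altivec.h>

/* Illustrative "vector union" that SROA should lower to extractelement. */
union vec_bits {
  vector float v;
  float        f[4];
};

float lane2(vector float x) {
  union vec_bits u;
  u.v = x;
  return u.f[2];
}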
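
The lvx/stvx instcombine note: when the pointer is known to be 16-byte aligned,
the low-bit masking those instructions perform is a no-op, so the intrinsic
calls say nothing that an ordinary aligned vector load/store would not.  A
sketch of the input pattern, with alignment assumed by the caller:

#include <altivec.h>

/* Illustrative input for the proposed instcombine: with src and dst known
   16-byte aligned, vec_ld/vec_st (lvx/stvx) behave exactly like plain
   aligned vector memory operations and could be rewritten as such. */
vector float copy_block(float *dst, const float *src) {
  vector float v = vec_ld(0, src);   /* lvx  */
  vec_st(v, 0, dst);                 /* stvx */
  return v;
}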
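
Finally, the CSE example from the last note, written out as source: the two
constants are bit-identical zeros of different vector types, and with a
canonical build_vector type they would fold to a single zeroed register.

#include <altivec.h>

/* The two zero vectors from the note; today they are materialized
   independently instead of sharing one definition. */
vector unsigned int zero_u32(void) {
  vector unsigned int z = {0, 0, 0, 0};
  return z;
}
vector float zero_f32(void) {
  vector float z = {0.0f, 0.0f, 0.0f, 0.0f};
  return z;
}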

