summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/SLPVectorizer/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵Fangrui Song2019-12-241-1/+1
| | | | as cleanups after D56351
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-173-0/+129
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-173-129/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction ↵Simon Pilgrim2019-03-251-2/+2
| | | | | | | | | | | | | | canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913
* [SLP]Moved NVPTX test under NVPTX directory, NFC.Alexey Bataev2019-01-111-0/+57
| | | | llvm-svn: 350969
* [SLP]Update test checks for the SPL vectorizer, NFC.Alexey Bataev2019-01-111-13/+43
| | | | llvm-svn: 350967
* [NVPTX] Turn on Loop/SLP vectorizationBenjamin Kramer2018-04-272-0/+42
Since PTX has grown a <2 x half> datatype vectorization has become more important. The late LoadStoreVectorizer intentionally only does loads and stores, but now arithmetic has to be vectorized for optimal throughput too. This is still very limited, SLP vectorization happily creates <2 x half> if it's a legal type but there's still a lot of register moving happening to get that fed into a vectorized store. Overall it's a small performance win by reducing the amount of arithmetic instructions. I haven't really checked what the loop vectorizer does to PTX code, the cost model there might need some more tweaks. I didn't see it causing harm though. Differential Revision: https://reviews.llvm.org/D46130 llvm-svn: 331035
OpenPOWER on IntegriCloud