path: root/llvm/test/CodeGen/X86/avx-trunc.ll
Commit message | Author | Age | Files | Lines
* [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. | Craig Topper | 2018-11-18 | 1 | -4/+2
  Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D54671
  llvm-svn: 347171
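The lowering named in this commit can be modelled lane by lane. The sketch below is an illustration only, not LLVM code: the function names are hypothetical, and it assumes the result is sixteen 8-bit lanes, since packuswb produces bytes (the title reads v8i16). It shows why masking with 255 first makes the signed-input saturation of packuswb harmless.

```python
# Hypothetical lane-level model; names and layout are assumptions for illustration.

def and_255(words):
    """Clear the high byte of every 16-bit lane."""
    return [w & 0xFF for w in words]

def extract_halves(words):
    """Split a 16-lane vector into its low and high 8-lane subvectors."""
    return words[:8], words[8:]

def packuswb(lo, hi):
    """PACKUSWB: read each 16-bit lane as signed, saturate to 0..255."""
    def sat(w):
        signed = w - 0x10000 if w & 0x8000 else w
        return max(0, min(255, signed))
    return [sat(w) for w in lo + hi]

def truncate_v16i16(words):
    """'and' with 255, extract the two halves, then pack them to bytes."""
    lo, hi = extract_halves(and_255(words))
    return packuswb(lo, hi)

words = [0x1234, 0xFFFF, 0x8000, 0x00FF, 0x0100, 0x7F80, 0xABCD, 0x0001] * 2
assert truncate_v16i16(words) == [w & 0xFF for w in words]
```

After the mask every lane is in 0..255, so the saturation step never changes a value and the pack behaves as a plain truncate.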
* [CodeGen] Unify MBB reference format in both MIR and debug output | Francis Visoiu Mistrih | 2017-12-04 | 1 | -3/+3
  As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions.
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
  * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
  * grep -nr 'BB#' and fix
  Differential Revision: https://reviews.llvm.org/D40422
  llvm-svn: 319665
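The sed substitutions in this commit can be tried on small strings before running them over a tree. The sketch below ports the first and third expressions to Python's `re.sub`; the function names and sample inputs are made up for illustration, and only the regular expressions come from the commit message.

```python
import re

def rename_mbb_refs(text):
    """Third sed expression: rewrite 'BB#<n>' as '%bb.<n>'."""
    return re.sub(r"BB#([0-9]+)", r"%bb.\1", text)

def rewrite_printer_call(source):
    """First sed expression: replace a streamed 'BB#' + X->getNumber() pair."""
    return re.sub(
        r'BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)',
        r'" << printMBBReference(*\1)',
        source,
    )

# Hypothetical sample inputs, not taken from the LLVM tree.
print(rename_mbb_refs("# BB#0: jmp BB#12"))
print(rewrite_printer_call('OS << "BB#" << MBB->getNumber();'))
```

The second expression in the commit differs only in matching `.getNumber()` on an object instead of `->getNumber()` on a pointer.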
* [X86 Codegen] Fixed a bug in unsigned saturation | Elena Demikhovsky | 2017-01-29 | 1 | -24/+0
  PACKUSWB converts signed words to unsigned bytes (and likewise PACKUSDW for doublewords), so it can't be used for the umin+truncate pattern. AVX-512 VPMOVUS* instructions fit the pattern since they convert unsigned to unsigned.
  See https://llvm.org/bugs/show_bug.cgi?id=31773
  Differential Revision: https://reviews.llvm.org/D29196
  llvm-svn: 293431
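The mismatch described here shows up on a single lane. The sketch below uses hypothetical helper names to model one 16-bit lane under PACKUSWB (signed input), under the umin+truncate pattern, and under unsigned-to-unsigned saturation as described for VPMOVUS*. It is a model of the semantics, not of LLVM's pattern matching.

```python
# Hypothetical single-lane models; 'word' is an unsigned 16-bit value 0..0xFFFF.

def packuswb_lane(word):
    """PACKUSWB: interpret the word as signed, then clamp to 0..255."""
    signed = word - 0x10000 if word & 0x8000 else word
    return max(0, min(255, signed))

def umin_truncate_lane(word):
    """The IR pattern: unsigned min with 255, then truncate to 8 bits."""
    return min(word, 255) & 0xFF

def unsigned_saturate_lane(word):
    """Unsigned-to-unsigned saturation, as the commit attributes to VPMOVUS*."""
    return min(word, 255)

# Words below 0x8000 agree; words with the top bit set do not.
assert all(packuswb_lane(w) == umin_truncate_lane(w) for w in range(0x8000))
assert packuswb_lane(0x8000) == 0
assert umin_truncate_lane(0x8000) == 255
assert unsigned_saturate_lane(0x8000) == 255
```

Any word with the top bit set is negative to PACKUSWB and clamps to 0, while the unsigned pattern expects 255.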
* Recommitting unsigned saturation with a bugfix. | Elena Demikhovsky | 2017-01-19 | 1 | -0/+26
  A test case that crashed is added to avx512-trunc.ll. (PR31589)
  llvm-svn: 292479
* Revert r291670 because it introduces a crash. | Michael Kuperstein | 2017-01-18 | 1 | -26/+0
  r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer.
  llvm-svn: 292444
* X86 CodeGen: Optimized pattern for truncate with unsigned saturation. | Elena Demikhovsky | 2017-01-11 | 1 | -0/+26
  DAG pattern optimization: truncate + unsigned saturation is supported by VPMOVUS* instructions in AVX-512 and by VPACKUS* instructions on SSE* targets.
  Differential Revision: https://reviews.llvm.org/D28216
  llvm-svn: 291670
* [x86] use a single shufps when it can save instructions | Sanjay Patel | 2016-12-15 | 1 | -3/+1
  This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885
  My motivating case looks like this:
  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
  And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win.
  So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining.
  Differential Revision: https://reviews.llvm.org/D27692
  llvm-svn: 289837
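The equivalence in the motivating case can be checked with a small model that treats each xmm register as four 32-bit elements. The helper names below are hypothetical. The blend is written at dword granularity because the word mask in the diff (words 0-3 from the first operand, words 4-7 from the second) selects whole dwords.

```python
# Hypothetical model: a register is a list of four 32-bit elements.

def pshufd(src, idx):
    """Permute the dwords of one register."""
    return [src[i] for i in idx]

def blend_low_high(a, b):
    """vpblendw a[0,1,2,3],b[4,5,6,7] in word terms: dwords 0-1 of a, 2-3 of b."""
    return a[:2] + b[2:]

def shufps(a, b, idx_a, idx_b):
    """Two elements picked from a, then two picked from b."""
    return [a[idx_a[0]], a[idx_a[1]], b[idx_b[0]], b[idx_b[1]]]

def old_sequence(xmm0, xmm1):
    xmm1 = pshufd(xmm1, [0, 1, 0, 2])
    xmm0 = pshufd(xmm0, [0, 2, 2, 3])
    return blend_low_high(xmm0, xmm1)

def new_sequence(xmm0, xmm1):
    return shufps(xmm0, xmm1, [0, 2], [0, 2])

a, b = ["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]
assert old_sequence(a, b) == new_sequence(a, b) == ["a0", "a2", "b0", "b2"]
```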
* [x86] fix test specifications | Sanjay Patel | 2016-12-12 | 1 | -4/+4
  llvm-svn: 289493
* [X86][AVX] Regenerated AVX tests | Simon Pilgrim | 2016-01-16 | 1 | -8/+29
  Updated i1 select, vector truncation and subvector extraction tests
  llvm-svn: 257995
* [x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. | Chandler Carruth | 2015-02-16 | 1 | -3/+3
  The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit.
  This isn't perfect yet for a few reasons:
  1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here.
  2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now.
  More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable.
  Note that some test cases grow significantly in the number of instructions, but I expect them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern intel chips) and the blend is a *very* fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height.
  llvm-svn: 229350
* Lower AVX v4i64->v4i32 truncate to one shuffle. | Cameron McInally | 2014-03-05 | 1 | -3/+5
  llvm-svn: 202996
* X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. | Benjamin Kramer | 2013-10-23 | 1 | -1/+6
  Also update the cost model.
  llvm-svn: 193270
* Cleanup: test source files do not need to be executable | Arnaud A. de Grandmaison | 2013-04-22 | 1 | -0/+0
  llvm-svn: 180003
* Revert revision: 171467. This transformation is incorrect and makes some tests fail. | Nadav Rotem | 2013-01-04 | 1 | -15/+0
  Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test.
  llvm-svn: 171468
* Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. | Elena Demikhovsky | 2013-01-03 | 1 | -0/+15
  llvm-svn: 171467
* Unix line endings | Matt Beaumont-Gay | 2012-02-02 | 1 | -15/+15
  llvm-svn: 149615
* Optimization for "truncate" operation on AVX. | Elena Demikhovsky | 2012-02-01 | 1 | -0/+15
  Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with a set of shuffles.
  llvm-svn: 149485
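A truncate can be expressed as a shuffle because, in a little-endian layout, the low half of each wide element is an even-indexed narrow element. The sketch below shows this for v4i64 -> v4i32 with a hypothetical helper. It does not reproduce the specific instruction sequence this commit emits, which the log does not show.

```python
import struct

def truncate_v4i64_by_shuffle(values):
    """Reinterpret four 64-bit lanes as eight 32-bit lanes, keep the even ones."""
    raw = struct.pack("<4Q", *values)       # little-endian v4i64
    dwords = struct.unpack("<8I", raw)      # same bytes viewed as v8i32
    return [dwords[i] for i in (0, 2, 4, 6)]

values = [0x1122334455667788, 0xFFFFFFFF00000001, 0, 0xDEADBEEFCAFEBABE]
assert truncate_v4i64_by_shuffle(values) == [v & 0xFFFFFFFF for v in values]
```

The v8i32 -> v8i16 case follows the same idea with 16-bit lanes and indices 0, 2, ..., 14.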