summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/memset-nonzero.ll
Commit message (Collapse)AuthorAgeFilesLines
* MCStreamer: Use "cfi" for CFI related temp labels.Matthias Braun2016-11-301-1/+1
| | | | | | | | | | | | | Choosing a "cfi" name makes the intend a bit clearer in an assembly dump and more importantly the assembly dumps are slightly more stable as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As they are temp labels the name doesn't influence the generated object code. Differential Revision: https://reviews.llvm.org/D27244 llvm-svn: 288290
* [x86] regenerate checksSanjay Patel2016-04-051-35/+70
| | | | | | | | | | | utils/update_test_checks.py was improved with: http://reviews.llvm.org/rL265414 to include the first line of the function (expected to be a comment line). This ensures that nothing bad has happened before the first actual line of checked asm. It also matches the existing behavior of the old script. llvm-svn: 265416
* [x86] add an SSE2 + fast-unaligned accesses run for memset nonzero testsSanjay Patel2016-04-011-4/+122
| | | | | | | | | Was there really no other way to splat a byte in SSE2? punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7] pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7] pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1] llvm-svn: 265172
* [x86] add an SSE1 run for these testsSanjay Patel2016-04-011-105/+106
| | | | | | | | Note however that this is identical to the existing SSE2 run. What we really want is yet another run for an SSE2 machine that also has fast unaligned 16-byte accesses. llvm-svn: 265167
* [x86] avoid intermediate splat for non-zero memsets (PR27100)Sanjay Patel2016-04-011-18/+10
| | | | | | | | | | | | | | | | | Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. Note that the SSE1 path is not changed in this patch. That can be a follow-up. This patch should resolve PR27100. llvm-svn: 265161
* [x86] avoid intermediate splat for non-zero memsets (PR27100)Sanjay Patel2016-04-011-113/+66
| | | | | | | | | | | | | | | | | | | | Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100. Differential Revision: http://reviews.llvm.org/D18676 llvm-svn: 265148
* [x86] add memset tests to show another potential improvementSanjay Patel2016-03-311-0/+203
| | | | llvm-svn: 265048
* [x86] use SSE/AVX ops for non-zero memsets (PR27100)Sanjay Patel2016-03-311-51/+131
| | | | | | | | | | | | | | Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029
* [x86] add tests to show current memset codegenSanjay Patel2016-03-291-0/+88
llvm-svn: 264748
OpenPOWER on IntegriCloud