path: root/llvm/test/CodeGen/X86
Commit message (author, date, files changed, lines changed)
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. (Ahmed Bougacha, 2016-04-07, 1 file, -24/+12)
  Re-apply r265450, which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add -> add_and_fetch combine only works in specific, but pretty common, cases:
    (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
    (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
    (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
    (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
  Original message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison; we can optimize those out by reusing EFLAGS, which lets us use the proper instructions instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed, really non-MR, forms), do it in ISel, since it becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244.
  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265636
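  As a rough illustration (mine, not taken from the commit), the C++ below sketches the kind of source pattern this combine targets: the value returned by an atomic add is consumed only by a sign comparison, so the backend can emit a LOCK ADD and test EFLAGS instead of materializing the old value with LXADD. The function name is hypothetical.

  ```cpp
  #include <atomic>

  // Hypothetical example: the fetched value is used only by a comparison
  // against zero, so (icmp slt x, 0) on the old value can be rewritten as
  // (icmp sle (add x, 1), 0) on the updated value, allowing a LOCKed add
  // whose EFLAGS result feeds the SETCC/branch directly.
  bool increment_and_was_negative(std::atomic<int> &counter) {
    return counter.fetch_add(1, std::memory_order_seq_cst) < 0;
  }
  ```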
* [X86] Refresh and tweak EFLAGS reuse tests. NFC. (Ahmed Bougacha, 2016-04-07, 1 file, -51/+82)
  The non-1 and EQ/NE tests were misguided.
  llvm-svn: 265635
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" (Hans Wennborg, 2016-04-07, 8 files, -17/+75)
  Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and not to add merged instructions' offsets to the CFI about to be generated.
  This is already covered by the lit tests; I just got the expectations wrong previously.
  llvm-svn: 265623
* Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC." (Hans Wennborg, 2016-04-06, 1 file, -5/+15)
  It caused ASan 32-bit tests to hang (PR27245).
  llvm-svn: 265559
* Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"" (Hans Wennborg, 2016-04-06, 8 files, -77/+19)
  It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow.
  llvm-svn: 265551
* Recommit r265309 after fixing an invalid memory reference bug that happened when the DenseMap grew and moved memory. (Wei Mi, 2016-04-06, 4 files, -4/+199)
  I verified that it fixed the bootstrap problem on x86_64-linux-gnu, but I cannot verify whether it fixes the bootstrap error on clang-ppc64be-linux; I will watch the build-bot results closely.
  Replace analyzeSiblingValues with a new algorithm to fix its compile-time issue. The patch solves PR17409 and its duplicates. analyzeSiblingValues is an N x N complexity algorithm, where N is the number of siblings generated by register splitting. Although it causes a significant compile-time issue when N is large, it is also important for performance, since it removes redundant spills and enables rematerialization.
  To solve the compile-time issue, the patch removes analyzeSiblingValues and replaces it with lower-cost alternatives consisting of two parts. The first part creates a new spill-hoisting method in postOptimization of register allocation: it hoists spills once, after all spills have been generated, instead of inside every invocation of selectOrSplit. The second part queries the defining expression of the original register for rematerialization and keeps it available throughout register allocation even if it is already dead, deleting those dead instructions only in postOptimization. With these two parts, the patch removes analyzeSiblingValues without sacrificing performance.
  Differential Revision: http://reviews.llvm.org/D15302
  llvm-svn: 265547
* Lower @llvm.experimental.deoptimize as a noreturn call (Sanjoy Das, 2016-04-06, 1 file, -6/+0)
  While preserving the return value for @llvm.experimental.deoptimize at the IR level is useful during mid-level optimization, doing so at the machine instruction level requires generating some extra code and a return that is non-ideal. This change has LLVM lower
  ```
  %val = call @llvm.experimental.deoptimize
  ret %val
  ```
  to effectively
  ```
  call @__llvm_deoptimize()
  unreachable
  ```
  instead.
  llvm-svn: 265502
* Faster stack-protector for Android/AArch64. (Evgeniy Stepanov, 2016-04-05, 1 file, -0/+25)
  Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard.
  llvm-svn: 265481
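  As a rough sketch (mine, not part of the commit), the C++ below shows the kind of function that receives a stack-protector cookie when built with a flag such as -fstack-protector-strong; with this change, on Android/AArch64 the cookie can be loaded directly from Bionic's fixed TLS slot rather than through the __stack_chk_guard global. The function name and buffer size are hypothetical.

  ```cpp
  #include <cstring>

  // Hypothetical example: the on-stack character buffer makes the compiler
  // insert a stack-protector cookie (stored on entry, re-checked on return).
  std::size_t copy_name(char *dst, const char *src) {
    char buf[64];                          // local buffer triggers the protector
    std::strncpy(buf, src, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    std::strcpy(dst, buf);
    return std::strlen(buf);
  }
  ```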
* Swift Calling Convention: add swiftcc. (Manman Ren, 2016-04-05, 1 file, -0/+201)
  Differential Revision: http://reviews.llvm.org/D17863
  llvm-svn: 265480
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. (Ahmed Bougacha, 2016-04-05, 1 file, -15/+5)
  We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison; we can optimize those out by reusing EFLAGS, which lets us use the proper instructions instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed, really non-MR, forms), do it in ISel, since it becomes quite tricky later.
  This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141).
  This uses the LOCK ISD opcodes added by r262244.
  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265450
* [X86] Add tests for ATOMIC_LOAD_OP EFLAGS reuse. NFC. (Ahmed Bougacha, 2016-04-05, 1 file, -0/+159)
  llvm-svn: 265448
* [x86] regenerate checks (Sanjay Patel, 2016-04-05, 3 files, -43/+86)
  utils/update_test_checks.py was improved with http://reviews.llvm.org/rL265414 to include the first line of the function (expected to be a comment line). This ensures that nothing bad has happened before the first actual line of checked asm. It also matches the existing behavior of the old script.
  llvm-svn: 265416
* Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge (Chuang-Yu Cheng, 2016-04-05, 9 files, -12/+11)
  Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders, which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty.
  Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly).
  I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert.
  The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare.
  Author: Tom Jablin (tjablin)
  Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
  http://reviews.llvm.org/D16984
  llvm-svn: 265397
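  As a rough illustration (mine, not from the patch), the C++ below shows a loop shape where the block between the guard test and the loop header is a nearly empty preheader; if CodeGenPrepare deletes that block, the edge into the loop header can become critical and spill code may land inside the loop body instead of in front of it. The function name is hypothetical.

  ```cpp
  // Hypothetical example: after the 'enabled' guard, the compiler forms a
  // nearly empty preheader block before the loop header. Keeping that block
  // gives the register allocator a place to spill outside the loop.
  int sum_if_enabled(const int *a, int n, bool enabled) {
    int s = 0;
    if (enabled) {
      for (int i = 0; i < n; ++i)
        s += a[i];
    }
    return s;
  }
  ```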
* Don't fold double constant to an integer if dest type not integral (Teresa Johnson, 2016-04-04, 1 file, -0/+12)
  Summary: I encountered this issue when constant folding during inlining tried to fold away a bitcast of a double to an x86_mmx, which is not an integral type. The test case exposes the same issue with a smaller code snippet during early CSE.
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D18528
  llvm-svn: 265367
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" (Hans Wennborg, 2016-04-04, 8 files, -18/+77)
  The original commit miscompiled things on 32-bit Windows, e.g. a Clang bootstrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code:
    addl $12, %esp
    leal -4(%ebp), %esp
  to be "optimized" into simply:
    addl $8, %esp
  This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation.
  llvm-svn: 265345
* Beef up some dllexport tests. (Sean Silva, 2016-04-04, 1 file, -1/+23)
  Adds some dllexport tests to verify that:
  - Variables in bss are exported appropriately
  - Non-dllexport symbols aliased to dllexport symbols are not exported
  - Symbols declared as dllexport but not defined are not exported
  We plan to enable dllimport/dllexport support for the PS4, and these additional tests are for points we noticed in our internal testing.
  Patch by Warren Ristow!
  Differential Revision: http://reviews.llvm.org/D18682
  llvm-svn: 265333
* Revert r265309 and r265312 because they caused some errors I need to investigate. (Wei Mi, 2016-04-04, 4 files, -200/+4)
  llvm-svn: 265317
* Replace analyzeSiblingValues with new algorithm to fix its compile-time issue (Wei Mi, 2016-04-04, 4 files, -4/+200)
  The patch solves PR17409 and its duplicates. analyzeSiblingValues is an N x N complexity algorithm, where N is the number of siblings generated by register splitting. Although it causes a significant compile-time issue when N is large, it is also important for performance, since it removes redundant spills and enables rematerialization.
  To solve the compile-time issue, the patch removes analyzeSiblingValues and replaces it with lower-cost alternatives consisting of two parts. The first part creates a new spill-hoisting method in postOptimization of register allocation: it hoists spills once, after all spills have been generated, instead of inside every invocation of selectOrSplit. The second part queries the defining expression of the original register for rematerialization and keeps it available throughout register allocation even if it is already dead, deleting those dead instructions only in postOptimization. With these two parts, the patch removes analyzeSiblingValues without sacrificing performance.
  Differential Revision: http://reviews.llvm.org/D15302
  llvm-svn: 265309
* AVX-512: Truncating store for i1 vectors (Elena Demikhovsky, 2016-04-04, 2 files, -416/+170)
  Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes); v32i1 is saved in 4 bytes.
  Differential Revision: http://reviews.llvm.org/D18740
  llvm-svn: 265283
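  As a rough scalar sketch (mine, not from the patch), the C++ below illustrates the "bits, not bytes" layout described above: a 32-element comparison result packed into a 4-byte mask, which is the storage shape a v32i1 truncating store produces on AVX-512. The function name and argument types are hypothetical.

  ```cpp
  #include <cstring>

  // Hypothetical example: 32 comparison results are packed into one 32-bit
  // mask, so the whole i1 vector occupies 4 bytes in memory rather than 32.
  void store_mask(const int *a, const int *b, unsigned char out[4]) {
    unsigned mask = 0;
    for (int i = 0; i < 32; ++i)
      mask |= static_cast<unsigned>(a[i] < b[i]) << i;
    std::memcpy(out, &mask, sizeof(mask));
  }
  ```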
* [X86][SSE] Refreshed MOVMSK sign bit tests (Simon Pilgrim, 2016-04-03, 1 file, -26/+48)
  llvm-svn: 265267
* AVX-512: Load and Extended Load for i1 vectors (Elena Demikhovsky, 2016-04-03, 4 files, -906/+139)
  Implemented load + {sign|zero}_extend for i1 vectors. Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
  Differential Revision: http://reviews.llvm.org/D18737
  llvm-svn: 265259
* [X86][SSE] Added 1024-bit vector comparison tests (Simon Pilgrim, 2016-04-02, 1 file, -0/+4894)
  More examples of PR22603: poor vector splitting for AVX512F targets as well as missing uses of PACKSS/MOVMSK.
  llvm-svn: 265248
* [X86][AVX512] Added AVX512 comparison tests (Simon Pilgrim, 2016-04-02, 1 file, -0/+98)
  llvm-svn: 265247
* [X86][AVX] Added vector float truncation (double2float) tests (Simon Pilgrim, 2016-04-02, 1 file, -0/+168)
  llvm-svn: 265222
* Add missing emissionKind flags to the DICompileUnits of several old testcases. (Adrian Prantl, 2016-04-01, 1 file, -1/+1)
  llvm-svn: 265192
* [X86][SSE] Regenerated vector float tests - fabs / floor(etc.) / fneg / float2double (Simon Pilgrim, 2016-04-01, 4 files, -205/+534)
  llvm-svn: 265186
* [X86][SSE] Vector i64 load tests (Simon Pilgrim, 2016-04-01, 1 file, -11/+32)
  llvm-svn: 265185
* [X86][SSE] Regenerated comparison mask and float immediate tests (Simon Pilgrim, 2016-04-01, 2 files, -19/+66)
  llvm-svn: 265184
* [X86][SSE] Regenerated the vec_extract tests. (Simon Pilgrim, 2016-04-01, 5 files, -180/+431)
  llvm-svn: 265183
* [X86][SSE] Regenerated the vec_insert tests. (Simon Pilgrim, 2016-04-01, 9 files, -121/+410)
  llvm-svn: 265179
* [X86][SSE] Regenerated vec_partial tests. (Simon Pilgrim, 2016-04-01, 1 file, -10/+11)
  llvm-svn: 265173
* [x86] add an SSE2 + fast-unaligned accesses run for memset nonzero tests (Sanjay Patel, 2016-04-01, 1 file, -4/+122)
  Was there really no other way to splat a byte in SSE2?
    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
    pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
    pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
  llvm-svn: 265172
* [X86][SSE] Regenerated vec_logical tests. (Simon Pilgrim, 2016-04-01, 1 file, -27/+72)
  llvm-svn: 265171
* [X86][SSE] Regenerated vector sdiv to shifts tests (Simon Pilgrim, 2016-04-01, 1 file, -46/+239)
  Added SSE + AVX1 tests as well as AVX2.
  llvm-svn: 265169
* [x86] add an SSE1 run for these tests (Sanjay Patel, 2016-04-01, 1 file, -105/+106)
  Note, however, that this is identical to the existing SSE2 run. What we really want is yet another run for an SSE2 machine that also has fast unaligned 16-byte accesses.
  llvm-svn: 265167
* [X86][SSE] Regenerated vec_setcc tests. (Simon Pilgrim, 2016-04-01, 1 file, -111/+131)
  llvm-svn: 265164
* [X86][SSE] Regenerated the vec_set tests. (Simon Pilgrim, 2016-04-01, 13 files, -128/+214)
  Replaced lots of dodgy greps with actual codegen.
  llvm-svn: 265163
* [x86] avoid intermediate splat for non-zero memsets (PR27100) (Sanjay Patel, 2016-04-01, 1 file, -18/+10)
  Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676, where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply.
  The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference.
  Note that the SSE1 path is not changed in this patch; that can be a follow-up.
  This patch should resolve PR27100.
  llvm-svn: 265161
* [x86] avoid intermediate splat for non-zero memsets (PR27100) (Sanjay Patel, 2016-04-01, 1 file, -113/+66)
  Follow-up to D18566, where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply.
  The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on the uarch there.
  Note that the SSE1/2 paths are not changed in this patch; that can be a follow-up.
  This patch should resolve PR27100.
  Differential Revision: http://reviews.llvm.org/D18676
  llvm-svn: 265148
* [X86][AVX512] Regenerated intrinsics tests (Simon Pilgrim, 2016-04-01, 1 file, -126/+146)
  llvm-svn: 265135
* [X86] Introduce Lakemont CPU. (Andrey Turetskiy, 2016-04-01, 1 file, -0/+9)
  Add a new Intel MCU CPU, Lakemont, which doesn't support X87.
  Differential Revision: http://reviews.llvm.org/D18650
  llvm-svn: 265128
* Improve CHECK-NOT robustness of dllexport tests (Sean Silva, 2016-04-01, 2 files, -5/+20)
  This changes some dllexport tests to verify that some symbols that should not be exported are not, in a way that improves the robustness of CHECK-SAME interaction with CHECK-NOT.
  We plan to enable dllimport/dllexport support for the PS4, and these changes are for points we noticed in our internal testing.
  Patch by Warren Ristow!
  llvm-svn: 265106
* Don't use an i64 return type with webkit_jscc (Sanjoy Das, 2016-04-01, 1 file, -4/+4)
  Re-enable an assertion enabled by Justin Lebar in rL265092. rL265092 was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc does not like non-i64 return types. Change the test case to not do that.
  llvm-svn: 265099
* testcase gardening: update the emissionKind enum to the new syntax. (NFC) (Adrian Prantl, 2016-04-01, 31 files, -32/+32)
  llvm-svn: 265081
* Move the DebugEmissionKind enum from DIBuilder into DICompileUnit. (Adrian Prantl, 2016-03-31, 15 files, -16/+16)
  This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in: a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal.
  http://reviews.llvm.org/D18612
  <rdar://problem/25427165>
  llvm-svn: 265077
* [x86] add memset tests to show another potential improvement (Sanjay Patel, 2016-03-31, 1 file, -0/+203)
  llvm-svn: 265048
* Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" (Hans Wennborg, 2016-03-31, 8 files, -46/+18)
  I think it might have caused these build breakages:
  http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
  http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio
  llvm-svn: 265046
* [X86][SSE] Some basic tests for variable shuffles (Simon Pilgrim, 2016-03-31, 2 files, -0/+1942)
  We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).
  llvm-svn: 265045
* [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) (Hans Wennborg, 2016-03-31, 8 files, -18/+46)
  For code such as:
    void f(int, int);
    void g() { f(1, 2); }
  compiled for 32-bit X86 Linux, Clang would previously generate:
    subl $12, %esp
    subl $8, %esp
    pushl $2
    pushl $1
    calll f
    addl $16, %esp
    addl $12, %esp
    retl
  This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr().
  Differential Revision: http://reviews.llvm.org/D18627
  llvm-svn: 265039
* [x86] use SSE/AVX ops for non-zero memsets (PR27100) (Sanjay Patel, 2016-03-31, 1 file, -51/+131)
  Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg.
  Follow-on bugs exposed by the current codegen are:
  https://llvm.org/bugs/show_bug.cgi?id=27141
  https://llvm.org/bugs/show_bug.cgi?id=27143
  Differential Revision: http://reviews.llvm.org/D18566
  llvm-svn: 265029
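  As a rough illustration (mine, not from the commit), the C++ below is the shape of source code affected by this memset change and the splat follow-ups above: a fixed-size fill with a non-zero byte, which targets with fast unaligned accesses can now lower to wide SSE/AVX stores rather than a byte-splat codegen sinkhole. The function name and sizes are hypothetical.

  ```cpp
  #include <cstring>

  // Hypothetical example: a 64-byte fill with a non-zero value. On targets
  // with fast unaligned SSE/AVX stores this can become a few vector stores of
  // a splatted constant instead of scalar stores or an intermediate splat
  // built with an integer multiply.
  void fill_pattern(unsigned char *dst) {
    std::memset(dst, 0x2a, 64);
  }
  ```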