path: root/llvm/test/CodeGen/X86
Commit message | Author | Age | Files | Lines
* Add an artificial line-0 debug location when the compiler emits a call to __stack_chk_fail(). | Yunzhong Gao | 2016-06-30 | 1 | -0/+24
| This avoids a compiler crash.
| Differential Revision: http://reviews.llvm.org/D21818
| llvm-svn: 274263
* [X86] Lower blended PACKUSes using appropriate types. | Ahmed Bougacha | 2016-06-29 | 1 | -0/+59
| When lowering two blended PACKUS, we used to disregard the types of the PACKUS inputs, indiscriminately generating a v16i8 PACKUS. This leads to non-selectable things like:
|     (v16i8 (PACKUS (v4i32 v0), (v4i32 v1)))
| Instead, check that the PACKUSes have the same type, and use that as the final result type.
| llvm-svn: 274138
* Update tests to use at least darwin9. | Rafael Espindola | 2016-06-29 | 9 | -74/+58
| llvm-svn: 274129
* [X86][SSE2] Added _mm_loadu_si64 test to match llvm\tools\clang\test\CodeGen\sse2-builtins.c | Simon Pilgrim | 2016-06-29 | 1 | -0/+11
| llvm-svn: 274127
* [X86] Regenerated popcnt combine tests | Simon Pilgrim | 2016-06-29 | 1 | -14/+24
| llvm-svn: 274124
* [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second. | Craig Topper | 2016-06-29 | 1 | -12/+3
| llvm-svn: 274097
* [DAGCombine] Add test cases to show that DAG combining an OR of two shuffles with zero vectors doesn't work if the zero vector is the first operand of the shuffle. | Craig Topper | 2016-06-29 | 1 | -0/+44
| Fix coming in a follow up patch.
| llvm-svn: 274096
* Relax the clearance calculating for breaking partial register dependency. | Dehao Chen | 2016-06-28 | 1 | -0/+11
| Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert the XORs that are necessary to break partial register dependency.
| Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel
| Subscribers: davidxl, llvm-commits
| Differential Revision: http://reviews.llvm.org/D21560
| llvm-svn: 274068
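The clearance rule described in the commit above can be sketched as a single comparison. This is a minimal illustration, not LLVM's code: the helper name and both parameters are hypothetical, and the threshold is left as an argument because the commit message only says that 16 was too small.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM API.
// clearance: number of instructions since the register was last written.
// threshold: clearance at or above which the stale dependency is assumed
//            to be hidden, so no dependency-breaking XOR is inserted.
bool shouldInsertDependencyBreakingXor(unsigned clearance, unsigned threshold) {
  assert(threshold > 0 && "a zero threshold would never insert an XOR");
  return clearance < threshold;
}
```

With a threshold of 16, a register last written 20 instructions ago is left alone; raising the threshold (64 is used here purely as an example value) makes the same case insert the XOR.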
* X86FrameLowering: Check subregs when deciding prolog kill flags | Matthias Braun | 2016-06-28 | 1 | -0/+21
| llvm-svn: 274057
* Support arbitrary addrspace pointers in masked load/store intrinsics | Artur Pilipenko | 2016-06-28 | 2 | -111/+112
| This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details).
| This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsics for a non-default addrspace pointer. It fails with the "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace.
| The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics.
| Reviewed By: reames
| Differential Revision: http://reviews.llvm.org/D17270
| llvm-svn: 274043
* [X86] Update a test with more explicit checks. NFC. | Michael Kuperstein | 2016-06-28 | 1 | -54/+123
| llvm-svn: 274040
* [X86] Make WRPKRU/RDPKRU pass -verify-machineinstrs | David Majnemer | 2016-06-28 | 1 | -1/+1
| The original implementation attempted to zero registers using XOR %foo, %foo. This is problematic because it constitutes a read-modify-write of a register which might not be defined. Instead, use MOV32r0 to avoid these problems; expandPostRAPseudo does the right thing here.
| llvm-svn: 274024
* [X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied) | Simon Pilgrim | 2016-06-28 | 4 | -19/+27
| AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
| This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
| Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
| As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted.
| Differential Revision: http://reviews.llvm.org/D21660
| llvm-svn: 274013
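The condition for peeking through a bitcast can be illustrated with a small sketch. It assumes a hypothetical `VecType` description (element count and element width) rather than LLVM's own type classes, and only models the rule stated in the commit message: the element count must stay the same so the broadcast index keeps its meaning.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector type such as v8i32 or v4f64.
struct VecType {
  unsigned numElts;  // number of elements
  unsigned eltBits;  // width of one element in bits
};

// A bitcast keeps the total width; peeking through it is only safe for a
// broadcast when the number of elements is also unchanged.
bool canPeekThroughBitcast(VecType from, VecType to) {
  assert(from.numElts * from.eltBits == to.numElts * to.eltBits &&
         "a bitcast must preserve the total vector width");
  return from.numElts == to.numElts;
}
```

For example, v8i32 to v8f32 keeps eight 32-bit lanes and can be looked through, while v4i64 to v8i32 changes the lane count and cannot.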
* [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes | Simon Pilgrim | 2016-06-28 | 21 | -112/+117
| This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we currently do, and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments.
| Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but it has proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication.
| Differential Revision: http://reviews.llvm.org/D21148
| llvm-svn: 273999
* [X86 Target Lowering] Merged ICMP test. | Elena Demikhovsky | 2016-06-28 | 2 | -31/+27
| llvm-svn: 273995
* Teach shouldAssumeDSOLocal about tls. | Rafael Espindola | 2016-06-27 | 1 | -0/+15
| Fixes a fixme about handling other visibilities.
| llvm-svn: 273921
* X86 Lowering - Fixed a crash in ICMP scalar instruction | Elena Demikhovsky | 2016-06-27 | 1 | -0/+31
| Fixed a bug in EmitTest() function in combining shl + icmp.
| https://llvm.org/bugs/show_bug.cgi?id=28119
| llvm-svn: 273899
* Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. | Artur Pilipenko | 2016-06-27 | 2 | -112/+111
| llvm-svn: 273895
* Support arbitrary addrspace pointers in masked load/store intrinsics | Artur Pilipenko | 2016-06-27 | 2 | -111/+112
| This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details).
| This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsics for a non-default addrspace pointer. It fails with the "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace.
| The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics.
| Reviewed By: reames
| Differential Revision: http://reviews.llvm.org/D17270
| llvm-svn: 273892
* [X86][SSE] Added extra broadcast tests to cover PR28327 | Simon Pilgrim | 2016-06-27 | 2 | -0/+53
| llvm-svn: 273891
* Revert 273848, it caused PR28329 | Nico Weber | 2016-06-27 | 2 | -5/+12
| llvm-svn: 273879
* Removed duplicate assertions note | Simon Pilgrim | 2016-06-27 | 1 | -1/+0
| llvm-svn: 273874
* [X86][AVX] Peek through bitcasts to find the source of broadcasts | Simon Pilgrim | 2016-06-27 | 2 | -12/+5
| AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
| This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
| Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
| Differential Revision: http://reviews.llvm.org/D21660
| llvm-svn: 273848
* Use shouldAssumeDSOLocal in isOffsetFoldingLegal. | Rafael Espindola | 2016-06-24 | 1 | -0/+10
| This makes it slightly more powerful for dynamic-no-pic.
| llvm-svn: 273704
* Codegen: Fix broken assumption in Tail Merge. | Kyle Butt | 2016-06-24 | 2 | -0/+36
| Tail merge was making the assumption that a layout successor or predecessor was always a cfg successor/predecessor. Remove that assumption. Changes to tests are necessary because the errant cfg edges were preventing optimizations.
| llvm-svn: 273700
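The distinction between a layout successor and a CFG successor can be shown with a toy model. The sketch below is not the tail-merge code; it assumes a hypothetical representation in which blocks are numbered by their position in layout order and `succs[i]` lists the CFG successors of block `i`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: index = position in layout order,
// succs[i] = CFG successors of block i.
using Cfg = std::vector<std::vector<std::size_t>>;

// Returns true only if the block laid out directly after `block` is also
// reachable from it through a CFG edge. Adjacency in layout alone is not
// treated as evidence of an edge.
bool layoutSuccessorIsCfgSuccessor(const Cfg& succs, std::size_t block) {
  assert(block < succs.size() && "block index out of range");
  const std::size_t next = block + 1;
  if (next >= succs.size()) return false;  // last block has no layout successor
  const std::vector<std::size_t>& s = succs[block];
  return std::find(s.begin(), s.end(), next) != s.end();
}
```

In a three-block function where block 0 jumps straight to block 2, block 1 is the layout successor of block 0 but not a CFG successor, so the check returns false for block 0 and true for block 1.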
* Use FileCheck. NFC. | Rafael Espindola | 2016-06-24 | 1 | -10/+3
| llvm-svn: 273699
* Codegen: [X86] preserve memory refs for folded umul_lohi | Kyle Butt | 2016-06-23 | 1 | -1/+32
| Memory references were not being propagated for this folded load. This prevented optimizations like LICM from hoisting the load. Added test to verify that this allows LICM to proceed.
| llvm-svn: 273617
* Codegen: LICM Remove check for exactly 1 register def. | Kyle Butt | 2016-06-23 | 2 | -22/+43
| When considering whether to split an instruction with a memory operand into an explicit load and a register-based instruction, we currently check that the resulting instruction has exactly 1 def. This prevents 2 important LICM optimizations: compares with memory operands, and double indirect calls. All the tests and the test-suite pass without the check.
| My guess as to original intent is to limit the additional register pressure created by the new instruction, but given that we only split out a single register, it is already limited.
| The licm-dominance test now checks actual memory loads for hoisting instead of undef, and it tests compares.
| hoist-invariant-load.ll now checks for 2 hoists, the intended hoist, and a bonus from calling a got-relative function in a loop.
| llvm-svn: 273616
* [X86] Extract HiPE prologue constants into metadata | Michael Kuperstein | 2016-06-23 | 3 | -3/+16
| X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset into an Erlang Runtime System-internal data structure (the PCB). As the layout of this data structure is prone to change, this poses problems for maintaining compatibility.
| To address this problem, the compiler can produce this information as module-level named metadata. For example (where P_NSP_LIMIT is the offending offset):
|     !hipe.literals = !{ !2, !3, !4 }
|     !2 = !{ !"P_NSP_LIMIT", i32 152 }
|     !3 = !{ !"X86_LEAF_WORDS", i32 24 }
|     !4 = !{ !"AMD64_LEAF_WORDS", i32 24 }
| Patch by Magnus Lang
| Differential Revision: http://reviews.llvm.org/D20363
| llvm-svn: 273593
* [X86][AVX512] Added AVX512F vector sign extend tests | Simon Pilgrim | 2016-06-23 | 1 | -130/+415
| Now that Elena has confirmed that PR26474 has been fixed
| llvm-svn: 273560
* [AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects. | Craig Topper | 2016-06-23 | 9 | -672/+667
| llvm-svn: 273543
* [ImplicitNullChecks] Hoist trivial dependencies if possible | Sanjoy Das | 2016-06-22 | 1 | -0/+266
| When trying to convert a loading instruction into a FAULTING_LOAD, we sometimes face code like this:
|     if %R10 is not null:
|       %R9<def> = MOV32ri Immediate
|       %R9<def, tied> = AND32rm %R9, 0x20(%R10)
|     else:
|       goto TRAP
| In these cases we would like to use the AND32rm instruction as the faulting operation by hoisting the "dependency" def-ing %R9 also above the control flow, transforming the program into:
|     %R9<def> = MOV32ri Immediate
|     %R9<def, tied> = FAULTING_LOAD_OP(AND32rm %R9, 0x20(%R10), FailPath: TRAP)
| This change teaches ImplicitNullChecks to do the above, when safe.
| llvm-svn: 273501
* Upgrade old memset/memcpy signatures (without isVolatile argument) in tests | Artur Pilipenko | 2016-06-22 | 3 | -4/+2
| We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed a long time ago. Fix the remaining few.
| One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect.
| llvm-svn: 273428
* Regenerated test | Simon Pilgrim | 2016-06-22 | 1 | -1/+1
| llvm-svn: 273404
* [StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4 | Etienne Bergeron | 2016-06-21 | 1 | -1/+47
| Summary: Fix the computation of the offsets present in the scopetable when using the SEH (__except_handler4).
| This patch added an intrinsic to track the position of the allocation on the stack of the EHGuard. This position is needed when producing the ScopeTable.
| ```
| struct _EH4_SCOPETABLE {
|         DWORD GSCookieOffset;
|         DWORD GSCookieXOROffset;
|         DWORD EHCookieOffset;
|         DWORD EHCookieXOROffset;
|         _EH4_SCOPETABLE_RECORD ScopeRecord[1];
| };
|
| struct _EH4_SCOPETABLE_RECORD {
|         DWORD EnclosingLevel;
|         long (*FilterFunc)();
|         union {
|                 void (*HandlerAddress)();
|                 void (*FinallyFunc)();
|         };
| };
| ```
| The code to generate the EHCookie is added in `X86WinEHState.cpp`, which adds these instructions when using SEH4.
| ```
| Lfunc_begin0:
| # BB#0:                                 # %entry
|         pushl   %ebp
|         movl    %esp, %ebp
|         pushl   %ebx
|         pushl   %edi
|         pushl   %esi
|         subl    $28, %esp
|         movl    %ebp, %eax                <<-- Loading FramePtr
|         movl    %esp, -36(%ebp)
|         movl    $-2, -16(%ebp)
|         movl    $L__ehtable$use_except_handler4_ssp, %ecx
|         xorl    ___security_cookie, %ecx
|         movl    %ecx, -20(%ebp)
|         xorl    ___security_cookie, %eax  <<-- XOR FramePtr and Cookie
|         movl    %eax, -40(%ebp)           <<-- Storing EHGuard
|         leal    -28(%ebp), %eax
|         movl    $__except_handler4, -24(%ebp)
|         movl    %fs:0, %ecx
|         movl    %ecx, -28(%ebp)
|         movl    %eax, %fs:0
|         movl    $0, -16(%ebp)
|         calll   _may_throw_or_crash
| LBB1_1:                                 # %cont
|         movl    -28(%ebp), %eax
|         movl    %eax, %fs:0
|         addl    $28, %esp
|         popl    %esi
|         popl    %edi
|         popl    %ebx
|         popl    %ebp
|         retl
| ```
| And the corresponding offset is computed:
| ```
| Luse_except_handler4_ssp$parent_frame_offset = -36
|         .p2align        2
| L__ehtable$use_except_handler4_ssp:
|         .long   -2                      # GSCookieOffset
|         .long   0                       # GSCookieXOROffset
|         .long   -40                     # EHCookieOffset    <<----
|         .long   0                       # EHCookieXOROffset
|         .long   -2                      # ToState
|         .long   _catchall_filt          # FilterFunction
|         .long   LBB1_2                  # ExceptionHandler
| ```
| Clang is not yet producing functions using SEH4, but it's a work in progress. This patch is a step toward having a valid implementation of SEH4. Unfortunately, it is not yet fully working. The EH registration block is not allocated at the right offset on the stack.
| Reviewers: rnk, majnemer
| Subscribers: llvm-commits, chrisha
| Differential Revision: http://reviews.llvm.org/D21231
| llvm-svn: 273281
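The three annotated instructions in the listing above (load the frame pointer, XOR it with the security cookie, store the result) amount to the following arithmetic. This is a minimal sketch with hypothetical function names; it models only the XOR shown in the assembly, not how `__except_handler4` validates the value.

```cpp
#include <cstdint>

// Hypothetical helper: the value stored at -40(%ebp) in the listing is the
// frame pointer XORed with ___security_cookie.
constexpr std::uint32_t makeEHGuard(std::uint32_t framePtr,
                                    std::uint32_t securityCookie) {
  return framePtr ^ securityCookie;
}

// XOR is its own inverse, so applying the same cookie to the stored guard
// yields the original frame pointer again.
constexpr std::uint32_t recoverFramePtr(std::uint32_t ehGuard,
                                        std::uint32_t securityCookie) {
  return ehGuard ^ securityCookie;
}
```

The frame pointer and cookie values are arbitrary here; the point is only that the round trip through the same cookie is lossless, while a different cookie produces a different frame pointer.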
* [arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos. | Daniel Sanders | 2016-06-21 | 1 | -0/+30
| Summary: canCombineSinCosLibcall() would previously combine sin+cos into sincos for GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not. However, GNU would only combine them for UnsafeFPMath because sincos does not set errno like sin and cos do. It seems likely that this is an oversight.
| Reviewers: t.p.northover
| Subscribers: t.p.northover, aemerson, llvm-commits, rengolin
| Differential Revision: http://reviews.llvm.org/D21431
| llvm-svn: 273259
* [AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. | Craig Topper | 2016-06-21 | 4 | -902/+710
| llvm-svn: 273253
* [AVX512] Use update_llc_test_checks.py to regenerate a test in preparation for a future commit. | Craig Topper | 2016-06-21 | 1 | -111/+160
| llvm-svn: 273252
* Revert "Change RelaxELFRelocations for llc." | James Y Knight | 2016-06-21 | 1 | -15/+0
| This reverts commit r273019.
| From email I sent to list:
| > I don't think this makes sense. Either the linker you're using supports
| > this feature, or it doesn't. Having it enabled for llc if your linker
| > doesn't support it is not fun.
| >
| > Further note that this also affects basically all other code using llvm
| > libraries -- other than Clang, which explicitly sets it back to false by
| > default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to
| > true.
| >
| > If you want to enable the relax mode across all llvm tools in some
| > circumstances, I think it should be via moving the cmake flag from clang
| > down into llvm.
| >
| > I'm going to revert this commit, since I both think it intrinsically
| > doesn't make sense to do this, and because it's breaking some of our
| > tools.
| llvm-svn: 273245
* [AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps. | Craig Topper | 2016-06-21 | 8 | -552/+551
| llvm-svn: 273240
* [X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets | Simon Pilgrim | 2016-06-20 | 1 | -92/+174
| Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to an x87 register, preventing legalization for conversion to fp128.
| Added 32-bit tests for fp128 cast/conversions.
| llvm-svn: 273210
* [X86][F16C] Added half <-> double conversion tests | Simon Pilgrim | 2016-06-20 | 1 | -0/+2113
| llvm-svn: 273153
* [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. | Igor Breger | 2016-06-20 | 3 | -1/+76
| Differential Revision: http://reviews.llvm.org/D20897
| llvm-svn: 273138
* [X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support | Simon Pilgrim | 2016-06-19 | 1 | -0/+1681
| llvm-svn: 273125
* [X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values | Simon Pilgrim | 2016-06-19 | 5 | -18/+18
| We currently only allow exact matches of shuffle mask patterns during target shuffle combining.
| This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.
| I've adjusted some tests that were requiring exact shuffle masks to now include undef values.
| Differential Revision: http://reviews.llvm.org/D21495
| llvm-svn: 273119
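The relaxed matching rule can be sketched as a lane-by-lane comparison. The code below is an illustration rather than the LLVM implementation: the function name and the two sentinel constants are stand-ins chosen for this example, and the only behaviour modelled is what the commit message states (an undef lane always matches, a zero lane must match exactly).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the SM_SentinelUndef / SM_SentinelZero markers; the
// concrete values are assumptions made for this sketch.
constexpr int kSentinelUndef = -1;
constexpr int kSentinelZero = -2;

// Compares a combined shuffle mask against an expected pattern.
// An undef lane in `mask` is accepted unconditionally; every other lane,
// including a zero sentinel, has to equal the expected lane exactly.
bool masksMatch(const std::vector<int>& mask, const std::vector<int>& expected) {
  assert(mask.size() == expected.size() && "masks must cover the same lanes");
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] == kSentinelUndef) continue;
    if (mask[i] != expected[i]) return false;
  }
  return true;
}
```

Under this rule a mask with an undef second lane still matches the identity pattern {0, 1, 2, 3}, whereas a mask whose second lane is the zero sentinel only matches a pattern that also has the zero sentinel in that lane.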
* [X86][AVX] Added test case for PR28136 | Simon Pilgrim | 2016-06-18 | 1 | -0/+38
| llvm-svn: 273098
* [X86][SSSE3] Added examples of target shuffle combining failing to match undefs in shuffle masks | Simon Pilgrim | 2016-06-18 | 1 | -0/+28
| llvm-svn: 273097
* [X86][XOP] Added fast-isel tests matching tools/clang/test/CodeGen/xop-builtins.c | Simon Pilgrim | 2016-06-18 | 1 | -0/+1111
| llvm-svn: 273096
* [X86][TBM] Added fast-isel tests matching tools/clang/test/CodeGen/tbm-builtins.c | Simon Pilgrim | 2016-06-18 | 2 | -0/+345
| llvm-svn: 273087
* [X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics | Simon Pilgrim | 2016-06-18 | 2 | -34/+39
| Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern.
| llvm-svn: 273077