path: root/llvm/test/CodeGen/X86
Commit message · Author · Date · Files · Lines
* [DAGCombiner][NFC] Tests for X div/rem Y single bit fold
  David Bolvansky · 2018-09-29 · 4 files, -0/+675
  llvm-svn: 343392
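  The fold these tests presumably target (my reading of the title; the commit itself is test-only): when the divisor is known to have exactly one bit set, unsigned remainder reduces to a mask and division to a shift. A minimal LLVM IR sketch, with %k a hypothetical shift amount:

    define i32 @rem_single_bit(i32 %x, i32 %k) {
      ; %y is known to have a single set bit (a power of two).
      %y = shl i32 1, %k
      ; urem by a power of two is a mask with (y - 1).
      %m = sub i32 %y, 1
      %r = and i32 %x, %m       ; equivalent to urem i32 %x, %y
      ret i32 %r
    }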
* [X86][AVX2] Cleanup shuffle combining tests - add common prefixes
  Simon Pilgrim · 2018-09-29 · 1 file, -567/+278
  llvm-svn: 343391
* [X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target shuffles before simplifying inputs
  Simon Pilgrim · 2018-09-29 · 2 files, -5/+3
  By removing demanded target shuffles that simplify to zero/undef/identity before simplifying their inputs, we improve the chances of further simplification, as only the immediate parent user of the combined node is added back to the work list - this still doesn't help us if it's passed through other ops though (bitcasts...).
  llvm-svn: 343390
* [X86] Add fast-isel test cases for unaligned load/store intrinsics recently added to clang
  Craig Topper · 2018-09-29 · 1 file, -0/+277
  This adds tests for:
    _mm_loadu_si64
    _mm_loadu_si32
    _mm_loadu_si16
    _mm_storeu_si64
    _mm_storeu_si32
    _mm_storeu_si16
  llvm-svn: 343389
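  A sketch of the IR such a test presumably checks for _mm_loadu_si32 (an assumed shape, not taken from the commit: the intrinsic loads 32 bits from an unaligned pointer into the low element and zeroes the rest):

    define <4 x i32> @loadu_si32(i8* %p) {
      %q = bitcast i8* %p to i32*
      ; align 1 is the whole point of the unaligned-load intrinsic.
      %v = load i32, i32* %q, align 1
      ; insert into the low element, zeroing the other three.
      %r = insertelement <4 x i32> zeroinitializer, i32 %v, i32 0
      ret <4 x i32> %r
    }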
* [X86] getTargetConstantBitsFromNode - add support for rearranging constant bits via shuffles
  Simon Pilgrim · 2018-09-29 · 5 files, -86/+62
  Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet.
  llvm-svn: 343384
* [X86] Regenerate fma comments.
  Simon Pilgrim · 2018-09-29 · 2 files, -233/+233
  llvm-svn: 343376
* [X86] getTargetConstantBitsFromNode - add support for peeking through ISD::EXTRACT_SUBVECTOR
  Simon Pilgrim · 2018-09-29 · 2 files, -10/+9
  llvm-svn: 343375
* [X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets
  Simon Pilgrim · 2018-09-29 · 2 files, -18/+36
  The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats.
  llvm-svn: 343373
* [X86] Add test cases for failures to use narrow test with immediate instructions when a truncate is between the CMP and the AND and the sign flag is used
  Craig Topper · 2018-09-28 · 1 file, -0/+236
  The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't used, but that is overly restrictive. A future patch will improve this.
  llvm-svn: 343355
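  A guess at the shape these tests exercise (my reconstruction from the title, not from the commit): an AND feeding a truncate feeding a sign check, where a narrow test-with-immediate would suffice:

    define i1 @sign_after_trunc(i32 %x) {
      ; AND with a mask whose live bit becomes the narrow type's sign bit.
      %a = and i32 %x, 128
      %t = trunc i32 %a to i8
      ; The sign flag of the i8 value is all that matters here, so a
      ; narrow 'testb' against 0x80 could replace a wider cmp/test.
      %c = icmp slt i8 %t, 0
      ret i1 %c
    }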
* [GISel]: Remove an incorrect assert in CallLowering
  Aditya Nandakumar · 2018-09-28 · 1 file, -0/+13
  https://reviews.llvm.org/D51147
  Whether to assert on any extend of vectors should be up to the target's legalizer / target-specific code, not to CallLowering.
  Reviewed by: dsanders
  llvm-svn: 343325
* Split invocations in CodeGen/X86/cpus.ll among multiple tests. (NFC)
  Jonas Devlieghere · 2018-09-28 · 5 files, -137/+142
  On GreenDragon CodeGen/X86/cpus.ll is timing out on the bot with Asan and UBSan enabled. With the same configuration on my machine, the test passes but takes more than 3 minutes to do so. I could increase the timeout, but I believe it makes more sense to split up the test because it allows for more parallelism.
  Differential revision: https://reviews.llvm.org/D52603
  llvm-svn: 343313
* [X86][Btver2] Fix BSF/BSR schedule
  Simon Pilgrim · 2018-09-28 · 1 file, -12/+12
  Double throughput to account for 2 pipes + fix BSF's latency/uop counts.
  Match AMD Fam16h SOG + llvm-exegesis tests.
  llvm-svn: 343311
* [X86][BtVer2] Fix PHMINPOS schedule resources typo
  Simon Pilgrim · 2018-09-28 · 1 file, -2/+2
  PHMINPOS can run on either JFPU pipe.
  llvm-svn: 343299
* [X86] Add the test case from PR38986.
  Craig Topper · 2018-09-27 · 1 file, -0/+29
  The assembly for this test should be optimal now after changes to the ScalarizeMaskedMemIntrin patch.
  llvm-svn: 343281
* [ScalarizeMaskedMemIntrin] When expanding masked gathers, start with the passthru vector and insert the new load results into it
  Craig Topper · 2018-09-27 · 2 files, -261/+217
  Previously we started with undef and did a final merge with the passthru at the end.
  llvm-svn: 343273
* [ScalarizeMaskedMemIntrin] When expanding masked loads, start with the passthru value and insert each conditional load result over its element
  Craig Topper · 2018-09-27 · 1 file, -42/+12
  Previously we started with undef and did one final merge at the end with a select.
  llvm-svn: 343271
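  Both ScalarizeMaskedMemIntrin changes above share the same passthru-first shape. A minimal sketch for a 2-element masked load (an illustration of the scheme, not the pass's literal output):

    define <2 x i32> @expand_masked_load(<2 x i32>* %p, <2 x i1> %mask, <2 x i32> %passthru) {
    entry:
      ; Start from the passthru value instead of undef.
      %m0 = extractelement <2 x i1> %mask, i32 0
      br i1 %m0, label %load0, label %else0
    load0:
      %p0 = getelementptr <2 x i32>, <2 x i32>* %p, i32 0, i32 0
      %v0 = load i32, i32* %p0, align 4
      %r0 = insertelement <2 x i32> %passthru, i32 %v0, i32 0
      br label %else0
    else0:
      %acc0 = phi <2 x i32> [ %r0, %load0 ], [ %passthru, %entry ]
      %m1 = extractelement <2 x i1> %mask, i32 1
      br i1 %m1, label %load1, label %done
    load1:
      %p1 = getelementptr <2 x i32>, <2 x i32>* %p, i32 0, i32 1
      %v1 = load i32, i32* %p1, align 4
      %r1 = insertelement <2 x i32> %acc0, i32 %v1, i32 1
      br label %done
    done:
      %res = phi <2 x i32> [ %r1, %load1 ], [ %acc0, %else0 ]
      ret <2 x i32> %res
    }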
* [ScalarizeMaskedMemIntrin] Don't emit 'icmp eq i1 %x, 1' to check mask values. That's just %x so use that directly
  Craig Topper · 2018-09-27 · 3 files, -54/+39
  Had we emitted this IR earlier, InstCombine would have removed the icmp, so I'm going to assume using the i1 directly would be considered canonical.
  llvm-svn: 343244
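  For concreteness: on i1 values, 'icmp eq i1 %x, 1' is the identity, so the expansion can branch on the mask bit directly:

    define i32 @use_mask_bit(i1 %m, i32 %a, i32 %b) {
      ; Before the change: %c = icmp eq i1 %m, 1, then branch on %c.
      ; That compare is just %m, so branch on the mask bit itself:
      br i1 %m, label %t, label %f
    t:
      ret i32 %a
    f:
      ret i32 %b
    }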
* [X86] Update tzcnt fast-isel tests to match clang r343126.
  Craig Topper · 2018-09-26 · 2 files, -74/+22
  We now generate cttz with the zero_undef flag set to false. This allows -O0 to avoid the zero check.
  llvm-svn: 343127
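  The flag in question is the second operand of the intrinsic; a minimal sketch:

    declare i32 @llvm.cttz.i32(i32, i1)

    define i32 @tzcnt(i32 %x) {
      ; i1 false: the result is defined (32) for %x == 0, so fast-isel
      ; at -O0 does not need to emit a separate zero check.
      %r = call i32 @llvm.cttz.i32(i32 %x, i1 false)
      ret i32 %r
    }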
* [X86][SSE] Refresh PR34947 test code to handle D52504
  Simon Pilgrim · 2018-09-26 · 1 file, -181/+327
  The previously reduced version used urem <9 x i32> zeroinitializer, %tmp which D52504 will simplify.
  llvm-svn: 343097
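  The simplification in question is presumably the constant fold of a zero dividend (my reading, not stated in the log):

    define <9 x i32> @fold(<9 x i32> %tmp) {
      ; 0 urem x == 0 for every non-zero x (urem by zero is undefined),
      ; so this folds to zeroinitializer, defeating the old reduction.
      %r = urem <9 x i32> zeroinitializer, %tmp
      ret <9 x i32> %r
    }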
* [X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151)
  Simon Pilgrim · 2018-09-26 · 3 files, -310/+172
  Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS.
  As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases is used but for now I've just left a TODO and filtered by isKnownNeverZero.
  Differential Revision: https://reviews.llvm.org/D52171
  llvm-svn: 343093
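  The identity behind the lowering: for 2 <= C <= 15, the signed high half of x * 2^(16-C) equals x >>s C. C = 0 would need a 17-bit multiplier and C = 1 yields 0x8000, a negative i16, which is why those two are the special cases. A sketch for C = 5 using the SSE2 pmulhw intrinsic:

    declare <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16>, <8 x i16>)

    define <8 x i16> @sra_by_5(<8 x i16> %x) {
      ; mulhs(x, 2048) = (x * 2^11) >> 16 (signed high half) = x >>s 5.
      %hi = call <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16> %x,
          <8 x i16> <i16 2048, i16 2048, i16 2048, i16 2048,
                     i16 2048, i16 2048, i16 2048, i16 2048>)
      ret <8 x i16> %hi
    }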
* Fixes removal of dead elements from PressureDiff (PR37252).
  Yury Gribov · 2018-09-26 · 4 files, -30/+30
  Reviewed By: MatzeB
  Differential Revision: https://reviews.llvm.org/D51495
  llvm-svn: 343090
* [X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types
  Craig Topper · 2018-09-25 · 2 files, -58/+12
  This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way.
  llvm-svn: 343046
* [X86] Add some more movmsk test cases. NFC
  Craig Topper · 2018-09-25 · 1 file, -0/+232
  These IR patterns represent the exact behavior of a movmsk instruction using (zext (bitcast (icmp slt X, 0))). For the v4i32/v8i32/v2i64/v4i64 cases we currently emit a PCMPGT for the icmp slt, which is unnecessary since we only care about the sign bit of the result. This is because of the int->fp bitcast we put on the input to the movmsk nodes for these cases. I'll be fixing this in a future patch.
  llvm-svn: 343045
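  The quoted pattern, written out as a complete function for the v4i32 case:

    define i32 @movmsk_v4i32(<4 x i32> %x) {
      ; icmp slt X, 0 extracts the sign bit of each element...
      %c = icmp slt <4 x i32> %x, zeroinitializer
      ; ...the bitcast packs the four i1s into a 4-bit mask...
      %b = bitcast <4 x i1> %c to i4
      ; ...and zext widens it - exactly what movmskps produces.
      %r = zext i4 %b to i32
      ret i32 %r
    }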
* [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)
  Sanjay Patel · 2018-09-25 · 1 file, -6/+12
  This is the final (I hope!) problem pattern mentioned in PR37749:
  https://bugs.llvm.org/show_bug.cgi?id=37749
  We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
  The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:
    t29: v8i32 = concat_vectors t27, t28
    t30: v4i64 = bitcast t29
    t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
    t31: v4i64 = bitcast t18
    t32: v4i64 = xor t30, t31
    t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
    t34: v4i64 = bitcast t9
    t35: v4i64 = and t32, t34
    t36: v8i32 = bitcast t35
    t37: v4i32 = extract_subvector t36, Constant:i64<0>
    t38: v4i32 = extract_subvector t36, Constant:i64<4>
  Differential Revision: https://reviews.llvm.org/D52318
  llvm-svn: 343008
* [X86] Add AVX512 support to combineVectorSizedSetCCEquality.
  Craig Topper · 2018-09-25 · 1 file, -283/+149
  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52424
  llvm-svn: 342989
* Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides
  Simon Pilgrim · 2018-09-25 · 1 file, -8/+8
  As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead.
  The uops are slightly different to the register variant, so requires a +1uop tweak.
  llvm-svn: 342969
* [X86] Don't create FILD ISD nodes when X87 is disabled.
  Craig Topper · 2018-09-25 · 1 file, -0/+26
  The included test case previously asserted because the type legalizer tried to soften the FILD ISD node.
  Fixes PR38819.
  llvm-svn: 342934
* Re-submitting changes in D51550 because it failed to patch.
  Christy Lee · 2018-09-24 · 1 file, -0/+2
  Reviewers: javed.absar, trentxintong, courbet
  Reviewed By: trentxintong
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52433
  llvm-svn: 342919
* [X86] Remove shift/rotate by CL memory (RMW) overrides
  Simon Pilgrim · 2018-09-24 · 1 file, -8/+8
  The uops are slightly different to the register variant, so requires a +1uop tweak.
  llvm-svn: 342916
* [X86][AVX] Add truncation as shuffle test for PR31451
  Simon Pilgrim · 2018-09-24 · 1 file, -0/+17
  llvm-svn: 342908
* [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)
  Simon Pilgrim · 2018-09-24 · 1 file, -4/+4
  Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.
  This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.
  llvm-svn: 342892
* [DAGCombiner] use UADDO to optimize saturated unsigned add
  Sanjay Patel · 2018-09-24 · 1 file, -16/+9
  This is a preliminary step towards solving PR14613:
  https://bugs.llvm.org/show_bug.cgi?id=14613
  If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select.
  As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant:
  https://rise4fun.com/Alive/V1Q
  But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities.
  This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86, which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests.
  Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place.
  Differential Revision: https://reviews.llvm.org/D51929
  llvm-svn: 342886
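  The "using_cmp_sum" shape described above, reconstructed as IR (an assumption based on the test naming, not copied from the tests):

    define i32 @sat_add_using_cmp_sum(i32 %x, i32 %y) {
      %a = add i32 %x, %y
      ; unsigned add overflowed iff the sum wrapped below an operand.
      %c = icmp ugt i32 %x, %a
      ; clamp to UINT_MAX on overflow.
      %r = select i1 %c, i32 -1, i32 %a
      ret i32 %r
    }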
* [NFC][CodeGen][X86][AArch64] More tests for 'bit field extract' w/ constants
  Roman Lebedev · 2018-09-24 · 1 file, -0/+116
  It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine.
  https://bugs.llvm.org/show_bug.cgi?id=38938
  llvm-svn: 342880
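  The classic shift-and-mask shape such 'bit field extract' tests exercise (assumed; the field position and width here are illustrative):

    define i32 @bextr_like(i32 %x) {
      ; extract an 8-bit field starting at bit 3: (x >> 3) & 0xff
      %s = lshr i32 %x, 3
      %f = and i32 %s, 255
      ret i32 %f
    }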
* [X86] Add 512-bit test cases to setcc-wide-types.ll. NFC
  Craig Topper · 2018-09-24 · 1 file, -0/+588
  llvm-svn: 342860
* [DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation
  Sanjay Patel · 2018-09-23 · 1 file, -124/+56
  This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable.
  ARM and AArch64 may be able to remove some existing code that overlaps with this transform.
  This extends D52195 and may resolve PR34474:
  https://bugs.llvm.org/show_bug.cgi?id=34474
  (still an open question about transforming legal vector multiplies, but we could open another bug report for those)
  llvm-svn: 342844
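  A worked instance of the decomposition (my illustration, not from the commit): x * -9 becomes -((x << 3) + x), trading a multiply for a shift, an add, and a negation:

    define i32 @mul_by_neg9(i32 %x) {
      ; x * -9 == -((x * 8) + x)
      %s = shl i32 %x, 3
      %a = add i32 %s, %x
      %r = sub i32 0, %a
      ret i32 %r
    }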
* [X86] Added missing RCL/RCR schedule overrides to the generic SNB model
  Simon Pilgrim · 2018-09-23 · 1 file, -96/+96
  The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.
  I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.
  This is necessary to allow WriteRotate to be updated to remove other rotate overrides.
  It'd probably be a good idea to investigate a WriteRotateCarry class at some point but it's not high priority given the unusualness of these instructions.
  llvm-svn: 342842
* [x86] add tests for mul decomposition with negative constant; NFC
  Sanjay Patel · 2018-09-23 · 1 file, -0/+292
  llvm-svn: 342838
* [X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI.
  Craig Topper · 2018-09-23 · 1 file, -118/+531
  Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to a v8i32 then truncate back down to v8i16.
  llvm-svn: 342830
* [X86] Fix inline expansion for memset in x32
  Craig Topper · 2018-09-22 · 1 file, -0/+18
  Summary: Similar to D51893 which was for memcpy
  Reviewers: efriedma
  Reviewed By: efriedma
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52063
  llvm-svn: 342796
* [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) for vXi8 vectors
  Craig Topper · 2018-09-22 · 1 file, -120/+52
  We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra and that we don't need and fail to remove.
  llvm-svn: 342795
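  The input pattern in IR form for a byte vector testing bit 2 of each lane (a reconstruction of the subject's (setne (and X, (1 << C)), 0) shape; the lane width and bit index are illustrative):

    define i16 @movmsk_bit2(<16 x i8> %x) {
      ; test bit 2 (1 << 2 = 4) of every byte...
      %a = and <16 x i8> %x, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4,
                              i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
      %c = icmp ne <16 x i8> %a, zeroinitializer
      ; ...and collect the results as a mask; the fold instead shifts the
      ; tested bit into each lane's sign bit so pmovmskb can read it directly.
      %b = bitcast <16 x i1> %c to i16
      ret i16 %b
    }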
* [X86] Teach fast isel to use MOV32ri64 for loading an unsigned 32-bit immediate into a 64-bit register
  Craig Topper · 2018-09-21 · 1 file, -6/+6
  Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same.
  llvm-svn: 342788
* [x86] add more tests for potential andnp splitting with AVX1; NFC
  Sanjay Patel · 2018-09-21 · 1 file, -3/+54
  llvm-svn: 342775
* [X86] Add AVX512 target to load scalar to vector tests
  Simon Pilgrim · 2018-09-21 · 1 file, -0/+1
  To investigate broadcast instruction codegen for D51553.
  llvm-svn: 342773
* [x86] add (negative) andnp test for D52318; NFC
  Sanjay Patel · 2018-09-21 · 1 file, -0/+22
  llvm-svn: 342756
* [x86] add test with optsize attribute for scalar->vector transform; NFC
  Sanjay Patel · 2018-09-21 · 1 file, -0/+20
  llvm-svn: 342755
* [X86][Sched] Add zero idiom sched data to the SNB model.
  Clement Courbet · 2018-09-21 · 3 files, -37/+37
  Summary: On SNB, renamer-based zeroing does not work for:
    - 16 and 8-bit GPRs [1]
    - MMX [2]
    - ANDN variants [3]
  [1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
  [2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
  [3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
  Reviewers: RKSimon, andreadb
  Subscribers: gbedwell, craig.topper, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52358
  llvm-svn: 342736
* [X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms.
  Andrea Di Biagio · 2018-09-21 · 1 file, -4/+4
  This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr and VANDNP(S|D)Yrr. This is a follow-up of r342555.
  On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at register-renaming stage. The other opcode has to be executed to set the upper half of the destination YMM. Same for VANDNP(S|D)Yrr.
  Differential Revision: https://reviews.llvm.org/D52347
  llvm-svn: 342728
* [X86] Add scheduling tests for AVX1 256-bit zero-idioms. NFC
  Andrea Di Biagio · 2018-09-21 · 1 file, -0/+84
  llvm-svn: 342726
* [RegAllocGreedy] Fix crash in tryLocalSplit
  Walter Lee · 2018-09-20 · 1 file, -0/+265
  tryLocalSplit only handles a single use block, but an interval may have multiple use blocks. So don't crash in that case. This fixes PR38795.
  Differential revision: https://reviews.llvm.org/D52277
  llvm-svn: 342682
* Fix line-endings. NFCI.
  Simon Pilgrim · 2018-09-20 · 19 files, -1082/+1082
  llvm-svn: 342639