path: root/llvm/test/Transforms/InstCombine/X86
* [X86][InstCombine] Add constant folding and simplification support for pdep and pext (Craig Topper, 2019-12-31, 1 file, -0/+132)
    The instructions use a mask to either pack disjoint bits together (pext) or spread bits to disjoint locations (pdep). If the mask is all zeros, then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result, since no compression or expansion happens. Otherwise, if both the source and mask are constant, we can walk the bits in the source/mask and calculate the result. There are other, crazier things we could do, like using computeKnownBits or turning pext into shift/and when only a single contiguous range of bits is extracted.
    Fixes PR44389
    Differential Revision: https://reviews.llvm.org/D71952
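    A small IR sketch of those folds (the operand values below are illustrative, not taken from the commit):
        %r0 = call i32 @llvm.x86.bmi.pext.32(i32 %x, i32 0)    ; all-zero mask --> i32 0
        %r1 = call i32 @llvm.x86.bmi.pext.32(i32 %x, i32 -1)   ; all-ones mask --> i32 %x
        %r2 = call i32 @llvm.x86.bmi.pext.32(i32 255, i32 240) ; both constant --> i32 15 (bits 4..7 of the source packed low)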
* [X86][InstCombine] Move instcombine test from test/CodeGen/X86 to test/Transforms/InstCombine/ and replace grep with FileCheck (Craig Topper, 2019-12-01, 1 file, -0/+20)
* [InstCombine] remove identity shuffle simplification for mask with undefs (Sanjay Patel, 2019-11-24, 7 files, -23/+43)
    And simultaneously enhance SimplifyDemandedVectorElts() to recognize that pattern. That preserves some of the old optimizations in IR.
    Given a shuffle that includes undef elements in an otherwise identity mask like:
        define <4 x float> @shuffle(<4 x float> %arg) {
          %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
          ret <4 x float> %shuf
        }
    we were simplifying that to the input operand. But as discussed in PR43958 (https://bugs.llvm.org/show_bug.cgi?id=43958), that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result.
    Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison.
    Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug).
    Differential Revision: https://reviews.llvm.org/D70246
* [InstCombine] regenerate test CHECKs; NFC (Sanjay Patel, 2019-11-14, 8 files, -498/+498)
    There's a discussion about changing a shufflevector transform in https://bugs.llvm.org/show_bug.cgi?id=43958. It would protect against our current undef/poison behavior, and these are all tests that could be affected.
* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. (Craig Topper, 2019-05-22, 3 files, -292/+0)
    We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception, so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA.
    We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
    Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86-specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first.
    I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely.
    llvm-svn: 361425
* [NFC][InstCombine] Add unary FNeg tests to X86/x86-avx512.ll (Cameron McInally, 2019-05-21, 1 file, -0/+82)
    llvm-svn: 361308
* [InstCombine][X86] Add PACKSS tests for truncation of sign-extended comparisons (Simon Pilgrim, 2019-04-29, 1 file, -0/+106)
    llvm-svn: 359435
* [InstCombine][X86] Add PACKSS/PACKUS tests for truncation where saturation won't occur (Simon Pilgrim, 2019-04-25, 1 file, -0/+160)
    llvm-svn: 359185
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-1728-0/+12970
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-1728-12970/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [InstCombine][X86] Expand MOVMSK to generic IR (PR39927) (Simon Pilgrim, 2019-04-08, 1 file, -23/+42)
    First step towards removing the MOVMSK intrinsics completely. This patch expands MOVMSK to the pattern, e.g. for PMOVMSKB(v16i8 x):
        %cmp = icmp slt <16 x i8> %x, zeroinitializer
        %int = bitcast <16 x i8> %cmp to i16
        %res = zext i16 %int to i32
    which is correctly handled by ISel and FastISel (give or take an annoying movzx move): https://godbolt.org/z/rkrSFW
    Differential Revision: https://reviews.llvm.org/D60256
    llvm-svn: 357909
* [InstCombine] canonicalize select shuffles by commuting (Sanjay Patel, 2019-03-31, 3 files, -6/+6)
    In PR41304 (https://bugs.llvm.org/show_bug.cgi?id=41304) we have a case where we want to fold a binop of select-shuffle (blended) values. Rather than try to match commuted variants of the pattern, we can canonicalize the shuffles and check for mask equality with commuted operands.
    We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a special case that the backend is required to handle because we already canonicalize vector select to this shuffle form. So there should be no codegen difference from this change. It's possible that this improves CSE in IR though.
    Differential Revision: https://reviews.llvm.org/D60016
    llvm-svn: 357366
* [InstCombine] regenerate test checks; NFC (Sanjay Patel, 2019-03-29, 2 files, -50/+36)
    llvm-svn: 357288
* Demanded elements support for masked.load and masked.gather (Philip Reames, 2019-03-19, 1 file, -8/+8)
    Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.
    Differential Revision: https://reviews.llvm.org/D57372
    llvm-svn: 356510
* [X86] Add ImmArg markings to intrinsics. (Craig Topper, 2019-03-12, 1 file, -12/+0)
    Remove test cases that checked for not crashing when immediate operands were passed something other than an immediate. These are now considered ill-formed in IR.
    This was done by manually scanning the intrinsic file for llvm_i32_ty and llvm_i8_ty, which are the predominant types we use for immediates. Most of them are on vector intrinsics. I might have missed some other intrinsics.
    Differential Revision: https://reviews.llvm.org/D58302
    llvm-svn: 355993
* [Test] Update file w/update_test_checks.py to make a follow-on change obvious (Philip Reames, 2019-02-01, 1 file, -29/+29)
    llvm-svn: 352932
* [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic (Sanjay Patel, 2019-02-01, 1 file, -10/+12)
    If we can reduce the x86-specific intrinsic to the generic op, it allows existing simplifications and value tracking folds. AFAICT, this always results in identical x86 codegen in the non-reduced case, which should be true because we semi-generically (too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must already combine/lower this intrinsic as expected.
    This isn't quite what was requested in https://bugs.llvm.org/show_bug.cgi?id=40486, but we want to have these kinds of folds early for efficiency and to enable greater simplifications. For the case in the bug report where we have _addcarry_u64(0, ahi, 0, &ahi), this gets completely simplified away in IR.
    Differential Revision: https://reviews.llvm.org/D57453
    llvm-svn: 352870
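    A rough IR sketch of the reduction when the incoming carry is zero (value names are illustrative, not from the patch):
        ; before
        %r = call { i8, i64 } @llvm.x86.addcarry.64(i8 0, i64 %x, i64 %y)
        ; after, approximately
        %uo   = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 %y)
        %sum  = extractvalue { i64, i1 } %uo, 0
        %ov   = extractvalue { i64, i1 } %uo, 1
        %cout = zext i1 %ov to i8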
* SimplifyDemandedVectorElts for all intrinsics (Philip Reames, 2019-01-30, 1 file, -8/+8)
    The point is that this simplifies integration of new intrinsics into SimplifyDemandedVectorElts, and ensures we don't miss any existing ones.
    This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output. This is due to order of transforms w/in instcombine resulting in two slightly different fixed points. That's something we should fix, but isn't a problem w/this patch per se.
    Differential Revision: https://reviews.llvm.org/D57398
    llvm-svn: 352653
* [InstCombine][x86] add tests for addcarry intrinsic; NFC (Sanjay Patel, 2019-01-30, 1 file, -0/+36)
    llvm-svn: 352627
* Correct contents for r352453 (Philip Reames, 2019-01-29, 1 file, -72/+48)
    I had a local change I hadn't noticed when submitting that auto-update, so the auto-update was wrong. This should fix it, and with that, it's clearly time to stop submitting changes and go to bed.
    llvm-svn: 352454
* [Tests] Regen to remove future test diffs (Philip Reames, 2019-01-29, 1 file, -128/+152)
    This file appears to have been manually edited at some point after being auto-updated. A future change adjusts this file slightly, and all of the updates make the diff super confusing.
    llvm-svn: 352453
* [InstCombine] Make x86 PADDS/PSUBS constant folding tests generic (Simon Pilgrim, 2018-12-20, 1 file, -399/+0)
    As discussed on D55894, this replaces the existing PADDS/PSUBS intrinsics with the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder. PR40110 has been raised to fix the regression with constant folding vectors containing undef elements.
    llvm-svn: 349759
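    The mapping in question, sketched for one byte-wide variant (shown for illustration, not quoted from the tests):
        %a = call <16 x i8> @llvm.x86.sse2.padds.b(<16 x i8> %x, <16 x i8> %y)   ; x86-specific saturating add
        %b = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> %x, <16 x i8> %y)     ; equivalent generic intrinsic the tests now use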
* Regenerate test (Simon Pilgrim, 2018-12-19, 1 file, -48/+96)
    llvm-svn: 349646
* [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927) (Sanjay Patel, 2018-12-11, 1 file, -22/+63)
    call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM
    This has the potential to create less-than-8-bit scalar types as shown in some of the test diffs, but it looks like the backend knows how to deal with that in these patterns.
    This is the simple part of the fix suggested in https://bugs.llvm.org/show_bug.cgi?id=39927.
    Differential Revision: https://reviews.llvm.org/D55529
    llvm-svn: 348862
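    A concrete instance of that pattern, sketched with pmovmskb (value names are illustrative):
        ; before
        %sext = sext <16 x i1> %cond to <16 x i8>
        %msk  = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %sext)
        ; after
        %bits = bitcast <16 x i1> %cond to i16
        %msk2 = zext i16 %bits to i32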
* [InstCombine] add tests for movmsk (PR39927) NFC (Sanjay Patel, 2018-12-10, 1 file, -6/+80)
    llvm-svn: 348800
* [InstCombine] drop poison flags in SimplifyDemandedVectorElts (Sanjay Patel, 2018-10-04, 1 file, -1/+1)
    We established the (unfortunately complicated) rules for UB/poison propagation with vector ops in D48893, D48987, and D49047.
    It's clear from the affected tests that we are potentially creating poison where none existed before the transforms. For add/sub/mul, the answer is simple: just drop the flags, because the extra undef vector lanes are generally more valuable for analysis and codegen.
    llvm-svn: 343819
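    A hypothetical illustration of the flag drop (not taken from the affected tests): when demanded-elements analysis rewrites an undemanded lane of a constant operand to undef, the nsw flag is removed so the rewrite does not introduce new poison.
        %r = add nsw <2 x i32> %x, <i32 1, i32 2>
        ; if only element 0 is demanded, this may become:
        %r = add <2 x i32> %x, <i32 1, i32 undef>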
* [InstCombine] allow SimplifyDemandedVectorElts to work with FP binops (Sanjay Patel, 2018-10-03, 1 file, -1/+1)
    We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor.
    This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047.
    This patch just enables functionality for FP ops because those do not have UB/poison potential.
    llvm-svn: 343727
* [InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814) (Sanjay Patel, 2018-09-22, 1 file, -11/+10)
    Follow-up to rL342324 (D52059). Missing optimizations with blendv are shown in https://bugs.llvm.org/show_bug.cgi?id=38814.
    This is an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again.
    llvm-svn: 342806
* [InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814) (Sanjay Patel, 2018-09-15, 1 file, -16/+8)
    Missing optimizations with blendv are shown in https://bugs.llvm.org/show_bug.cgi?id=38814. If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again.
    I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform.
    Differential Revision: https://reviews.llvm.org/D52059
    llvm-svn: 342324
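    A sketch of the kind of fold these blendv patches aim for, assuming the condition operand is known to be a sign-extended i1 vector (names and types here are illustrative):
        %sx    = sext <4 x i1> %c to <4 x i32>
        %cond  = bitcast <4 x i32> %sx to <4 x float>
        %blend = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a, <4 x float> %b, <4 x float> %cond)
        ; --> roughly:
        %sel   = select <4 x i1> %c, <4 x float> %b, <4 x float> %a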
* [InstCombine] add more tests for x86 blendv (PR38814); NFC (Sanjay Patel, 2018-09-14, 1 file, -0/+90)
    llvm-svn: 342237
* [InstCombine][x86] add tests for possible blendv transform (PR38814); NFC (Sanjay Patel, 2018-09-07, 1 file, -34/+98)
    llvm-svn: 341715
* [X86] Constant folding of adds/subs intrinsics (Tomasz Krupa, 2018-08-14, 1 file, -0/+351)
    Summary: This adds constant folding of signed add/sub with saturation intrinsics.
    Reviewers: craig.topper, spatel, RKSimon, chandlerc, efriedma
    Reviewed By: craig.topper
    Subscribers: rnk, llvm-commits
    Differential Revision: https://reviews.llvm.org/D50499
    llvm-svn: 339659
* [X86] Remove and autoupgrade the scalar fma intrinsics with masking. (Craig Topper, 2018-07-12, 1 file, -148/+512)
    This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address them with follow-up patches.
    llvm-svn: 336871
* [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. (Craig Topper, 2018-07-05, 1 file, -30/+64)
    llvm-svn: 336315
* [X86] Rename the autoupgraded packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. (Craig Topper, 2018-06-27, 1 file, -36/+36)
    I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic, instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics with masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".
    llvm-svn: 335744
* [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil (Mikhail Dvoretckii, 2018-06-19, 3 files, -0/+292)
    This patch replaces calls to X86-specific intrinsics with floor-ceil semantics with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but it exposes these specific rounding cases to generic optimizations.
    Differential Revision: https://reviews.llvm.org/D48067
    llvm-svn: 335039
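    One plausible instance of the replacement, using the immediates 1 (round down) and 2 (round up) noted in the 2019-05-22 entry above; treat this as a sketch, not the exact test contents:
        %f = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %x, i32 1)   ; --> call <4 x float> @llvm.floor.v4f32(<4 x float> %x)
        %c = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %x, i32 2)   ; --> call <4 x float> @llvm.ceil.v4f32(<4 x float> %x)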
* [X86] Lowering sqrt intrinsics to native IR (Tomasz Krupa, 2018-06-15, 2 files, -8/+4)
    Summary: Complementary patch to lowering sqrt intrinsics in Clang.
    Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k
    Reviewed By: craig.topper
    Subscribers: tkrupa, mike.dvoretsky, llvm-commits
    Differential Revision: https://reviews.llvm.org/D41599
    llvm-svn: 334849
* NFC: Regenerating x86-sse41.ll test for InstCombine (Mikhail Dvoretckii, 2018-06-15, 1 file, -4/+4)
    Test regenerated to reduce noise in further patches.
    llvm-svn: 334806
* [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. (Craig Topper, 2018-06-10, 1 file, -80/+128)
    llvm-svn: 334358
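    The general shape of "use a select in IR instead", sketched for the default-rounding case (operand names are illustrative):
        %op  = fadd <16 x float> %a, %b
        %res = select <16 x i1> %mask, <16 x float> %op, <16 x float> %passthru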
* [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. (Craig Topper, 2018-05-20, 1 file, -238/+358)
    Someday maybe we'll use selects for all intrinsics.
    llvm-svn: 332824
* [X86] Extend instcombine folds for pclmuldq intrinsics to the 256 and 512 bit versions. (Craig Topper, 2018-05-13, 1 file, -0/+186)
    llvm-svn: 332202
* [X86] Add missing test for the InstCombines of pclmulqdq. (Craig Topper, 2018-05-13, 1 file, -0/+80)
    Apparently this test was lost when r293151 was committed. It was present in the review, but not the commit.
    llvm-svn: 332199
* [X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. (Craig Topper, 2018-05-11, 1 file, -233/+0)
    llvm-svn: 332146
* [PatternMatch] allow undef elements when matching a vector zero (Sanjay Patel, 2018-04-22, 1 file, -4/+6)
    This is the last step in getting constant pattern matchers to allow undef elements in constant vectors.
    I'm adding a dedicated m_ZeroInt() function and building m_Zero() from that. In most cases, calling code can be updated to use m_ZeroInt() directly when there's no need to match pointers, but I'm leaving that efficiency optimization as a follow-up step because it's not always clear when that's ok.
    There are just enough icmp folds in InstSimplify that can be used for integer or pointer types that we probably still want a generic m_Zero() for those cases. Otherwise, we could eliminate it (and possibly add an m_NullPtr() as an alias for isa<ConstantPointerNull>()).
    We're conservatively returning a full zero vector (zeroinitializer) in InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify), but I'm not sure if that's actually necessary in all cases. We may be able to propagate an undef lane instead. One test where this happens is marked with 'TODO'.
    llvm-svn: 330550
* [X86] Remove the pmuludq/pmuldq intrinsics and replace with native IR. (Craig Topper, 2018-04-13, 1 file, -21/+57)
    This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.
    We lost some InstCombine SimplifyDemandedBits optimizations through this change, as we aren't able to fold 'and', bitcast, shuffle very well.
    llvm-svn: 329990
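    A sketch of the native-IR form for the signed case (this is an assumed expansion, shown only for illustration): pmuldq multiplies the even 32-bit lanes as sign-extended 64-bit values.
        %a64  = bitcast <4 x i32> %a to <2 x i64>
        %b64  = bitcast <4 x i32> %b to <2 x i64>
        %ashl = shl <2 x i64> %a64, <i64 32, i64 32>
        %asx  = ashr <2 x i64> %ashl, <i64 32, i64 32>
        %bshl = shl <2 x i64> %b64, <i64 32, i64 32>
        %bsx  = ashr <2 x i64> %bshl, <i64 32, i64 32>
        %mul  = mul <2 x i64> %asx, %bsx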
* [PatternMatch] allow undef elements when matching vector FP +0.0 (Sanjay Patel, 2018-03-25, 1 file, -2/+1)
    This continues the FP constant pattern matching improvements from https://reviews.llvm.org/rL327627, https://reviews.llvm.org/rL327339, and https://reviews.llvm.org/rL327307.
    Several integer constant matchers also have this ability. I'm separating matching of integer/pointer null from FP positive zero and renaming/commenting to make the functionality clearer.
    llvm-svn: 328461
* [InstSimplify, InstCombine] add/update tests with FP +0.0 vector with undef; NFC (Sanjay Patel, 2018-03-25, 1 file, -139/+112)
    llvm-svn: 328455
* [ConstantFold] fp_binop undef, undef --> undef (Sanjay Patel, 2018-03-08, 1 file, -1/+1)
    These are uncontroversial and independent of the proposed LangRef edits (D44216). I tried to fix tests that would fold away: rL327004, rL327028, rL327030, rL327034.
    I'm not sure if the Reassociate tests are meaningless yet, but they probably will be as we add more folds, so if anyone has suggestions or wants to fix those, please do.
    Differential Revision: https://reviews.llvm.org/D44258
    llvm-svn: 327058
* [X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. (Craig Topper, 2018-02-10, 1 file, -62/+106)
    Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit 'and' in the IR.
    This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.
    Previously the lowering step of isel would turn the intrinsic into an X86-specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG, we give earlier DAG combines and even InstCombine the chance to see it and optimize it.
    This should make any issues with gpr<->mask sequences the same between integer and fp, meaning we only have to fix them once.
    Reviewers: spatel, delena, RKSimon, zvi
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D43137
    llvm-svn: 324827
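    A rough sketch of the resulting IR shape (using the post-rename intrinsic name from the 2018-06-27 entry above; the exact name and argument order are assumed here, not quoted from the patch):
        %cmp  = call <16 x i1> @llvm.x86.avx512.cmp.ps.512(<16 x float> %a, <16 x float> %b, i32 0, i32 4)
        %msk  = and <16 x i1> %cmp, %k          ; masking is now an explicit 'and'
        %bits = bitcast <16 x i1> %msk to i16   ; cast to a scalar mask is now explicit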
* [InstCombine] Check for isa<Instruction> before using cast<> (Simon Pilgrim, 2017-12-28, 1 file, -0/+13)
    Protects against casts from constexpr etc. Reduced from oss-fuzz #4788 test case.
    llvm-svn: 321515