path: root/llvm/lib/Target/X86
Commit message (Author, Age, Files, Lines)
...
* Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default." (Craig Topper, 2019-08-07, 2 files, -12/+47)
  The assert that caused this to be reverted should be fixed now.

  Original commit message:

  This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements.

  We believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions, so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.

  Next steps will be to merge tests that explicitly test the command line option. Then we can remove the option and its associated code.

  llvm-svn: 368183
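  The difference between the two strategies can be sketched in a few lines of Python. This is only an illustrative model, not LLVM code: the function names, the `(element_bits, lanes)` representation, the use of `None` for undef lanes, and the fixed 128-bit legal width are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# A vector is modelled as (element bit width, list of lanes).
# None stands in for an undef lane. All names here are hypothetical.
Vector = Tuple[int, List[Optional[int]]]


def promote(elt_bits: int, lanes: List[Optional[int]], legal_bits: int = 128) -> Vector:
    """Promotion: keep the lane count, grow each element to fill legal_bits."""
    new_elt_bits = legal_bits // len(lanes)
    return new_elt_bits, list(lanes)


def widen(elt_bits: int, lanes: List[Optional[int]], legal_bits: int = 128) -> Vector:
    """Widening: keep the element width, pad with undef lanes up to legal_bits."""
    new_count = legal_bits // elt_bits
    return elt_bits, list(lanes) + [None] * (new_count - len(lanes))


v8i8 = list(range(8))
promoted = promote(8, v8i8)  # 8 lanes of 16 bits, i.e. the shape of v8i16
widened = widen(8, v8i8)     # 16 lanes of 8 bits, i.e. the shape of v16i8
```

  In this model the widened vector keeps 8-bit lanes, which is why operations with no native i8 form (the shifts and multiplies mentioned above) still need a later promote/split step.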
* [X86] EltsFromConsecutiveLoads - early out for non-byte sized memory (PR42909) (Simon Pilgrim, 2019-08-07, 1 file, -0/+3)
  Don't attempt to merge loads for types whose size isn't a multiple of 8 bits.
  llvm-svn: 368165
* [X86] Allow any 8-bit immediate to be used with bt/btc/btr/bts memory aliases. (Craig Topper, 2019-08-07, 1 file, -4/+4)
  We have aliases that disambiguate memory forms of bt/btc/btr/bts without suffixes to the 32-bit form. These aliases should have been updated when the instructions were updated in r356413.
  llvm-svn: 368127
* [X86] Use isInt<8> to simplify some code. NFC (Craig Topper, 2019-08-07, 1 file, -1/+1)
  llvm-svn: 368126
* [X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser. (Craig Topper, 2019-08-07, 4 files, -4/+31)
  The upper 4 bits of the immediate byte are used to encode a register. We need to limit the explicit immediate to fit in the remaining 4 bits.
  Fixes PR42899.
  llvm-svn: 368123
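  The constraint described above can be illustrated with a small Python sketch. It is not the assembly parser's code; the function name and the error handling are hypothetical, and only the byte layout (register in the upper 4 bits, explicit immediate in the lower 4 bits) comes from the commit message.

```python
def encode_reg_imm_byte(reg_index: int, imm: int) -> int:
    """Pack a register index and an explicit immediate into one byte.

    Hypothetical helper: the register occupies bits 7..4 and the
    immediate must fit in bits 3..0.
    """
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must fit in 4 bits")
    if not 0 <= imm <= 15:
        raise ValueError("immediate must fit in 4 bits")
    return (reg_index << 4) | imm


byte = encode_reg_imm_byte(3, 0b0010)  # register 3, immediate 2
```

  An immediate such as 16 would spill into the register field, which is the situation the parser now rejects.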
* Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default." (Mitch Phillips, 2019-08-06, 2 files, -47/+12)
  This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.
  This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information.
  llvm-svn: 368107
* [X86] Don't allow combineSIntToFP to create v2i32 vectors after type legalization. (Craig Topper, 2019-08-06, 1 file, -4/+14)
  If we're after type legalization we should only be trying to turn v2i64 into v2i32. So bitcast to v4i32, shuffle the even elements together, then use X86ISD::CVTSI2P.
  The alternative is to leave the v2i64 type alone and let it be scalarized. Hopefully keeping it packed is better.
  Fixes PR42905.
  llvm-svn: 368091
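  The bitcast-and-shuffle step can be modelled in Python as follows. This is a sketch under stated assumptions, not the DAG combine itself: it assumes little-endian lane order (as on x86), the function name is hypothetical, and the final X86ISD::CVTSI2P conversion is not modelled.

```python
import struct


def low_halves_via_v4i32(values):
    """Model: bitcast v2i64 to v4i32, then keep the even elements.

    On a little-endian layout, elements 0 and 2 of the v4i32 view are
    the low 32 bits of each 64-bit lane.
    """
    raw = struct.pack("<2q", *values)   # two signed 64-bit lanes
    lanes = struct.unpack("<4i", raw)   # reinterpret as four signed 32-bit lanes
    return [lanes[0], lanes[2]]         # shuffle the even elements together


narrow = low_halves_via_v4i32([1, -2])
```

  The result only equals the original values when each i64 already fits in 32 bits; otherwise the upper halves are simply dropped.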
* [X86] Move CPU features for Barcelona/K10 out of line (Roman Lebedev, 2019-08-06, 1 file, -4/+17)
  Summary: Cleans X86.td's Barcelona entry to be more like the others, by moving the features out of the `Proc<>`, thus potentially making it possible to inherit from them.
  Split off from D63628
  Reviewers: craig.topper, RKSimon
  Reviewed By: craig.topper
  Subscribers: hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65791
  llvm-svn: 368061
* [X86][SSE] Call SimplifyMultipleUseDemandedBits on PACKSS/PACKUS arguments. (Simon Pilgrim, 2019-08-06, 1 file, -4/+24)
  This mainly helps to replace unused arguments with UNDEF in the case where they have multiple users.
  llvm-svn: 368026
* [X86] SimplifyMultipleUseDemandedBits - target shuffles might not be identity (Simon Pilgrim, 2019-08-06, 1 file, -2/+3)
  If we don't demand any non-undef shuffle elements then the assert will fail as all shuffle inputs would still be flagged as 'identity' safe.
  Exposed by an incoming patch.
  llvm-svn: 368022
* [X86][SSE] Enable min/max partial reduction (Simon Pilgrim, 2019-08-06, 1 file, -1/+1)
  As mentioned on D65047 / rL366933 the plan is to enable partial reduction handling wherever possible.
  llvm-svn: 368016
* [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER (Cullen Rhodes, 2019-08-06, 1 file, -2/+3)
  Summary: Before this patch MGATHER/MSCATTER is capable of representing all common addressing modes, but only when illegal types are used. This patch adds an IndexType property so more representations are available when using legal types only.
  Original modes:
    vector of bases
    base + vector of signed scaled offsets
  New modes:
    base + vector of signed unscaled offsets
    base + vector of unsigned scaled offsets
    base + vector of unsigned unscaled offsets
  The current behaviour of addressing modes for gather/scatter remains unchanged.
  Patch by Paul Walker.
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D65636
  llvm-svn: 368008
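  The four "base + vector of offsets" modes differ only in how each index lane is interpreted. The Python sketch below models that per-lane address computation. It is an assumption-laden illustration rather than SelectionDAG code: the function name, the `index_bits` and `scale` parameters, and the choice to express "scaled" as multiplication by a caller-supplied scale are all hypothetical.

```python
def lane_addresses(base, indices, index_bits, signed, scaled, scale):
    """Compute one address per lane for a base + offset-vector access.

    signed: interpret each index as two's complement of width index_bits.
    scaled: multiply each index by `scale` (e.g. the element size in bytes).
    """
    addresses = []
    for raw in indices:
        raw &= (1 << index_bits) - 1                  # keep index_bits bits
        if signed and raw >= 1 << (index_bits - 1):
            raw -= 1 << index_bits                    # sign-extend
        addresses.append(base + raw * (scale if scaled else 1))
    return addresses


idx = [0xFFFFFFFF, 2]
signed_scaled = lane_addresses(1000, idx, 32, signed=True, scaled=True, scale=4)
unsigned_unscaled = lane_addresses(1000, idx, 32, signed=False, scaled=False, scale=4)
```

  The same 32-bit index pattern yields an address just below the base in the signed case and far above it in the unsigned case, which is why the interpretation has to be recorded explicitly once only legal index types are available.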
* [GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler() (Amara Emerson, 2019-08-05, 1 file, -1/+1)
  Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true.
  llvm-svn: 367945
* [X86] Enable -x86-experimental-vector-widening-legalization by default. (Craig Topper, 2019-08-05, 2 files, -12/+47)
  This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements.

  We believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions, so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.

  Next steps will be to merge tests that explicitly test the command line option. Then we can remove the option and its associated code.

  llvm-svn: 367901
* [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms (Sanjay Patel, 2019-08-05, 1 file, -2/+0)
  The test case is based on the example from the post-commit thread for:
  https://reviews.llvm.org/rGc9171bd0a955

  This replaces the x86-specific simple-type check from:
  rL367766
  with a check in the DAGCombiner. Adding the check isn't strictly necessary after the fix from:
  rL367768
  ...but it seems likely that we're heading for trouble if we are creating weird types in this transform.

  I combined the earlier legality check into the initial clause to simplify the code.

  So we should only try the trunc/sext transform at the earliest combine stage, but we limit the transform to simple types anyway because the TLI hook is probably too lax about what it considers a free truncate.
  llvm-svn: 367834
* [LLVM][Alignment] Introduce Alignment Type (Guillaume Chatelet, 2019-08-05, 1 file, -3/+3)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet, jfb, jakehehrlich
  Reviewed By: jfb
  Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65514
  llvm-svn: 367828
* [X86] Fix a bad early out in combineExtInVec that prevented recursive shuffle combining from running with -x86-experimental-vector-widening-legalization. (Craig Topper, 2019-08-05, 1 file, -5/+3)
  llvm-svn: 367798
* [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users. (Craig Topper, 2019-08-04, 1 file, -17/+0)
  Summary: The SimplifyDemandedVectorElts function can replace with undef when no elements are demanded, but due to how it interacts with TargetLoweringOpts, it can only do this when the node has no other users.
  Remove a now unneeded DAG combine from the X86 backend.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65713
  llvm-svn: 367788
* [X86] lowerShuffleAsSpecificZeroOrAnyExtend - use undef PSHUFB mask indices for ANY_EXTEND shuffles (Simon Pilgrim, 2019-08-04, 1 file, -2/+6)
  llvm-svn: 367784
* Fix signed/unsigned comparison warning. NFC. (Simon Pilgrim, 2019-08-04, 1 file, -1/+1)
  llvm-svn: 367783
* [X86] SimplifyMultipleUseDemandedBits - Add target shuffle support (Simon Pilgrim, 2019-08-04, 1 file, -0/+46)
  llvm-svn: 367782
* [X86] Consistently use MVT::i8 for the constant operand of BLENDI and INSERTPS nodes. (Craig Topper, 2019-08-04, 1 file, -8/+8)
  This is the type listed in the type constraint for isel. But since we list a type there, it doesn't get checked during isel matching.
  llvm-svn: 367775
* [x86] change free truncate hook to handle only simple types (PR42880) (Sanjay Patel, 2019-08-03, 1 file, -0/+2)
  This avoids the crash from:
  https://bugs.llvm.org/show_bug.cgi?id=42880
  ...and I think it's a proper constraint for the TLI hook.

  But that example raises questions about what happens to get us into this situation (created i29 types) and what happens later (why does legalization die on those types), so I'm not sure if we will resolve the bug based on this change.
  llvm-svn: 367766
* Emit diagnostic if an inline asm constraint requires an immediate (Bill Wendling, 2019-08-03, 1 file, -2/+3)
  Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied.
  Reviewers: joerg, mgorny, efriedma, rsmith
  Reviewed By: joerg
  Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60942
  llvm-svn: 367750
* [X86] Use the pointer VT for the Scale node when lowering x86 gather/scatter intrinsics. (Craig Topper, 2019-08-02, 1 file, -4/+12)
  This is consistent with the target independent intrinsic handling. Not sure this really matters since we just pull the constant out using getZExtValue later.
  llvm-svn: 367736
* GlobalISel: support swiftself attribute (Tim Northover, 2019-08-02, 1 file, -0/+1)
  llvm-svn: 367683
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC (Daniel Sanders, 2019-08-01, 9 files, -34/+33)
  llvm-svn: 367633
* [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal (Craig Topper, 2019-08-01, 2 files, -3/+14)
  If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it.

  This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that...

  Differential Revision: https://reviews.llvm.org/D65533
  llvm-svn: 367601
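  For readers unfamiliar with the "shift + add/sub" form mentioned above, here is a minimal Python sketch of the arithmetic. It is a simplified illustration, not the X86 hook: the function names are hypothetical, and only positive constants of the form 2^k + 1 or 2^k - 1 are considered.

```python
def is_pow2(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def decompose_mul(c: int):
    """Return ("add" | "sub", shift) when c == 2**shift + 1 or 2**shift - 1."""
    if c > 2 and is_pow2(c - 1):
        return "add", (c - 1).bit_length() - 1
    if c > 2 and is_pow2(c + 1):
        return "sub", (c + 1).bit_length() - 1
    return None  # keep the multiply


def apply_plan(x: int, plan) -> int:
    """Evaluate x * c using the shift + add/sub plan."""
    op, shift = plan
    shifted = x << shift
    return shifted + x if op == "add" else shifted - x


plan9 = decompose_mul(9)   # x * 9 == (x << 3) + x
plan7 = decompose_mul(7)   # x * 7 == (x << 3) - x
```

  Each decomposed form is two vector operations, so if the type must be split anyway, both operations are split too and nothing is gained over a legal split multiply, which is the point the commit makes.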
* [X86][SSE] Add PEXTR*(PINSR*(v, s, c), c) -> s combine. (Simon Pilgrim, 2019-08-01, 1 file, -4/+15)
  We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll.
  llvm-svn: 367582
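  The identity behind this combine can be shown with a small Python model. This is a sketch only: vectors are plain lists, the helper names are hypothetical, and the model assumes the scalar already fits the lane width, so the truncation and zero-extension that the real instructions perform are ignored.

```python
def pinsr(vec, scalar, idx):
    """Return a copy of vec with lane idx replaced by scalar."""
    out = list(vec)
    out[idx] = scalar
    return out


def pextr(vec, idx):
    """Return lane idx of vec."""
    return vec[idx]


def pextr_of_pinsr(vec, scalar, insert_idx, extract_idx):
    """Folded form: when both indices match, the inserted scalar is the result."""
    if insert_idx == extract_idx:
        return scalar                      # no need to build the vector at all
    return pextr(pinsr(vec, scalar, insert_idx), extract_idx)


v = [10, 20, 30, 40]
folded = pextr_of_pinsr(v, 99, 2, 2)
unfolded = pextr(pinsr(v, 99, 2), 2)
```

  Both paths produce the same value, but the folded one no longer depends on `v` or on the insertion.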
* [X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling (Simon Pilgrim, 2019-08-01, 2 files, -0/+31)
  This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains.
  llvm-svn: 367570
* [X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846) (Simon Pilgrim, 2019-08-01, 1 file, -1/+4)
  llvm-svn: 367556
* Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and partial fix. (Amy Huang, 2019-07-31, 1 file, -11/+0)
  Causes windows buildbot errors.
  This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and 53da7ca94343166ac68aef81db0398932fc258bb.
  llvm-svn: 367496
* [X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store (Craig Topper, 2019-07-31, 1 file, -0/+35)
  We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle, leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it.

  This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations.

  I'm restricting this to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk.

  Differential Revision: https://reviews.llvm.org/D65538
  llvm-svn: 367488
* [GISel] Address review feedback on passing MD_callees to lowerCall. (Mark Lacey, 2019-07-31, 1 file, -1/+1)
  Preserve the nullptr default for KnownCallees that appears in the base class.
  llvm-svn: 367477
* [GISel] Pass MD_callees metadata down in call lowering. (Mark Lacey, 2019-07-31, 2 files, -2/+4)
  Summary: This will make it possible to improve IPRA by taking into account register usage in indirect calls.
  NFC yet; this is just laying the groundwork to start building up patches to take advantage of the information for improved register allocation.
  Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
  Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65488
  llvm-svn: 367476
* Reland "[DwarfDebug] Dump call site debug info" (Djordje Todorovic, 2019-07-31, 2 files, -1/+90)
  The build failure found after the rL365467 has been resolved.
  Differential Revision: https://reviews.llvm.org/D60716
  llvm-svn: 367446
* [X86] Moved IsNOT helper earlier. NFCI. (Simon Pilgrim, 2019-07-31, 1 file, -28/+28)
  Makes it available for more combines to use without adding declarations.
  llvm-svn: 367436
* [X86][AVX] Ensure chained subvector insertions are the same size (PR42833) (Simon Pilgrim, 2019-07-31, 1 file, -0/+2)
  Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.
  llvm-svn: 367429
* [MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG (Amy Huang, 2019-07-31, 1 file, -0/+11)
  Summary: This emits labels around heapallocsite calls in SelectionDAG.
  Reviewers: rnk
  Subscribers: MatzeB, aprantl, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61105
  llvm-svn: 367374
* [X86] Fix mistake in comment. NFC (Craig Topper, 2019-07-30, 1 file, -2/+2)
  The code is matching sext not zext.
  llvm-svn: 367357
* [X86] SimplifyDemandedVectorEltsForTargetNode should be calling resolveTargetShuffleInputs not getTargetShuffleMask (Simon Pilgrim, 2019-07-30, 1 file, -0/+1)
  Add TODO comment.
  llvm-svn: 367318
* [X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819) (Simon Pilgrim, 2019-07-30, 1 file, -8/+10)
  PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source.
  This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source.
  llvm-svn: 367306
* [X86] Fix typo in comment. We're looking at a right shift not a left shift. NFC (Craig Topper, 2019-07-29, 1 file, -1/+1)
  llvm-svn: 367251
* [X86] resolveTargetShuffleInputs - add depth to limit recursion. (Simon Pilgrim, 2019-07-29, 1 file, -15/+19)
  Avoids slowdowns from calls to ComputeNumSignBits/computeKnownBits going too deep.
  llvm-svn: 367240
* [X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI. (Simon Pilgrim, 2019-07-29, 1 file, -18/+18)
  As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different to computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.
  This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.
  llvm-svn: 367232
* [X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user. (Craig Topper, 2019-07-29, 1 file, -12/+22)
  The pmaddwd inserts a truncate, if that truncate would end up creating additional instructions instead of making a zext narrower, then we shouldn't do it.

  I've restricted this to only sse4.1 targets since on prior targets the zext will be done in stages. So the truncate will probably not create additional instructions. Might need some more investigation of mul shrinking and the other pmaddwd transform to be sure this is the right decision.

  There might be a slight regression on AVX1 targets due to add splitting. Hard to say for sure. Maybe we need to look into using the vector reduction flag to use 2 narrow loads and a blend instead of extracting and inserting.
  llvm-svn: 367198
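  As background for why a truncate is needed, the arithmetic PMADDWD performs can be modelled in Python: it multiplies pairs of signed 16-bit lanes and adds adjacent products into 32-bit lanes, so wider multiply inputs must first be narrowed to 16 bits. The sketch below is an illustrative model with hypothetical names, not backend code.

```python
def wrap_i32(value: int) -> int:
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def pmaddwd(a, b):
    """Multiply signed 16-bit lanes pairwise and add adjacent products.

    a and b are equal-length lists with an even number of i16 values;
    the result has half as many i32 lanes.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    assert all(-32768 <= x <= 32767 for x in a + b)
    return [wrap_i32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


result = pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])  # [1*5 + 2*6, 3*7 + 4*8]
```

  Because adjacent products are already summed, the instruction suits add reductions of multiplies, but only when narrowing the inputs does not cost extra instructions, which is the condition this commit tightens.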
* [X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit. (Craig Topper, 2019-07-28, 1 file, -78/+63)
  This reverts r340478 and r340631 and replaces them with a simpler method of just letting DAG combine revisit the nodes to handle the other operand.
  llvm-svn: 367195
* [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied) (Simon Pilgrim, 2019-07-27, 1 file, -9/+12)
  Recommit rL367100 which was reverted at rL367141.
  Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.
  llvm-svn: 367172
* Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler." (Vlad Tsyrklevich, 2019-07-26, 1 file, -12/+9)
  This reverts r367100, it appears to be causing test failures after Nico's revert of r367091.
  llvm-svn: 367141
* [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler. (Simon Pilgrim, 2019-07-26, 1 file, -9/+12)
  This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits.
  llvm-svn: 367100