path: root/llvm/lib/Target
* [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)
  Simon Pilgrim, 2017-12-29 (1 file, -11/+61)

  As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles. This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles. By doing so we also prevent premature use of PSHUFB shuffles, which can be slower and require the creation/loading of constant shuffle masks. We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.

  Differential Revision: https://reviews.llvm.org/D38318
  llvm-svn: 321553
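  To make the decomposition concrete, here is a minimal standalone C++ model (editor's sketch, not the patch's code; the helper functions model the real instructions' immediate semantics) showing masks the new check can cover with two shuffles instead of the three-shuffle chain:

      #include <array>
      #include <cassert>
      #include <cstdint>

      using V8i16 = std::array<uint16_t, 8>;

      // Scalar models of the x86 shuffles; each 2-bit immediate field
      // selects the source element for one destination element.
      V8i16 pshuflw(V8i16 v, uint8_t imm) { // permutes words 0-3, keeps 4-7
        V8i16 r = v;
        for (int i = 0; i < 4; ++i) r[i] = v[(imm >> (2 * i)) & 3];
        return r;
      }
      V8i16 pshufhw(V8i16 v, uint8_t imm) { // permutes words 4-7, keeps 0-3
        V8i16 r = v;
        for (int i = 0; i < 4; ++i) r[4 + i] = v[4 + ((imm >> (2 * i)) & 3)];
        return r;
      }
      V8i16 pshufd(V8i16 v, uint8_t imm) {  // permutes 32-bit lanes (word pairs)
        V8i16 r{};
        for (int i = 0; i < 4; ++i) {
          int d = (imm >> (2 * i)) & 3;
          r[2 * i] = v[2 * d];
          r[2 * i + 1] = v[2 * d + 1];
        }
        return r;
      }

      int main() {
        V8i16 in = {0, 1, 2, 3, 4, 5, 6, 7};
        // Mask <4,5,7,6,0,1,2,3>: reorder within the high half (PSHUFHW),
        // then swap the 32-bit halves (PSHUFD) - no PSHUFLW needed.
        V8i16 hi = pshufd(pshufhw(in, 0b10'11'01'00), 0b01'00'11'10);
        assert((hi == V8i16{4, 5, 7, 6, 0, 1, 2, 3}));
        // Mask <4,5,6,7,1,0,3,2>: the mirrored PSHUFLW+PSHUFD case.
        V8i16 lo = pshufd(pshuflw(in, 0b10'11'00'01), 0b01'00'11'10);
        assert((lo == V8i16{4, 5, 6, 7, 1, 0, 3, 2}));
      }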
* [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
  Dmitry Preobrazhensky, 2017-12-29 (1 file, -1/+39)

  See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

  Reviewers: vpykhtin, artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D41598
  llvm-svn: 321552
* [PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion
  Nemanja Ivanovic, 2017-12-29 (4 files, -23/+69)

  Revision 320791 introduced a pass that transforms reg+reg instructions to reg+imm if they're fed by "load immediate". However, it didn't handle out-of-range shifts correctly, as reported in PR35688. This patch fixes that and therefore the PR.

  Furthermore, there was undefined behaviour in the patch where the RHS of an initialization expression was 32 bits and the constant `1` was shifted left 32 bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.

  Differential Revision: https://reviews.llvm.org/D41369
  llvm-svn: 321551
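  The UB class fixed here is easy to reproduce; a minimal hypothetical sketch (not the actual PPC pass code, function names invented for illustration):

      #include <cstdint>

      // UB: the literal 1 is a 32-bit int, so for Shift >= 32 the shift
      // count equals or exceeds the width of int and the behaviour is
      // undefined, even though the 64-bit destination could hold the result.
      uint64_t maskBad(unsigned Shift) { return 1 << Shift; }

      // Fix: make the shifted constant 64 bits wide, like the LHS.
      uint64_t maskGood(unsigned Shift) { return uint64_t(1) << Shift; }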
* Fix incorrect operand sizes for some MMX instructions: punpcklwd, punpcklbw and punpckldq
  Andrew V. Tischenko, 2017-12-29 (1 file, -5/+9)

  Differential Revision: https://reviews.llvm.org/D41595
  llvm-svn: 321549
* [X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a narrower extend
  Craig Topper, 2017-12-28 (1 file, -6/+17)

  Previously we used an extend from v8i1 to v8i32/v8i64, then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.

  This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.

  If we make v2i1/v4i1 legal as proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.

  llvm-svn: 321538
* [X86] Use ISD::CONCAT_VECTORS when splitting 256-bit loads in combineLoad
  Craig Topper, 2017-12-28 (1 file, -3/+1)

  llvm-svn: 321537
* [X86] Fix inconsistencies in different places where we split loads/stores
  Craig Topper, 2017-12-28 (1 file, -10/+11)

  - Use MinAlign instead of std::min.
  - Use SelectionDAG::getMemBasePlusOffset.
  - Apply the offset to the pointer info for the second load/store created.

  llvm-svn: 321536
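  For context on the first item, a small standalone model of the distinction (the MinAlign body mirrors what llvm/Support/MathExtras.h is assumed to define): MinAlign returns the largest power of two dividing both values, which is what the second half of a split access actually guarantees, while std::min can overstate it once an offset is involved:

      #include <algorithm>
      #include <cstdint>
      #include <iostream>

      // Assumed model of llvm::MinAlign: the largest power of two dividing
      // both A and B, i.e. the lowest set bit of (A | B).
      uint64_t MinAlign(uint64_t A, uint64_t B) { return (A | B) & (1 + ~(A | B)); }

      int main() {
        uint64_t OrigAlign = 16, Offset = 24; // second half of a split load
        // A pointer that was 16-byte aligned is only 8-byte aligned at +24.
        std::cout << MinAlign(OrigAlign, Offset) << '\n'; // 8 (correct)
        std::cout << std::min(OrigAlign, Offset) << '\n'; // 16 (too strong)
      }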
* [X86] Emit ISD::TRUNCATE instead of X86ISD::VTRUNC from LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask
  Craig Topper, 2017-12-28 (1 file, -2/+2)

  The truncate will be lowered to X86ISD::VTRUNC later.

  llvm-svn: 321534
* [X86] Remove unnecessary patterns for sign extending vXi1 without VLX
  Craig Topper, 2017-12-28 (1 file, -16/+0)

  The custom lowering already widens the result type to 512 bits if VLX isn't supported.

  llvm-svn: 321533
* [WinEH] Don't emit state stores or EH thunks for available_externally functions
  Reid Kleckner, 2017-12-28 (1 file, -0/+6)

  The exception handler thunk needs to reference the LSDA of the parent function, which won't be emitted if it's available_externally. Fixes PR35736.

  ThinLTO ends up producing available_externally functions that use _CxxFrameHandler3.

  llvm-svn: 321532
* Avoid int to string conversion in Twine or raw_ostream contexts
  Benjamin Kramer, 2017-12-28 (3 files, -10/+4)

  Some output changes from uppercase hex to lowercase hex; no other functionality change intended.

  llvm-svn: 321526
* [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
  Simon Pilgrim, 2017-12-28 (1 file, -0/+14)

  If there are 17 or more leading zeros in the v4i32 elements, then we can use PMADDWD for the integer multiply when PMULLD is unavailable or slow. PMADDWD performs a v8i16 signed-multiply-extend plus pairwise add: the upper 16 bits of each element must be zero so the high pair contributes nothing, and the 17th bit must be zero so the low 16-bit half is not incorrectly sign extended.

  Differential Revision: https://reviews.llvm.org/D41484
  llvm-svn: 321516
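  A scalar C++ model of one 32-bit lane (editor's sketch; the helper name is invented) makes the bit-count requirement easy to verify:

      #include <cassert>
      #include <cstdint>

      // One 32-bit lane of PMADDWD: the two 16-bit halves of each source
      // are sign extended, multiplied pairwise, and the products are added.
      int32_t pmaddwdLane(uint32_t a, uint32_t b) {
        int32_t lo = int16_t(a & 0xFFFF) * int16_t(b & 0xFFFF);
        int32_t hi = int16_t(a >> 16) * int16_t(b >> 16);
        return lo + hi;
      }

      int main() {
        // Both values fit in 15 bits (17+ leading zeros), so the high
        // halves are zero and the low halves are non-negative, making the
        // sign extension harmless.
        uint32_t a = 0x5FFF, b = 0x7ABC;
        assert(uint32_t(pmaddwdLane(a, b)) == a * b);
      }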
* [X86] Add CLWB to icelake
  Craig Topper, 2017-12-27 (1 file, -1/+2)

  Per Table 1-1 in the October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features.

  llvm-svn: 321501
* [X86] Reimplement r321437 using custom lowering instead of as a DAG combine
  Craig Topper, 2017-12-27 (1 file, -43/+6)

  My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues, so just do it in LowerMUL where we can catch more cases.

  llvm-svn: 321496
* [AArch64] Change order of candidate FMLS patterns
  Matthew Simpson, 2017-12-27 (1 file, -22/+22)

  r319980 added new patterns to the machine combiner for transforming (fsub (fmul x y) z) into (fmla (fneg z) x y). That is, fsubs where the first source operand is an fmul are transformed. We previously only matched the case where the second source operand of an fsub was an fmul, transforming (fsub z (fmul x y)) into (fmls z x y).

  Now, if we have an fsub where both source operands are fmuls, both of the above patterns are applicable. However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used. This patch changes the order these two patterns are added to the list of candidates such that we prefer the case where the second source operand is an fmul (the fmls case), rather than the other one (the fmla/fneg case). When both source operands are fmuls, this ordering results in fewer instructions.

  Differential Revision: https://reviews.llvm.org/D41587
  llvm-svn: 321491
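  A small numeric sketch of the two rewrites (standalone C++, not the backend code), using std::fma as the stand-in for the fused AArch64 instructions and exactly representable values so fused and unfused results agree:

      #include <cassert>
      #include <cmath>

      // The two machine-combiner rewrites, modelled with fused multiply-add:
      //   (fsub z (fmul x y)) -> fmls z, x, y        == fma(-x, y, z)
      //   (fsub (fmul x y) z) -> fmla (fneg z), x, y == fma(x, y, -z)
      // When both fsub operands are fmuls either pattern applies, but the
      // fmls form needs no extra fneg, so it is the one to prefer.
      int main() {
        double x = 1.5, y = -2.25, z = 4.0; // x*y = -3.375 exactly
        assert(std::fma(-x, y, z) == z - x * y); // fmls form
        assert(std::fma(x, y, -z) == x * y - z); // fmla + fneg form
      }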
* [X86] Fix vmul combine for AVX1 targets
  Benjamin Kramer, 2017-12-27 (1 file, -0/+4)

  v8i32 is legal on AVX1, but AVX1 doesn't have pmuludq for it.

  llvm-svn: 321490
* [X86] Return SDValue(N, 0) instead of an SDValue() after a successful combine
  Craig Topper, 2017-12-26 (1 file, -2/+2)

  Returning SDValue() means nothing changed; SDValue(N, 0) means there was a change but the worklist management was taken care of. I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.

  llvm-svn: 321463
* Fix Bug 35741 - can't use comments after x86 prefixes
  Andrew V. Tischenko, 2017-12-26 (1 file, -2/+3)

  Differential Revision: https://reviews.llvm.org/D41579
  llvm-svn: 321459
* [X86] Pass itins.rr/itins.rm through properly for some instructions
  Craig Topper, 2017-12-26 (1 file, -11/+12)

  llvm-svn: 321452
* [X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts
  Craig Topper, 2017-12-26 (1 file, -5/+5)

  llvm-svn: 321451
* [X86] Fix typo in assert message
  Craig Topper, 2017-12-26 (1 file, -1/+1)

  llvm-svn: 321450
* [X86] Add a DAG combine to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros
  Craig Topper, 2017-12-25 (1 file, -0/+34)

  Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ. This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove it's unnecessary. PMULLQ is 3 uops that take 4 cycles each, while pmuldq/pmuludq is only one 4-cycle uop.

  llvm-svn: 321437
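  A scalar model of why the combine is safe (editor's sketch; helper name invented). PMULUDQ multiplies only the low 32 bits of each 64-bit lane as unsigned, so it matches a full multiply exactly when the upper 32 bits are zero; PMULDQ is the signed analogue for operands with 33 or more sign bits:

      #include <cassert>
      #include <cstdint>

      // One 64-bit lane of PMULUDQ: unsigned multiply of the low 32 bits,
      // producing a full 64-bit product.
      uint64_t pmuludqLane(uint64_t a, uint64_t b) {
        return uint64_t(uint32_t(a)) * uint32_t(b);
      }

      int main() {
        // With the upper 32 bits of both operands known zero, the cheap
        // lane product equals the full 64-bit multiply.
        uint64_t a = 0x00000000DEADBEEF, b = 0x0000000012345678;
        assert(pmuludqLane(a, b) == a * b);
      }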
* [X86] Make some helper methods static functions instead. NFC
  Craig Topper, 2017-12-25 (2 files, -19/+15)

  llvm-svn: 321433
* [X86] Use SelectionDAG::getFPExtendOrRound to simplify some code
  Craig Topper, 2017-12-25 (1 file, -10/+1)

  llvm-svn: 321432
* Make helpers static. No functionality change
  Benjamin Kramer, 2017-12-24 (1 file, -1/+9)

  llvm-svn: 321425
* [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454)
  Simon Pilgrim, 2017-12-24 (1 file, -4/+2)

  Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo instructions during the stack conversion pass.

  llvm-svn: 321424
* [X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ
  Craig Topper, 2017-12-24 (1 file, -7/+18)

  Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq, so we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.

  llvm-svn: 321420
* [X86] Add assembler predicates to BITALG/VBMI2/VNNI features to be consistent with the other AVX512 ISAs
  Craig Topper, 2017-12-24 (1 file, -3/+6)

  llvm-svn: 321416
* [X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS, not just all zeros/ones
  Craig Topper, 2017-12-24 (1 file, -10/+5)

  llvm-svn: 321415
* [X86] Remove type restrictions from WidenMaskArithmetic
  Craig Topper, 2017-12-23 (1 file, -7/+6)

  This can help AVX-512 code where mask types are legal, allowing us to remove extends and truncates to/from mask types.

  llvm-svn: 321408
* [X86] In WidenMaskArithmetic, make sure we check the input type of a truncate on N1
  Craig Topper, 2017-12-23 (1 file, -1/+2)

  Later in the code we explicitly bypass the truncate, so we should be checking its type to make sure that it's safe.

  llvm-svn: 321407
* [X86] Remove unneeded EVT variable. NFC
  Craig Topper, 2017-12-23 (1 file, -6/+5)

  Immediately after it is created we check whether it's equal to another EVT, then we inconsistently use one or the other variable in the code below. Instead, do the equality check directly on the getValueType result and remove the variable; use the original VT variable throughout the remaining code.

  llvm-svn: 321406
* [X86][X87] Wrap FpI_ pseudo to use PseudoI. NFCI
  Simon Pilgrim, 2017-12-23 (1 file, -2/+1)

  llvm-svn: 321405
* [X86] Add default InstrItinClass to PseudoI
  Simon Pilgrim, 2017-12-23 (1 file, -2/+3)

  This will be used to help tidy up existing pseudos that we've added scheduling info to.

  llvm-svn: 321401
* [X86] Pass the right VT to the getZeroExtendInReg introduced in r321398
  Craig Topper, 2017-12-23 (1 file, -1/+1)

  Apparently we don't have tests for this, which I didn't realize before. I'll try to fix that but wanted to fix the obvious bug.

  llvm-svn: 321399
* [X86] Use SelectionDAG::getZeroExtendInReg instead of implementing it manually
  Craig Topper, 2017-12-23 (1 file, -9/+3)

  llvm-svn: 321398
* [SelectionDAG][X86] Don't use ->getValueType(0) after a call to getOperand to get the type of the operand
  Craig Topper, 2017-12-23 (1 file, -7/+7)

  getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number is 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType, which requires supplying a result number that we shouldn't assume is 0.

  I don't have a test case; I just noticed this while cleaning up some other code and saw that it occurred in other places.

  llvm-svn: 321397
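  A toy model of the distinction (editor's sketch with invented classes, not LLVM's actual SDNode/SDValue) shows how the `->` form silently picks result 0:

      #include <cassert>
      #include <vector>

      struct Node {
        std::vector<int> ResultTypes; // one type per result
        int getValueType(unsigned ResNo) const { return ResultTypes[ResNo]; }
      };
      struct Value { // like SDValue: a node plus a result number
        Node *N;
        unsigned ResNo;
        int getValueType() const { return N->getValueType(ResNo); }
      };

      int main() {
        Node Mul = {{/*result 0:*/ 64, /*result 1, e.g. a flag:*/ 1}};
        Value Op{&Mul, 1};                // operand refers to result #1
        assert(Op.getValueType() == 1);   // Value-style call: correct
        assert(Op.N->getValueType(0) == 64); // '->' with 0: result 0's type,
                                             // not the operand's
      }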
* (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
  Sanjoy Das, 2017-12-22 (27 files, -80/+68)

  Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function).

  Original commit message:

  This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target.

  See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

  I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this.

  Reviewers: echristo, MatzeB, hfinkel
  Reviewed By: hfinkel
  Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41464
  llvm-svn: 321375
* [AMDGPU][MC] Corrected handling of negative expressions
  Dmitry Preobrazhensky, 2017-12-22 (1 file, -1/+6)

  See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D41488
  llvm-svn: 321372
* [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index, just use either a 128-bit type or the vXi8 type with the correct number of elements
  Craig Topper, 2017-12-22 (1 file, -8/+6)

  Despite what the comment said, there isn't better codegen for 512-bit vectors. The 128/256/512-bit implementation just stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact, in many cases it causes a stack realignment and generates worse code.

  llvm-svn: 321369
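  The lowering described above, written by hand (editor's sketch of the pattern; the function name is invented): spill the vector to a stack slot and load one element back, which is why the vector's width buys nothing here:

      #include <immintrin.h>
      #include <cstdint>

      // Variable-index extract: store the whole vector, load one lane.
      int32_t extractVariable(__m128i v, unsigned idx) {
        alignas(16) int32_t slot[4];
        _mm_store_si128(reinterpret_cast<__m128i *>(slot), v);
        return slot[idx & 3];
      }

      int main() {
        __m128i v = _mm_setr_epi32(10, 20, 30, 40);
        unsigned i = 2; // not a compile-time constant in general
        return extractVariable(v, i) == 30 ? 0 : 1;
      }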
* [X86] Improve the printing of address mode during isel matching
  Craig Topper, 2017-12-22 (1 file, -4/+5)

  Fix some inconsistent newline behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode.

  llvm-svn: 321368
* [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
  Dmitry Preobrazhensky, 2017-12-22 (1 file, -1/+3)

  See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D41186
  llvm-svn: 321367
* [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
  Dmitry Preobrazhensky, 2017-12-22 (7 files, -43/+197)

  See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

  This patch also affects the implementation of SGPR and VGPR registers, though the changes are cosmetic.

  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D41437
  llvm-svn: 321359
* [ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32
  Diana Picus, 2017-12-22 (3 files, -0/+30)

  Mark conversions between pointers and 32-bit scalars as legal, map them to the GPR bank and select them to a simple COPY.

  llvm-svn: 321356
* [ARM GlobalISel] Support pointer constants
  Diana Picus, 2017-12-22 (2 files, -1/+11)

  Pointer constants are pretty rare, since we usually represent them as integer constants and then cast to pointer. One notable exception is the null pointer constant, which is represented directly as a G_CONSTANT 0 with pointer type. Mark it as legal and make sure it is selected like any other integer constant.

  llvm-svn: 321354
* [X86] Add missing initialization for the HasPREFETCHWT1 subtarget variable
  Craig Topper, 2017-12-22 (1 file, -0/+1)

  llvm-svn: 321340
* [X86] Enable PRFCHW feature on KNL/KNM and all CPUs inherited from Broadwell
  Craig Topper, 2017-12-22 (1 file, -2/+4)

  llvm-svn: 321336
* [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions
  Craig Topper, 2017-12-22 (6 files, -7/+33)

  Previously prefetch was only considered legal if SSE was enabled, but it should be supported with 3DNow! as well.

  The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the SSE or 3DNow! version. This is true even if 3DNow! and SSE are explicitly disabled.

  Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the SSE prefetch instructions.

  I believe this behavior is consistent with gcc. I've updated prefetch.ll to test all of these combinations.

  llvm-svn: 321335
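  At the source level these hints typically surface through the GCC/Clang prefetch builtin; a sketch of how the read/write and locality arguments can map (the exact instruction chosen is target- and feature-dependent, so the lowerings in the comments are illustrative, not guaranteed):

      // __builtin_prefetch(addr, rw, locality) takes a read/write flag
      // (0 or 1) and a temporal-locality level (0..3).
      void warm(const void *p, void *q) {
        __builtin_prefetch(p, /*rw=*/0, /*locality=*/3); // read,  e.g. prefetcht0
        __builtin_prefetch(q, /*rw=*/1, /*locality=*/3); // write, e.g. prefetchw
                                                         //   with prfchw
        __builtin_prefetch(q, /*rw=*/1, /*locality=*/2); // write, e.g. prefetchwt1
                                                         //   when that feature is on
      }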
* [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1
  Craig Topper, 2017-12-22 (1 file, -15/+15)

  llvm-svn: 321334
* [Inliner] Restrict soft-float inlining penalty
  Eli Friedman, 2017-12-22 (2 files, -21/+0)

  The penalty is currently getting applied in a bunch of places where it doesn't make sense, like bitcasts (which are free) and calls (which were getting the call penalty applied twice). Instead, just apply the penalty to binary operators and floating-point casts.

  While I'm here, also fix getFPOpCost() to do the right thing in more cases, so we don't have to dig into function attributes.

  Differential Revision: https://reviews.llvm.org/D41522
  llvm-svn: 321332