path: root/llvm/lib/Target
Commit message (author, age, files changed, lines -removed/+added)
...
* [CallSite removal] move InlineCost to CallBase usage (Fedor Sergeev, 2019-04-23, 1 file, -2/+3)
  Converting the InlineCost interface and its internals to CallBase usage. The inliners themselves are still not converted.
  Reviewed By: reames
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60636
  llvm-svn: 358982
* [ARM] Update check for CBZ in Ifcvt (David Green, 2019-04-23, 3 files, -43/+59)
  The check for creating CBZ in the constant island pass recently gained the ability to search backwards for a Cmp instruction. The code in IfCvt should mirror this to allow more conversions to the smaller form. The common code has been pulled out into a separate function shared between the two places.
  Differential Revision: https://reviews.llvm.org/D60090
  llvm-svn: 358977
* [ARM] Don't replicate instructions in Ifcvt at minsize (David Green, 2019-04-23, 1 file, -0/+9)
  Ifcvt can replicate instructions as it converts them to be predicated. This stops that from happening on Thumb2 targets at minsize, where an extra IT instruction is likely to be needed.
  Differential Revision: https://reviews.llvm.org/D60089
  llvm-svn: 358974
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI. (Simon Pilgrim, 2019-04-23, 1 file, -2/+2)
  llvm-svn: 358969
* [AArch64] Add support for MTE intrinsics (Javed Absar, 2019-04-23, 4 files, -22/+78)
  This patch provides intrinsic support for the Memory Tagging Extension (MTE), which was introduced with the Armv8.5-A architecture. The intrinsics are described in detail in the latest ACLE (Q1 2019) documentation: https://developer.arm.com/docs/101028/latest
  Reviewed by: David Spickett
  Differential Revision: https://reviews.llvm.org/D60486
  llvm-svn: 358963
* [ARM][FIX] Add missing f16.lane.vldN/vstN lowering (Diogo N. Sampaio, 2019-04-23, 1 file, -0/+2)
  Summary: Add missing D and Q lane VLDSTLane lowering for fp16 elements.
  Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard
  Reviewed By: efriedma
  Subscribers: javed.absar, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60874
  llvm-svn: 358962
* [WebAssembly] Bail out of fastisel earlier when computing PIC addresses (Sam Clegg, 2019-04-23, 1 file, -11/+6)
  This change partially reverts https://reviews.llvm.org/D54647 in favor of bailing out during computeAddress instead. This catches the condition earlier and handles more cases.
  Differential Revision: https://reviews.llvm.org/D60986
  llvm-svn: 358948
* [SelectionDAG] move splat util functions up from x86 lowering (Sanjay Patel, 2019-04-22, 1 file, -56/+1)
  This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp().
  llvm-svn: 358930
* [AMDGPU] Fix an issue in `op_sel_hi` skipping. (Michael Liao, 2019-04-22, 1 file, -7/+16)
  Summary:
  - Only apply packed-literal `op_sel_hi` skipping to operands that require packed literals. Even if an instruction is `packed`, it may have an operand that requires a non-packed literal, such as `v_dot2_f32_f16`.
  Reviewers: rampitec, arsenm, kzhuravl
  Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60978
  llvm-svn: 358922
* AMDGPU: Skip debug instructions in assert (Matt Arsenault, 2019-04-22, 1 file, -2/+7)
  These are inserted after branch relaxation, and for some reason they end up in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC-relative instead of relying on the block address.
  llvm-svn: 358909
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources (Matt Arsenault, 2019-04-22, 1 file, -1/+3)
  llvm-svn: 358894
* AMDGPU: Fix not checking for copy when looking at copy src (Matt Arsenault, 2019-04-22, 1 file, -1/+6)
  Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy.
  llvm-svn: 358890
* [AMDGPU][MC] Corrected parsing of SP3 'neg' modifier (Dmitry Preobrazhensky, 2019-04-22, 1 file, -24/+58)
  See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D60624
  llvm-svn: 358888
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling (Simon Pilgrim, 2019-04-22, 1 file, -11/+25)
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
  The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine, but it caused a lot of noise on other targets (some improvements, some regressions).
  The X86 changes are all definite wins.
  Differential Revision: https://reviews.llvm.org/D60462
  llvm-svn: 358887
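A quick sanity check of the algebra behind the AMDGPU combine quoted above, (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1): on both sides, bit i of the result is set exactly when bit i+c2 of x and bit i of c1 are set. The sketch below is an exhaustive 8-bit check in Python with hypothetical helper names; it verifies the identity only and says nothing about how the backend implements the combine or forms BFE.

```python
# Exhaustive 8-bit check of the identity
#   (srl (and x, c1 << c2), c2)  ==  (and (srl x, c2), c1)
# Assumption for this sketch: c1 << c2 fits in BITS bits (no truncated mask).
BITS = 8
MASK = (1 << BITS) - 1

def before(x: int, c1: int, c2: int) -> int:
    # Mask first with the pre-shifted constant, then shift right.
    return (x & ((c1 << c2) & MASK)) >> c2

def after(x: int, c1: int, c2: int) -> int:
    # Shift right first, then mask with the unshifted constant.
    return (x >> c2) & c1

def identity_holds() -> bool:
    for c2 in range(BITS):
        for c1 in range(1 << (BITS - c2)):   # keep c1 << c2 within BITS bits
            for x in range(1 << BITS):
                if before(x, c1, c2) != after(x, c1, c2):
                    return False
    return True
```

The "shift, then mask" form on the right is the shape that maps naturally onto a bitfield extract when c1 is a low-bit mask.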
* [X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is not enabled. (Craig Topper, 2019-04-22, 1 file, -2/+5)
  Same for 256-bit types and AVX.
  llvm-svn: 358872
* [ARM] Rewrite isLegalT2AddressImmediate (David Green, 2019-04-21, 1 file, -29/+24)
  This does two main things: first, it adds at least basic addressing modes for i64 types; second, it treats floats and doubles sensibly when there is no FPU. The floating-point change can help code size in some cases, especially with D60294.
  Most backends seem not to consider the exact VT in isLegalAddressingMode, instead switching on the type size. That is now what this does when the target does not have an FPU (as the float data will be loaded using LDRs).
  i64s currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes.
  I have not attempted to do much with vectors yet; that will need changing once MVE is added.
  Differential Revision: https://reviews.llvm.org/D60677
  llvm-svn: 358845
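To make the ranges discussed above concrete, here is a simplified Python model of Thumb-2 immediate offsets: +imm12 or -imm8 for byte, halfword and word accesses (LDR, LDRB, LDRH), and a multiple of 4 up to ±1020 for LDRD or VLDR. The function name and parameters are hypothetical, and the model deliberately ignores vectors, MVE and half precision, so treat it as an illustration of the commit's reasoning rather than ARM's actual isLegalT2AddressImmediate.

```python
# Simplified, hypothetical model of Thumb-2 load/store immediate offsets.
def is_scaled_imm8(v: int) -> bool:
    # imm8 scaled by 4: multiples of 4 in [0, 1020] (LDRD / VLDR style).
    return 0 <= v <= 1020 and v % 4 == 0

def is_legal_t2_imm(offset: int, size_bytes: int, is_float: bool, has_fpu: bool) -> bool:
    magnitude = abs(offset)
    if (is_float and has_fpu) or size_bytes == 8:
        # VLDR for FP data when an FPU exists; LDRD range for 8-byte values
        # (i64, or f64 without an FPU).
        return is_scaled_imm8(magnitude)
    if size_bytes in (1, 2, 4):
        # LDRB/LDRH/LDR: +imm12 or -imm8. Without an FPU, f32 lands here too,
        # since it is loaded with a plain LDR.
        return magnitude <= 255 if offset < 0 else magnitude <= 4095
    return False
```

In this model an f32 offset of 4000 is legal without an FPU (it is loaded with LDR) but not with one, because VLDR's range is smaller.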
* [X86] Add the rounding control operand to the printing for some scalar FMA instructions. (Craig Topper, 2019-04-21, 1 file, -1/+1)
  llvm-svn: 358844
* [X86] Don't form masked vfpclass instruction from and+vfpclass unless the fpclass only has a single use. (Craig Topper, 2019-04-21, 1 file, -28/+36)
  llvm-svn: 358841
* Revert r358800. Breaks Obsequi from the test suite. (Amara Emerson, 2019-04-20, 1 file, -5/+8)
  The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with a different issue.
  llvm-svn: 358829
* [X86] Disable argument copy elision for arguments passed via pointers (Craig Topper, 2019-04-20, 1 file, -1/+5)
  Summary:
  If you pass two 1024-bit vectors in IR with AVX2 on Windows 64, both vectors will be split into four 256-bit pieces. The four pieces of the first argument will be passed indirectly using 4 GPRs. The second argument will get passed via pointers in memory.
  The PartOffsets stored for the second argument are all in terms of its original 1024-bit size, so the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision, we'll only load an 8-byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.
  This issue was encountered by ISPC. I'm working on getting a reduced test case, but wanted to go ahead and get feedback on the fix.
  Reviewers: rnk
  Reviewed By: rnk
  Subscribers: dbabokin, llvm-commits, hiraditya
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60801
  llvm-svn: 358817
* [X86] Fix stack probing on x32 (PR41477) (Nikita Popov, 2019-04-20, 2 files, -8/+15)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI with stack probing, a dynamic alloca will result in a WIN_ALLOCA_32 with a 32-bit size. The current implementation tries to copy it into RAX, resulting in a physreg copy error. Fix this by copying to EAX instead. Also fix incorrect opcodes or registers used in subs.
  llvm-svn: 358807
* [X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX. (Craig Topper, 2019-04-20, 1 file, -18/+37)
  The MOVZX doesn't require an immediate to be encoded at all. It does use a 2-byte opcode, though, so it's the same size as a 1-byte immediate. But it has separate source and destination registers, so it can help avoid copies.
  llvm-svn: 358805
* [X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx. (Craig Topper, 2019-04-20, 1 file, -9/+29)
  There's one slight regression in here because we don't check that the immediate already allowed movzx before the shift. I'll fix that next.
  llvm-svn: 358804
* Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"" (Amara Emerson, 2019-04-19, 1 file, -8/+5)
  We were shifting the wrong component of a split load when trying to combine them back into a single value.
  llvm-svn: 358800
* [GlobalISel][AArch64] Legalize + select G_FRINT (Jessica Paquette, 2019-04-19, 2 files, -1/+2)
  Exactly the same as G_FCEIL, G_FABS, etc.
  Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.
  Differential Revision: https://reviews.llvm.org/D60895
  llvm-svn: 358799
* [WebAssembly] FastISel: Don't fallback to SelectionDAG after BuildMI in selectCall (Sam Clegg, 2019-04-19, 1 file, -6/+9)
  My understanding is that once BuildMI has been called, we can't fall back to SelectionDAG. This change moves the fallback for when getRegForValue() fails for the target of an indirect call.
  This was failing in -fPIC mode when the callee is a GlobalValue.
  Add a test case that tickles this.
  Differential Revision: https://reviews.llvm.org/D60908
  llvm-svn: 358793
* [AArch64] Fix checks for AArch64MCExpr::VK_SABS flag. (Eli Friedman, 2019-04-19, 1 file, -2/+2)
  VK_SABS is part of the SymLoc bitfield in the variant kind, which should be compared for equality rather than by testing the VK_SABS bit.
  As far as I know, the existing code happened to produce the correct results in all cases, so this is just a cleanup.
  Patch by Stephen Crane.
  Differential Revision: https://reviews.llvm.org/D60596
  llvm-svn: 358788
* Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores" (Amara Emerson, 2019-04-19, 1 file, -5/+8)
  This introduces some runtime failures which I'll need to investigate further.
  llvm-svn: 358771
* [GlobalISel][AArch64] Legalize vector G_FPOW (Jessica Paquette, 2019-04-19, 1 file, -2/+2)
  This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.
  Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.
  Differential Revision: https://reviews.llvm.org/D60218
  llvm-svn: 358764
* [CodeGen] Add "const" to MachineInstr::mayAlias (Bjorn Pettersson, 2019-04-19, 14 files, -54/+72)
  Summary:
  The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const).
  The addition of const in MachineInstr::mayAlias then rippled down, requiring const to be added in several other places, such as TargetTransformInfo::getMemOperandWithOffset.
  Reviewers: hfinkel
  Reviewed By: hfinkel
  Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60856
  llvm-svn: 358744
* [AMDGPU] Ignore non-SUnits edges (Piotr Sobczak, 2019-04-19, 1 file, -0/+2)
  Summary:
  Ignore edges to non-SUnits (e.g. ExitSU) when checking for low latency instructions.
  When calling the function isLowLatencyInstruction(), an ExitSU could be on the list of successors, not necessarily a regular SU. In other places in the code there is a check "Succ->NodeNum >= DAGSize" to prevent further processing of ExitSU, as "Succ->getInstr()" is NULL in such a case.
  Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())" have the guard, so it is clearly an omission here.
  Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b
  Reviewers: nhaehnle
  Reviewed By: nhaehnle
  Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60864
  llvm-svn: 358740
* [X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx. (Craig Topper, 2019-04-19, 1 file, -0/+3)
  Could get further improvements by recognizing (i64 and (anyext (i32 shl))).
  llvm-svn: 358737
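The rewrite in the subject line above moves the AND inside the shift. In the Python sketch below, C1 is the shift amount and C2 the mask, so the inner mask becomes C2 >> C1; if that value is 0xFF or 0xFFFF, the AND can be selected as a MOVZX. The helper names are hypothetical, and the check covers only the arithmetic on a 32-bit value, not X86 instruction selection.

```python
# Arithmetic check of (and (shl X, C1), C2) == (shl (and X, C2 >> C1), C1)
# on 32-bit values. C1 is the shift amount, C2 the mask (illustrative naming).
WIDTH = 32
MASK = (1 << WIDTH) - 1

def and_of_shl(x: int, c1: int, c2: int) -> int:
    return ((x << c1) & MASK) & c2

def shl_of_and(x: int, c1: int, c2: int) -> int:
    # The low c1 bits of (x << c1) are zero, so dropping C2's low bits is safe.
    return ((x & (c2 >> c1)) << c1) & MASK

def inner_and_is_movzx(c1: int, c2: int) -> bool:
    # An AND with 0xFF or 0xFFFF can be selected as MOVZX (no immediate).
    return (c2 >> c1) in (0xFF, 0xFFFF)
```

For example, (X << 2) & 0x3FC becomes (X & 0xFF) << 2, where the inner AND is a byte zero-extension.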
* [X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively (Craig Topper, 2019-04-19, 1 file, -7/+7)
  Summary:
  There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code.
  This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract and onto the second subtract. The second subtract called matchAddressRecursively on its LHS, which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. N was still pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg, which required the check for a deleted node. With this patch we now use the CSE result as the base reg instead.
  matchAdd has been broken since sometime in 2015, when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference, so the update didn't go anywhere.
  Reviewers: niravd, spatel, RKSimon, bkramer
  Reviewed By: niravd
  Subscribers: llvm-commits, hiraditya
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60843
  llvm-svn: 358735
* [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s (Jessica Paquette, 2019-04-18, 1 file, -1/+2)
  This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.
  We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.
  This adds legalizer support for those three, which gives us selection via the importer.
  Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and add a GISel line to arm64-vabs.ll.
  Differential Revision: https://reviews.llvm.org/D60881
  llvm-svn: 358715
* [GlobalISel][AArch64] Legalize v8s8 loads (Jessica Paquette, 2019-04-18, 1 file, -0/+1)
  Add legalizer support for loads of v8s8 and update legalize-load-store.mir.
  Differential Revision: https://reviews.llvm.org/D60877
  llvm-svn: 358714
* [X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask (Simon Pilgrim, 2019-04-18, 1 file, -23/+6)
  combineVectorTruncationWithPACKUS is currently splitting the upper-bit masking into 128-bit subregs and then concatenating them back together.
  This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).
  This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.
  My only annoyance with this is that we were broadcasting some xmm masks, but we seem to have lost them by moving to ymm; that's a known issue, as the logic in lowerBuildVectorAsBroadcast isn't great.
  Differential Revision: https://reviews.llvm.org/D60375#inline-539623
  llvm-svn: 358692
* [X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2. (Simon Pilgrim, 2019-04-18, 1 file, -37/+21)
  This replaces the MOVMSK combine introduced in D52121/rL342326, (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)), with the more general icmp lowering, so it can pick up more cases through bitcasts. Notably, for vXi8 cases, which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.
  Differential Revision: https://reviews.llvm.org/D60625
  llvm-svn: 358651
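Why the lowering above works, per lane: for a power-of-two C with bit index k = log2(C), shifting X left by BW-1-k places the tested bit in the sign position, and an arithmetic right shift by BW-1 then smears it into all-ones or zero, which is exactly what a vector equality compare produces. The Python model of a single 8-bit lane below is a sketch under stated assumptions (hypothetical helper names, scalar rather than SIMD) that checks this exhaustively.

```python
# Single-lane model of: (X & C) == C  ->  sra(shl(X, BW-1-log2(C)), BW-1)
def to_signed(v: int, bw: int) -> int:
    # Reinterpret the low bw bits of v as a two's-complement value.
    v &= (1 << bw) - 1
    return v - (1 << bw) if v >> (bw - 1) else v

def icmp_eq_and(x: int, c: int, bw: int) -> int:
    # Vector compares yield all-ones (-1) or 0 per lane.
    return -1 if (x & c) == c else 0

def sra_shl(x: int, c: int, bw: int) -> int:
    assert c > 0 and c & (c - 1) == 0, "C must be a power of two"
    shift = bw - 1 - (c.bit_length() - 1)   # moves bit log2(C) to the sign bit
    shifted = to_signed(x << shift, bw)
    return shifted >> (bw - 1)               # Python's >> on ints is arithmetic
```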
* [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS() (Kang Zhang, 2019-04-18, 1 file, -1/+1)
  Summary:
  This issue is from Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
  When the two operands of the BUILD_VECTOR are the same, we hit an assertion failure:
    llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&): Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) && "The loads cannot be both consecutive and reverse consecutive."' failed.
  This error is caused by the wrong ElemSize when calling isConsecutiveLS(). We should use `getScalarType().getStoreSize()` to get the ElemSize instead of `getScalarSizeInBits() / 8`.
  Reviewed By: jsji
  Differential Revision: https://reviews.llvm.org/D60811
  llvm-svn: 358644
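One way to see how an ElemSize can go wrong: for an element whose width is not a multiple of 8 bits, `bits / 8` truncates (a 1-bit element gives 0 bytes), while the store size rounds up to whole bytes. With an ElemSize of 0, a BUILD_VECTOR whose two operands are the same load looks both consecutive and reverse consecutive, which is the combination the quoted assertion forbids. The Python below is a hypothetical model (not PowerPC's isConsecutiveLS) and does not claim this is the exact type that triggered the bug.

```python
# Hypothetical model of the two ElemSize computations and a consecutiveness check.
def elem_size_from_bits(bits: int) -> int:
    # Integer division truncates: a 1-bit element yields 0 bytes.
    return bits // 8

def elem_size_store(bits: int) -> int:
    # Store size rounds up to whole bytes.
    return (bits + 7) // 8

def is_consecutive(off_a: int, off_b: int, elem_size: int) -> bool:
    # B is loaded exactly one element after A.
    return off_b - off_a == elem_size

def classify(offsets, elem_size):
    pairs = list(zip(offsets, offsets[1:]))
    fwd = all(is_consecutive(a, b, elem_size) for a, b in pairs)
    rev = all(is_consecutive(b, a, elem_size) for a, b in pairs)
    return fwd, rev
```

With offsets [0, 0] (the same load used twice), `classify` returns (True, True) for an ElemSize of 0 but (False, False) for an ElemSize of 1.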
* [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) (Tim Renouf, 2019-04-18, 1 file, -0/+10)
  fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed.
  Differential Revision: https://reviews.llvm.org/D60633
  Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
  llvm-svn: 358640
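A small IEEE-754 check of the fold described above, using Python floats and the default round-to-nearest mode: fadd(fneg(A), fneg(0)) is -A + (-0.0), and adding -0.0 is an exact identity for every non-NaN value, including signed zeros, so the new fadd collapses to fneg(A). Adding +0.0, by contrast, is not an identity, since -0.0 + 0.0 is +0.0. The helper names are hypothetical.

```python
import math

def is_neg_zero(v: float) -> bool:
    return v == 0.0 and math.copysign(1.0, v) < 0

def combined(a: float) -> float:
    # fadd(fneg(A), fneg(0.0)): x + (-0.0) is exactly x, signed zeros included.
    return (-a) + (-0.0)

def plain(a: float) -> float:
    # fadd(A, +0.0) is not an identity: -0.0 + 0.0 == +0.0.
    return a + 0.0
```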
* [x86] try to widen 'shl' as part of LEA formation (Sanjay Patel, 2019-04-17, 1 file, -0/+36)
  The test file has pairs of tests that are logically equivalent: https://rise4fun.com/Alive/2zQ
    %t4 = and i8 %t1, 8
    %t5 = zext i8 %t4 to i16
    %sh = shl i16 %t5, 2
    %t6 = add i16 %sh, %t0
    =>
    %t4 = and i8 %t1, 8
    %sh2 = shl i8 %t4, 2
    %z5 = zext i8 %sh2 to i16
    %t6 = add i16 %z5, %t0
  ...so if we can fold the shift op into LEA in the 1st pattern, then we should be able to do the same in the 2nd pattern (unnecessary 'movzbl' is a separate bug I think).
  We don't want to do this any sooner though because that would conflict with generic transforms that try to narrow the width of the shift.
  Differential Revision: https://reviews.llvm.org/D60789
  llvm-svn: 358622
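The Alive proof linked above is the authoritative check; as a concrete replay, the Python sketch below evaluates both IR patterns with explicit i8/i16 wraparound (masking after each operation). They agree because the `and` with 8 leaves at most bit 3 set, so shifting left by 2 cannot overflow 8 bits before the zext. Function names are hypothetical.

```python
# Replay of the two IR patterns with explicit i8/i16 wraparound.
M8, M16 = 0xFF, 0xFFFF

def pattern1(t0: int, t1: int) -> int:
    t4 = t1 & 8                  # and i8 %t1, 8
    t5 = t4                      # zext i8 -> i16 (value unchanged)
    sh = (t5 << 2) & M16         # shl i16 %t5, 2
    return (sh + t0) & M16       # add i16 %sh, %t0

def pattern2(t0: int, t1: int) -> int:
    t4 = t1 & 8                  # and i8 %t1, 8
    sh2 = (t4 << 2) & M8         # shl i8 %t4, 2 (wraps at 8 bits)
    z5 = sh2                     # zext i8 -> i16
    return (z5 + t0) & M16       # add i16 %z5, %t0
```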
* [AsmPrinter] hoist %a output template to base class for ARM+Aarch64 (Nick Desaulniers, 2019-04-17, 2 files, -14/+0)
  Summary:
  X86 is quite complicated, so I intend to leave it as is. ARM and AArch64 do basically the same thing (AArch64 did not correctly handle immediates; ARM has a test, llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll, that uses %a with an immediate) for a flag that should be target-independent anyway.
  Reviewers: echristo, peter.smith
  Reviewed By: echristo
  Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60841
  llvm-svn: 358618
* [GlobalISel] Add legalization support for non-power-2 loads and stores (Amara Emerson, 2019-04-17, 1 file, -8/+5)
  Legalize things like i24 load/store by splitting them into smaller power-of-2 operations. This matches how SelectionDAG handles these operations.
  Differential Revision: https://reviews.llvm.org/D59971
  llvm-svn: 358613
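As an illustration of the splitting idea, assuming a little-endian target and an i24 split into an i16 part at the base address plus an i8 part at offset 2 (the commit does not spell out the exact split), the Python sketch below stores and reassembles the value. Only the high part is shifted before the OR; shifting the wrong component is the kind of mistake the reapply commit above describes. Helper names are hypothetical.

```python
# Little-endian sketch: an i24 access split into an i16 part and an i8 part.
def load_i24_split(mem: bytes, addr: int) -> int:
    lo = int.from_bytes(mem[addr:addr + 2], "little")   # i16 load at addr
    hi = mem[addr + 2]                                   # i8 load at addr + 2
    # Only the high (later-address) part is shifted before combining.
    return lo | (hi << 16)

def store_i24_split(mem: bytearray, addr: int, value: int) -> None:
    mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")  # i16 store
    mem[addr + 2] = (value >> 16) & 0xFF                          # i8 store
```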
* [AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC (Nick Desaulniers, 2019-04-17, 3 files, -12/+4)
  Summary:
  None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine.
  Reviewers: peter.smith, echristo
  Reviewed By: echristo
  Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60803
  llvm-svn: 358603
* [AMDGPU][MC] Corrected handling of "-" before expressions (Dmitry Preobrazhensky, 2019-04-17, 1 file, -38/+58)
  See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D60622
  llvm-svn: 358596
* AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions (Rhys Perry, 2019-04-17, 1 file, -0/+4)
  Summary: This fixes a large Dawn of War 3 performance regression with RADV between Mesa 19.0 and master, which was caused by creating less code in some branches.
  Reviewers: arsen, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60824
  llvm-svn: 358592
* [AMDGPU][MC] Corrected parsing of registers (Dmitry Preobrazhensky, 2019-04-17, 1 file, -27/+126)
  See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D60621
  llvm-svn: 358581
* [AMDGPU] Flag new raw/struct atomic ops as source of divergence (Tim Renouf, 2019-04-17, 1 file, -0/+22)
  Differential Revision: https://reviews.llvm.org/D60731
  Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d
  llvm-svn: 358579
* [CostModel][X86] Add bool anyof/allof reduction costs (Simon Pilgrim, 2019-04-17, 1 file, -0/+42)
  On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized; annoyingly, AVX512 isn't, and produces code that is almost as bad as the (unchanged) costs suggest.
  Differential Revision: https://reviews.llvm.org/D60403
  llvm-svn: 358574
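A Python model of the MOVMSK approach mentioned above: MOVMSK gathers the sign bit of each lane into an integer, so an any-of reduction is `mask != 0` and an all-of reduction is `mask == (1 << lanes) - 1`. This assumes boolean lanes are all-ones or all-zeros, as vector compares produce; helper names are hypothetical and no cost numbers are modeled.

```python
# Model of MOVMSK-based boolean reductions over unsigned lane values.
def movmsk(lanes, bits: int) -> int:
    # Gather the sign bit of each lane into bit i of the result.
    mask = 0
    for i, v in enumerate(lanes):
        if (v >> (bits - 1)) & 1:
            mask |= 1 << i
    return mask

def reduce_any(lanes, bits: int) -> bool:
    return movmsk(lanes, bits) != 0

def reduce_all(lanes, bits: int) -> bool:
    return movmsk(lanes, bits) == (1 << len(lanes)) - 1
```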
* [X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for GR32<->XMM register copies. (Craig Topper, 2019-04-17, 1 file, -12/+12)
  We have two versions of some instructions: VR128 versions, and FR32 versions that are marked as CodeGenOnly. This change switches to using the VR128 versions for these copies. It's after register allocation, so the class size no longer matters. This matches how GR64 works.
  llvm-svn: 358555
* [NVPTXAsmPrinter] clean up dead code. NFC (Nick Desaulniers, 2019-04-16, 2 files, -45/+6)
  Summary:
  The printOperand function takes a default parameter, for which there are zero call sites that explicitly pass such a parameter. As such, there is no case to support. This means that the method printVecModifiedImmediate is purely dead code and can be removed.
  The eventual goal for some of these AsmPrinter refactorings is to have printOperand be a virtual method, making it easier to print operands from the base class for more generic Asm printing. It will help if all printOperand methods have the same function signature (i.e. no Modifier argument when not needed).
  Reviewers: echristo, tra
  Reviewed By: echristo
  Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60727
  llvm-svn: 358527
OpenPOWER on IntegriCloud