summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] gfx1010 sgpr register changesStanislav Mekhanoshin2019-04-2410-41/+123
| | | | | | Differential Revision: https://reviews.llvm.org/D61045 llvm-svn: 359117
* [AMDGPU] Add gfx1010 target definitionsStanislav Mekhanoshin2019-04-2414-94/+516
| | | | | | Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
* [AMDGPU][MC] Parser cleanup and refactoringDmitry Preobrazhensky2019-04-241-93/+48
| | | | | | | | Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60767 llvm-svn: 359096
* [x86] make sure horizontal op and broadcast types match to simplify (PR41414)Sanjay Patel2019-04-241-2/+6
| | | | | | | | | If the types don't match, we can't just remove the shuffle. There may be some other opportunity for optimization here, but this should prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=41414 llvm-svn: 359095
* [X86] Add shouldFoldConstantShiftPairToMask override placeholder. NFCI.Simon Pilgrim2019-04-242-0/+9
| | | | | | Prep work toward fixing PR40758 llvm-svn: 359088
* Add "const" in GetUnderlyingObjects. NFCBjorn Pettersson2019-04-242-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Both the input Value pointer and the returned Value pointers in GetUnderlyingObjects are now declared as const. It turned out that all current (in-tree) uses of GetUnderlyingObjects were trivial to update, being satisfied with have those Value pointers declared as const. Actually, in the past several of the users had to use const_cast, just because of ValueTracking not providing a version of GetUnderlyingObjects with "const" Value pointers. With this patch we get rid of those const casts. Reviewers: hfinkel, materi, jkorous Reviewed By: jkorous Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61038 llvm-svn: 359072
* [Mips][CodeGen] Remove MachineFunction::setSubtarget. Change Mips to just ↵Craig Topper2019-04-242-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | copy the subtarget from the MachineFunction instead of recalculating it. Summary: The MachineFunction should have been created with the correct subtarget. As long as there is no way to change it, MipsTargetMachine can just capture it directly from the MachineFunction without calling getSubtargetImpl again. While there, const correct the Subtarget pointer to avoid a const_cast. I believe the Mips16Subtarget and NoMips16Subtarget members are never used, but I'll leave there removal for a separate patch. Reviewers: echristo, atanasyan Reviewed By: atanasyan Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60936 llvm-svn: 359071
* [AArch64][GlobalISel] Select G_INTRINSIC_ROUNDJessica Paquette2019-04-231-0/+58
| | | | | | | Add selection support for G_INTRINSIC_ROUND, add a selection test, and add check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll. llvm-svn: 359046
* [AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcodeJessica Paquette2019-04-231-0/+1
| | | | | | | | | Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its input and output are assigned the correct register bank. Add a regbankselect test to verify that we get what we expect here. llvm-svn: 359044
* [WebAssembly] Emit br_table for most switch instructionsHeejin Ahn2019-04-231-0/+5
| | | | | | | | | | | | | | | | | | Summary: Always convert switches to br_tables unless there is only one case, which is equivalent to a simple branch. This reduces code size for wasm, and we defer possible jump table optimizations to the VM. Addresses PR41502. Reviewers: kripken, sunfish Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60966 llvm-svn: 359038
* [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUNDJessica Paquette2019-04-231-2/+2
| | | | | | | Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing switch case to AArch64LegalizerInfo.cpp. llvm-svn: 359033
* [AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNCJessica Paquette2019-04-231-1/+58
| | | | | | | | | | | | | | Apparently FileCheck wasn't actually matching the fallback check lines in arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for G_INTRINSIC_TRUNC there. Actually hook it up into AArch64InstructionSelector.cpp and write a proper selection test. I guess I'll figure out the FileCheck magic to make the fallback checks work properly in arm64-vfloatintrinsics.ll. llvm-svn: 359030
* [AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNCJessica Paquette2019-04-231-0/+1
| | | | | | | | Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test. Update arm64-vfloatintrinsics.ll now that we can select it. llvm-svn: 359022
* [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNCJessica Paquette2019-04-231-1/+1
| | | | | | | | | Same patch as G_FCEIL etc. Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct rule in AArch64LegalizerInfo.cpp, and add a test. llvm-svn: 359021
* [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cppStanislav Mekhanoshin2019-04-231-1/+1
| | | | | | | | The second argument is flags, not subreg. Differential Revision: https://reviews.llvm.org/D61031 llvm-svn: 359017
* [AArch64][GlobalISel] Legalize G_FMA for more vector typesJessica Paquette2019-04-231-2/+3
| | | | | | | | | Same as G_FCEIL, G_FABS, etc. Just move it into that rule. Add a legalizer test for G_FMA, which we didn't have before and update arm64-vfloatintrinsics.ll. llvm-svn: 359015
* [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcodeJessica Paquette2019-04-231-0/+1
| | | | | | | | Noticed an unnecessary fallback in arm64-vmul caused by this. Also add a regbankselect test for G_FMA. llvm-svn: 359013
* [x86] use psubus for more vsetcc lowering (PR39859)Sanjay Patel2019-04-231-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Circling back to a leftover bit from PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859#c1 ...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'. This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca. We already do this transform for the SETULT predicate, so this makes the code more symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect anything but pre-SSE4.1 subtargets. $ cat before.s movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768] pxor %xmm0, %xmm2 pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255] pand %xmm2, %xmm0 pandn %xmm1, %xmm2 por %xmm2, %xmm0 $ cat after.s movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256] psubusw %xmm0, %xmm2 pxor %xmm3, %xmm3 pcmpeqw %xmm2, %xmm3 pand %xmm3, %xmm0 pandn %xmm1, %xmm3 por %xmm3, %xmm0 $ llvm-mca before.s -mcpu=haswell Iterations: 100 Instructions: 600 Total Cycles: 909 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 0.77 IPC: 0.66 Block RThroughput: 1.8 $ llvm-mca after.s -mcpu=haswell Iterations: 100 Instructions: 700 Total Cycles: 409 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.71 IPC: 1.71 Block RThroughput: 1.8 Differential Revision: https://reviews.llvm.org/D60838 llvm-svn: 358999
* [SPARC] Use the correct register set for the "r" asm constraint.Joerg Sonnenberger2019-04-231-0/+2
| | | | | | | | 64bit mode must use 64bit registers, otherwise assumptions about the top half of the registers are made. Problem found by Takeshi Nakayama in NetBSD. llvm-svn: 358998
* Use llvm::stable_sortFangrui Song2019-04-231-2/+1
| | | | | | While touching the code, simplify if feasible. llvm-svn: 358996
* [RISCV] Support assembling %tls_{ie,gd}_pcrel_hi modifiersLewis Revill2019-04-238-4/+48
| | | | | | | | | This patch adds support for parsing and assembling the %tls_ie_pcrel_hi and %tls_gd_pcrel_hi modifiers. Differential Revision: https://reviews.llvm.org/D55342 llvm-svn: 358994
* [AMDGPU] Fix hidden argument metadata duplication for V3Scott Linder2019-04-231-30/+0
| | | | | | | | | | Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace. llvm-svn: 358992
* [X86] Pull out collectConcatOps helper. NFCI.Simon Pilgrim2019-04-231-21/+51
| | | | | | Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series. llvm-svn: 358989
* ARM: disallow add/sub to sp unless Rn is also sp.Tim Northover2019-04-232-1/+27
| | | | | | | | The manual says that Thumb2 add/sub instructions are only allowed to modify sp if the first source is also sp. This is slightly different from the usual rGPR restriction since it's context-sensitive, so implement it in C++. llvm-svn: 358987
* AMDGPU: Fix LCSSA phi lowering in SILowerI1CopiesNicolai Haehnle2019-04-231-1/+8
| | | | | | | | | | | | | | | | | | | | | | Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction. Change-Id: Id4949719f8298062fe476a25718acccc109113b6 Reviewers: llvm-commits Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D60999 llvm-svn: 358983
* [CallSite removal] move InlineCost to CallBase usageFedor Sergeev2019-04-231-2/+3
| | | | | | | | | | | Converting InlineCost interface and its internals into CallBase usage. Inliners themselves are still not converted. Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D60636 llvm-svn: 358982
* [ARM] Update check for CBZ in IfcvtDavid Green2019-04-233-43/+59
| | | | | | | | | | | The check for creating CBZ in constant island pass recently obtained the ability to search backwards to find a Cmp instruction. The code in IfCvt should mirror this to allow more conversions to the smaller form. The common code has been pulled out into a separate function to be shared between the two places. Differential Revision: https://reviews.llvm.org/D60090 llvm-svn: 358977
* [ARM] Don't replicate instructions in Ifcvt at minsizeDavid Green2019-04-231-0/+9
| | | | | | | | | | Ifcvt can replicate instructions as it converts them to be predicated. This stops that from happening on thumb2 targets at minsize where an extra IT instruction is likely needed. Differential Revision: https://reviews.llvm.org/D60089 llvm-svn: 358974
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.Simon Pilgrim2019-04-231-2/+2
| | | | llvm-svn: 358969
* [AArch64] Add support for MTE intrinsicsJaved Absar2019-04-234-22/+78
| | | | | | | | | | | This patch provides intrinsics support for Memory Tagging Extension (MTE), which was introduced with the Armv8.5-a architecture. The intrinsics are described in detail in the latest ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest Reviewed by: David Spickett Differential Revision: https://reviews.llvm.org/D60486 llvm-svn: 358963
* [ARM][FIX] Add missing f16.lane.vldN/vstN loweringDiogo N. Sampaio2019-04-231-0/+2
| | | | | | | | | | | | | | | | | | Summary: Add missing D and Q lane VLDSTLane lowering for fp16 elements. Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60874 llvm-svn: 358962
* [WebAssembly] Bail out of fastisel earlier when computing PIC addressesSam Clegg2019-04-231-11/+6
| | | | | | | | | | | This change partially reverts https://reviews.llvm.org/D54647 in favor of bailing out during computeAddress instead. This catches the condition earlier and handles more cases. Differential Revision: https://reviews.llvm.org/D60986 llvm-svn: 358948
* [SelectionDAG] move splat util functions up from x86 loweringSanjay Patel2019-04-221-56/+1
| | | | | | | | | | This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930
* [AMDGPU] Fix an issue in `op_sel_hi` skipping.Michael Liao2019-04-221-7/+16
| | | | | | | | | | | | | | | | | Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922
* AMDGPU: Skip debug instructions in assertMatt Arsenault2019-04-221-2/+7
| | | | | | | | | | These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sourcesMatt Arsenault2019-04-221-1/+3
| | | | llvm-svn: 358894
* AMDGPU: Fix not checking for copy when looking at copy srcMatt Arsenault2019-04-221-1/+6
| | | | | | | Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890
* [AMDGPU][MC] Corrected parsing of SP3 'neg' modifierDmitry Preobrazhensky2019-04-221-24/+58
| | | | | | | | | | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handlingSimon Pilgrim2019-04-221-11/+25
| | | | | | | | | | | | This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887
* [X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is ↵Craig Topper2019-04-221-2/+5
| | | | | | not enabled. Same for 256 bit and AVX. llvm-svn: 358872
* [ARM] Rewrite isLegalT2AddressImmediateDavid Green2019-04-211-29/+24
| | | | | | | | | | | | | | | | | | | | | This does two main things, firstly adding some at least basic addressing modes for i64 types, and secondly treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294. Most backends seems to not consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDR's). i64's currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes. I have not attempted to do much with vectors yet. That will need changing once MVE is added. Differential Revision: https://reviews.llvm.org/D60677 llvm-svn: 358845
* [X86] Add the rounding control operand to the printing for some scalar FMA ↵Craig Topper2019-04-211-1/+1
| | | | | | instructions. llvm-svn: 358844
* [X86] Don't form masked vfpclass instruction from and+vfpclass unless the ↵Craig Topper2019-04-211-28/+36
| | | | | | fpclass only has a single use. llvm-svn: 358841
* Revert r358800. Breaks Obsequi from the test suite.Amara Emerson2019-04-201-5/+8
| | | | | | | The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with a different issue. llvm-svn: 358829
* [X86] Disable argument copy elision for arguments passed via pointersCraig Topper2019-04-201-1/+5
| | | | | | | | | | | | | | | | | | | | | Summary: If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory. The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too. This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix. Reviewers: rnk Reviewed By: rnk Subscribers: dbabokin, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60801 llvm-svn: 358817
* [X86] Fix stack probing on x32 (PR41477)Nikita Popov2019-04-202-8/+15
| | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI with stack probing a dynamic alloca will result in a WIN_ALLOCA_32 with a 32-bit size. The current implementation tries to copy it into RAX, resulting in a physreg copy error. Fix this by copying to EAX instead. Also fix incorrect opcodes or registers used in subs. llvm-svn: 358807
* [X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if ↵Craig Topper2019-04-201-18/+37
| | | | | | | | | | the original AND can represented by MOVZX. The MOVZX doesn't require an immediate to be encoded at all. Though it does use a 2 byte opcode so its the same size as a 1 byte immediate. But it has a separate source and dest register so can help avoid copies. llvm-svn: 358805
* [X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), ↵Craig Topper2019-04-201-9/+29
| | | | | | | | | (C1 >> C2), C2) if the AND could match a movzx. There's one slight regression in here because we don't check that the immediate already allowed movzx before the shift. I'll fix that next. llvm-svn: 358804
* Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads ↵Amara Emerson2019-04-191-8/+5
| | | | | | | | | and stores"" We were shifting the wrong component of a split load when trying to combine them back into a single value. llvm-svn: 358800
* [GlobalISel][AArch64] Legalize + select G_FRINTJessica Paquette2019-04-192-1/+2
| | | | | | | | | | Exactly the same as G_FCEIL, G_FABS, etc. Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc. Differential Revision: https://reviews.llvm.org/D60895 llvm-svn: 358799
OpenPOWER on IntegriCloud