summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. ↵Simon Pilgrim2019-10-151-9/+18
| | | | | | NFCI. llvm-svn: 374878
* [MIPS GlobalISel] Add MSA registers to fprb. Select vector load, storePetar Avramovic2019-10-154-26/+103
| | | | | | | | | | | | | | | | | | Add vector MSA register classes to fprb, they are 128 bit wide. MSA instructions use the same registers for both integer and floating point operations. Therefore we only need to check for vector element size during legalization or instruction selection. Add helper function in MipsLegalizerInfo and switch to legalIf LegalizeRuleSet to keep legalization rules compact since they depend on MipsSubtarget and presence of MSA. fprb is assigned to all vector operands. Move selectLoadStoreOpCode to MipsInstructionSelector in order to reduce number of arguments. Differential Revision: https://reviews.llvm.org/D68867 llvm-svn: 374872
* [MIPS GlobalISel] Refactor MipsRegisterBankInfo [NFC]Petar Avramovic2019-10-152-153/+127
| | | | | | | | | | Check if size of operand LLT matches sizes of available register banks before inspecting the opcode in order to reduce number of checks. Factor commonly used pieces of code into functions. Differential Revision: https://reviews.llvm.org/D68866 llvm-svn: 374870
* [X86] Don't check for VBROADCAST_LOAD being a user of the source of a ↵Craig Topper2019-10-151-3/+1
| | | | | | | | | | | | | | | | | | VBROADCAST when trying to share broadcasts. The only things VBROADCAST_LOAD uses is an address and a chain node. It has no vector inputs. So if its a user of the source of another broadcast that could only mean one of two things. The other broadcast is broadcasting the address of the broadcast_load. Or the source is a load and the use we're seeing is the chain result from that load. Neither of these cases make sense to combine here. This issue was reported post-commit r373871. Test case has not been reduced yet. llvm-svn: 374862
* [RISCV] Support fast calling conventionShiva Chen2019-10-151-2/+67
| | | | | | | | | | | | LLVM may annotate the function with fastcc if there has only one caller and there're no other caller out of the module and the function is not naked or contain variable arguments. The fastcc functions could pass the arguments by the caller saved registers. Differential Revision: https://reviews.llvm.org/D68559 llvm-svn: 374857
* [WebAssembly] Trapping fptoint builtins and intrinsicsThomas Lively2019-10-151-0/+17
| | | | | | | | | | | | | | | | | | | | | Summary: The WebAssembly backend lowers fptoint instructions to a code sequence that checks for overflow to avoid traps because fptoint is supposed to be speculatable. These new builtins and intrinsics give users a way to depend on the trapping semantics of the underlying instructions and avoid the extra code generated normally. Patch by coffee and tlively. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68902 llvm-svn: 374856
* [X86] Teach X86MCodeEmitter to properly encode zmm16-zmm31 as index register ↵Craig Topper2019-10-141-0/+3
| | | | | | | | | | | | to vgatherpf/vscatterpf. We need to encode bit 4 into the EVEX.V' bit. We do this right for regular gather/scatter which use either MRMSrcMem or MRMDestMem formats. The prefetches use MRM*m formats. Fixes an issue recently added to PR36202. llvm-svn: 374849
* [ARM][AsmParser] handles offset expression in parenthesesJian Cai2019-10-141-5/+7
| | | | | | | | | | | | | | | Summary: Integrated assembler does not accept offset expressions surrounded by parenthesis. Handle this case for GAS compability. https://bugs.llvm.org/show_bug.cgi?id=43631 Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68764 llvm-svn: 374832
* AMDGPU: Fix redundant setting of m0 for atomic load/storeMatt Arsenault2019-10-141-10/+7
| | | | | | | Atomic load/store would have their setting of m0 handled twice, which happened to be optimized out later. llvm-svn: 374801
* [NVPTX] Restructure shfl instrinsics and add variants that return a predicate.Artem Belevich2019-10-142-111/+63
| | | | | | | | | Also, amend constraints for non-sync variants that are no longer available on sm_70+ with PTX6.4+. Differential Revision: https://reviews.llvm.org/D68892 llvm-svn: 374790
* [CostModel][X86] Add CTLZ scalar costsSimon Pilgrim2019-10-141-1/+22
| | | | | | | | Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786
* [ARM] Selection for MVE VMOVNDavid Green2019-10-143-0/+48
| | | | | | | | | | | | | | The adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781
* [CostModel][X86] Add CTPOP scalar costs (PR43656)Simon Pilgrim2019-10-141-0/+23
| | | | | | | | Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775
* [AArch64] Stackframe accesses to SVE objects.Sander de Smalen2019-10-144-41/+108
| | | | | | | | | | | | | | | | | Materialize accesses to SVE frame objects from SP or FP, whichever is available and beneficial. This patch still assumes the objects are pre-allocated. The automatic layout of SVE objects within the stackframe will be added in a separate patch. Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D67749 llvm-svn: 374772
* [AMDGPU] Come back patch for the 'Assign register class for cross block ↵Alexander Timofeev2019-10-144-125/+208
| | | | | | | | | | | | | | | | | | | | | | | | values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767
* [X86][BtVer2] Improved latency and throughput of float/vector loads and stores.Andrea Di Biagio2019-10-141-6/+6
| | | | | | | | | | | | | | | | | | | | This patch introduces the following changes to the btver2 scheduling model: - The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores). - Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy). - Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX). I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those are dispatched to the FPU but not really issues on JFPU01. Differential Revision: https://reviews.llvm.org/D68871 llvm-svn: 374765
* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]Sam Parker2019-10-144-10/+13
| | | | | | | | | Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763
* [X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag ↵Craig Topper2019-10-141-0/+7
| | | | | | | | | | | | | | | | | | | | | | | from the subtract directly during isel. This prevents isel from emitting a TEST instruction that optimizeCompareInstr will need to remove later. In some of the modified tests, the SUB gets duplicated due to the flags being needed in two places and being clobbered in between. optimizeCompareInstr was able to optimize away the TEST that was using the result of one of them, but optimizeCompareInstr doesn't know to turn SUB into CMP after removing the TEST. It only knows how to turn SUB into CMP if the result was already dead. With this change the TEST never exists, so optimizeCompareInstr doesn't have to remove it. Then it can just turn the SUB into CMP immediately. Fixes PR43649. llvm-svn: 374755
* [X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as ↵Simon Pilgrim2019-10-131-11/+11
| | | | | | | | well as KnownZero. We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all. llvm-svn: 374732
* [X86] Enable use of avx512 saturating truncate instructions in more cases.Craig Topper2019-10-131-35/+42
| | | | | | | | | | This enables use of the saturating truncate instructions when the result type is less than 128 bits. It also enables the use of saturating truncate instructions on KNL when the input is less than 512 bits. We can do this by widening the input and then extracting the result. llvm-svn: 374731
* [X86] SimplifyMultipleUseDemandedBitsForTargetNode - use ↵Simon Pilgrim2019-10-131-12/+11
| | | | | | getTargetShuffleInputs with KnownUndef/Zero results. llvm-svn: 374725
* [X86] getTargetShuffleInputs - add KnownUndef/Zero output supportSimon Pilgrim2019-10-131-25/+25
| | | | | | Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally. llvm-svn: 374724
* [X86] Add a one use check on the setcc to the min/max canonicalization code ↵Craig Topper2019-10-131-0/+1
| | | | | | | | | | | | | | | | | | | in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc we can end up with two compares if the one of the operands is a constant. Since we aggressively canonicalize compares with constants. For non-constants it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0. But we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. llvm-svn: 374706
* [X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack ↵Craig Topper2019-10-131-0/+1
| | | | | | instructions with avx512. llvm-svn: 374705
* [X86] scaleShuffleMask - use size_t Scale to avoid overflow warningsSimon Pilgrim2019-10-122-7/+7
| | | | llvm-svn: 374674
* Replace for-loop of SmallVector::push_back with SmallVector::append. NFCI.Simon Pilgrim2019-10-121-4/+2
| | | | llvm-svn: 374669
* Fix cppcheck shadow variable name warnings. NFCI.Simon Pilgrim2019-10-121-6/+6
| | | | llvm-svn: 374668
* [X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.Simon Pilgrim2019-10-121-24/+13
| | | | llvm-svn: 374667
* [X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reductionSimon Pilgrim2019-10-121-7/+18
| | | | | | This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts. llvm-svn: 374658
* [CostModel][X86] Improve sum reduction costs.Simon Pilgrim2019-10-121-22/+23
| | | | | | | | I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655
* [X86] Use pack instructions for packus/ssat truncate patterns when 256-bit ↵Craig Topper2019-10-121-1/+4
| | | | | | | | | | | is the largest legal vector and the result type is at least 256 bits. Since the input type is larger than 256-bits we'll need to some concatenating to reassemble the results. The pack instructions ability to concatenate while packing make this a shorter/faster sequence. llvm-svn: 374643
* [mips] Rely on GPR size not ABI when select instruction to load value into ↵Simon Atanasyan2019-10-121-9/+5
| | | | | | register llvm-svn: 374641
* [mips] Fix `loadImmediate` calls when load non-address values.Simon Atanasyan2019-10-121-5/+5
| | | | llvm-svn: 374640
* NFC: clang-format rL374420 and adjust comment wordingHubert Tong2019-10-121-9/+11
| | | | | | | | | | | | | The commit of rL374420 had various formatting issues, including lines that exceed 80 columns. This patch applies `git clang-format` on the changes from commit 13bd3ef40d8b1586f26a022e01b21e56c91e05bd. It further adjusts a comment to clarify the domain of inputs upon which a newly added function is meant to operate. The adjustment to the comment was suggested in a post-commit comment on D68721 and discussed off-list with @sfertile. llvm-svn: 374635
* recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure ↵Zi Xuan Wu2019-10-1211-15/+54
| | | | | | | | | | | | | | | | | | | | | | | separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634
* [X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store.Craig Topper2019-10-121-13/+11
| | | | | | | | We already did this for VTRUNCUS with a specific combination of types. This extends this to VTRUNCS and handles any types where a truncating store is legal. llvm-svn: 374615
* [AMDGPU] link dpp pseudos and real instructions on gfx10Stanislav Mekhanoshin2019-10-113-34/+28
| | | | | | | | | | | | This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604
* [mips] Remove unused local variables. NFCSimon Atanasyan2019-10-111-19/+11
| | | | llvm-svn: 374599
* [mips] Store 64-bit `li.d' operand as a single 8-byte valueSimon Atanasyan2019-10-111-4/+4
| | | | | | | | | | | | | | | Now assembler generates two consecutive `.4byte` directives to store 64-bit `li.d' operand. The first directive stores high 4-byte of the value. The second directive stores low 4-byte of the value. But on 64-bit system we load this value at once and get wrong result if the system is little-endian. This patch fixes the bug. It stores the `li.d' operand as a single 8-byte value. Differential Revision: https://reviews.llvm.org/D68778 llvm-svn: 374598
* [mips] Use less instruction to load zero into FPR by li.s / li.d pseudosSimon Atanasyan2019-10-111-13/+18
| | | | | | | | | | If `li.s` or `li.d` loads zero into a FPR, it's not necessary to load zero into `at` GPR register and then move its value into a floating point register. We can use as a source register the `zero / $0` one. Differential Revision: https://reviews.llvm.org/D68777 llvm-svn: 374597
* [Mips][llvm-exegesis] Add a Mips targetSimon Atanasyan2019-10-113-0/+25
| | | | | | | | | | | The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Patch by Miloš Stojanović. Differential Revision: https://reviews.llvm.org/D68649 llvm-svn: 374590
* [X86][SSE] Add support for v4i8 add reductionSimon Pilgrim2019-10-111-2/+7
| | | | llvm-svn: 374579
* [AArch64][SVE] Implement sdot and udot (lane) intrinsicsKerry McLaughlin2019-10-113-22/+35
| | | | | | | | | | | | | | | | | | | | | Summary: Implements the following arithmetic intrinsics: - int_aarch64_sve_sdot - int_aarch64_sve_sdot_lane - int_aarch64_sve_udot - int_aarch64_sve_udot_lane This patch includes tests for the Subdivide4Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67551 llvm-svn: 374566
* [AIX] Use .space instead of .zero in assemblyDavid Tenty2019-10-111-0/+1
| | | | | | | | | | | | | | Summary: The AIX system assembler does not understand .zero, so we should prefer emitting .space. Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68815 llvm-svn: 374564
* [AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ↵Dmitry Preobrazhensky2019-10-111-5/+18
| | | | | | | | | | | | ds_[read/write]_addtid_b32 See https://bugs.llvm.org/show_bug.cgi?id=37941 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68787 llvm-svn: 374561
* [AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions ↵Dmitry Preobrazhensky2019-10-111-14/+29
| | | | | | | | | | | | buffer_atomic_[fcmpswap/fmin/fmax]* See https://bugs.llvm.org/show_bug.cgi?id=28232 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68788 llvm-svn: 374559
* [AMDGPU][MC][GFX10] Enabled null for 64-bit dst operandsDmitry Preobrazhensky2019-10-111-0/+12
| | | | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43524 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68785 llvm-svn: 374557
* [AMDGPU][MC] Corrected parsing of optional operandsDmitry Preobrazhensky2019-10-111-12/+6
| | | | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43486 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D68350 llvm-svn: 374553
* [mips] Fix loading "double" immediate into a GPR and FPRSimon Atanasyan2019-10-111-6/+14
| | | | | | | | | | | | | | | If a "double" (64-bit) value has zero low 32-bits, it's possible to load such value into a GP/FP registers as an instruction immediate. But now assembler loads only high 32-bits of the value. For example, if a target register is GPR the `li.d $4, 1.0` instruction converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000` in the register. While a correct representation of the `1.0` value is `0x3FF0000000000000`. The patch fixes that. Differential Revision: https://reviews.llvm.org/D68776 llvm-svn: 374544
* [X86] isFNEG - add recursion depth limitSimon Pilgrim2019-10-111-5/+9
| | | | | | Now that its used by isNegatibleForFree we should try to avoid costly deep recursion llvm-svn: 374534
OpenPOWER on IntegriCloud