summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* Use VR128X instead of FR32X/FR64X for the register class in ↵Craig Topper2019-06-171-5/+5
| | | | | | | | VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630
* [X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what ↵Craig Topper2019-06-171-1/+2
| | | | | | | | types are allowed here. NFC Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code. llvm-svn: 363629
* [AMDGPU] Use custom inserter for gfx10 VOP2bStanislav Mekhanoshin2019-06-171-1/+3
| | | | | | | | | This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625
* [AMDGPU] Propagate function attributes thru bitcastsStanislav Mekhanoshin2019-06-171-3/+4
| | | | | | | | | AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614
* AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printerNicolai Haehnle2019-06-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602
* [GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do soJessica Paquette2019-06-171-16/+144
| | | | | | | | | | | | | | | | | | | | | | | | | Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596
* [X86] Add TB_NO_REVERSE to some memory folding table entries where the ↵Craig Topper2019-06-171-3/+3
| | | | | | | | | | | | | register form requires 64-bit mode, but the memory form does not. We don't know if its safe to unfold if we're in 32-bit mode. This is simlar to what was done to some load opcodes in r363523. I think its pretty unlikely we will try to unfold these anyway so I don't think this is testable. llvm-svn: 363595
* [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)Simon Pilgrim2019-06-171-0/+45
| | | | | | If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592
* [AMDGPU] gfx1010 wavefrontsize intrinsic foldingStanislav Mekhanoshin2019-06-173-16/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588
* [AMDGPU] Pass to propagate ABI attributes from kernels to the functionsStanislav Mekhanoshin2019-06-174-4/+356
| | | | | | | | | | | | | | | | The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586
* [X86][AVX] Split under-aligned vector nt-stores.Simon Pilgrim2019-06-171-2/+13
| | | | | | If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582
* [LV] Suppress vectorization in some nontemporal casesWarren Ristow2019-06-172-0/+37
| | | | | | | | | | | | | | | | | | | | | When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581
* GlobalISel: Verify intrinsicsMatt Arsenault2019-06-171-26/+34
| | | | | | | | I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579
* AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic IDMatt Arsenault2019-06-171-2/+1
| | | | llvm-svn: 363578
* [AMDGPU] gfx1010 wave32 metadataStanislav Mekhanoshin2019-06-178-2/+85
| | | | | | Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECTTom Stellard2019-06-173-1/+195
| | | | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576
* [X86] combineLoad - begun making the load split code more generic. NFCI.Simon Pilgrim2019-06-171-13/+12
| | | | | | | | This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570
* [X86][SSE] Prevent misaligned non-temporal vector load/store combinesSimon Pilgrim2019-06-171-4/+13
| | | | | | | | | | For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564
* AMDGPU: Ignore subtarget for InferAddressSpacesMatt Arsenault2019-06-171-2/+1
| | | | | | | Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561
* AMDGPU/GlobalISel: Fix default mapping for non-register operandsMatt Arsenault2019-06-171-1/+5
| | | | | | Tests will be in future commits when new intrinsics are handled here. llvm-svn: 363559
* AMDGPU: Cleanup custom PseudoSourceValue definitionsMatt Arsenault2019-06-171-16/+23
| | | | | | | Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation. llvm-svn: 363558
* [CodeGen] Check for HardwareLoop Latch ExitBlockSam Parker2019-06-171-4/+0
| | | | | | | | | | | | The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556
* [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splittingLuis Marques2019-06-171-0/+1
| | | | | | | | | | | | | | | Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544
* Fix clang -Wcovered-switch-default after stack-id change by D60137Fangrui Song2019-06-171-8/+7
| | | | llvm-svn: 363543
* [ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds ↵Fangrui Song2019-06-171-1/+1
| | | | | | after D63265 llvm-svn: 363535
* [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265Fangrui Song2019-06-171-1/+1
| | | | llvm-svn: 363534
* Describe stack-id as an enumSander de Smalen2019-06-174-10/+16
| | | | | | | | | | | | | | | | | This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533
* [ARM] Remove ARMComputeBlockSizeSam Parker2019-06-171-80/+0
| | | | | | Forgot to remove file! llvm-svn: 363532
* [ARM] Add ARMBasicBlockInfo.cppSam Parker2019-06-171-0/+146
| | | | | | Forgot to add file! llvm-svn: 363531
* [ARM] Extract some code from ARMConstantIslandPassSam Parker2019-06-174-112/+101
| | | | | | | | | Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530
* PowerPC: Optimize SPE double parameter calling setupJustin Hibbits2019-06-176-49/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526
* [X86] Add TB_NO_REVERSE to some folding table entries where the register ↵Craig Topper2019-06-161-9/+9
| | | | | | | | | | | | | | from uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form the register form without checking that we are compiling for 64-bit mode. This probaby isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523
* AMDGPU: Prepare for explicit absolute relocations in code generationNicolai Haehnle2019-06-164-8/+24
| | | | | | | | | | | | | | | | | Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0Nicolai Haehnle2019-06-163-10/+19
| | | | | | | | | | | | | | | | | | | | | Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516
* AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsicNicolai Haehnle2019-06-164-13/+22
| | | | | | | | | | | | | | | Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514
* [AMDGPU] gfx10 conditional registers handlingStanislav Mekhanoshin2019-06-1618-211/+599
| | | | | | | | | This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
* [x86] split 256-bit vector selects if operands are vector concatsSanjay Patel2019-06-161-0/+36
| | | | | | | | | | | | | | | | | | | This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508
* [X86] CombineShuffleWithExtract - handle cases with different vector extract ↵Simon Pilgrim2019-06-161-6/+28
| | | | | | | | sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507
* [X86] CombineShuffleWithExtract - assert all src ops types are multiples of ↵Simon Pilgrim2019-06-151-1/+2
| | | | | | rootsize. NFCI. llvm-svn: 363501
* [X86][AVX] Handle lane-crossing ↵Simon Pilgrim2019-06-151-49/+79
| | | | | | | | | | shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500
* [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)Simon Pilgrim2019-06-151-0/+23
| | | | | | This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499
* [PowerPC] Set the innermost hot loop to align 32 bytesKang Zhang2019-06-151-0/+12
| | | | | | | | | | | | | | | | | Summary: If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code will only align hot loop to 32 bytes when the LoopSize larger than 16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop to 32 bytes not only for the hot loop whose size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495
* [RISCV] Simplify RISCVAsmBackend::writeNopData(). NFCFangrui Song2019-06-151-7/+3
| | | | llvm-svn: 363486
* Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Matt Arsenault2019-06-151-3/+71
| | | | | | | This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478
* Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"Mitch Phillips2019-06-141-71/+3
| | | | | | | | | | | This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929. This reverts rL363410. llvm-svn: 363476
* AMDGPU: Avoid most waitcnts before callsMatt Arsenault2019-06-141-66/+90
| | | | | | | | | | | Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465
* [PowerPC][NFC] Comments update and remove some unused defJinsong Ji2019-06-142-20/+2
| | | | llvm-svn: 363461
* AMDGPU: Fix dropping memref for ds append/consumeMatt Arsenault2019-06-141-1/+3
| | | | | | | | | The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455
* AMDGPU: Set isTrap on S_TRAPMatt Arsenault2019-06-141-1/+4
| | | | | | | This seems to only be used for generating some kind of documentation, but might as well set it. llvm-svn: 363454
* [PowerPC][NFC] Format comments in P9InstrResrouce.tdJinsong Ji2019-06-141-80/+77
| | | | llvm-svn: 363423
OpenPOWER on IntegriCloud