path: root/llvm/lib/Target
...
* AMDGPU/GlobalISel: Select G_PTRTOINT (Matt Arsenault, 2019-10-04; 1 file, -0/+1)
    llvm-svn: 373715

* AMDGPU/GlobalISel: Support wave32 waterfall loops (Matt Arsenault, 2019-10-04; 1 file, -22/+30)
    llvm-svn: 373714

* [X86] Enable inline memcmp() to use AVX512 (David Zarzycki, 2019-10-04; 1 file, -2/+1)
    llvm-svn: 373706

* [AMDGPU][SILoadStoreOptimizer] NFC: Refactor code (Piotr Sobczak, 2019-10-04; 1 file, -120/+80)
    Summary:
    This patch fixes a potential aliasing problem in InstClassEnum,
    where local values were mixed with machine opcodes.

    Introducing InstSubclass will keep them separate and help extend
    InstClassEnum with other instruction types (e.g. MIMG) in the future.

    This patch also makes getSubRegIdxs() more concise.

    Reviewers: nhaehnle, arsenm, tstellar

    Reviewed By: arsenm

    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68384

    llvm-svn: 373699

* [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore (Shiva Chen, 2019-10-04; 2 files, -1/+90)
    We would like to split the SP adjustment to reduce the number of
    instructions in the prologue and epilogue, as in the following case.
    This way, the offset of the callee saved register spill and restore
    fits within a single store.

        add sp,sp,-2032
        sw  ra,2028(sp)
        sw  s0,2024(sp)
        sw  s1,2020(sp)
        sw  s3,2012(sp)
        sw  s4,2008(sp)
        add sp,sp,-64

    Differential Revision: https://reviews.llvm.org/D68011

    llvm-svn: 373688

* [AArch64InstPrinter] prefer bfi to bfc for < armv8.2-a (Nick Desaulniers, 2019-10-03; 1 file, -1/+2)
    Summary:
    Fixes pr/42576.

    Link: https://github.com/ClangBuiltLinux/linux/issues/697

    Reviewers: t.p.northover

    Reviewed By: t.p.northover

    Subscribers: kristof.beyls, hiraditya, llvm-commits, srhines

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68356

    llvm-svn: 373655

* [PowerPC] Adjust the naming and operand order of fnmsub patterns (Jinsong Ji, 2019-10-03; 1 file, -18/+18)
    Summary:
    This is a follow-up patch to https://reviews.llvm.org/D67595.

    Adjust the naming and the commutable operands for additional patterns
    to make them easier to read. The testcase update also shows that we
    can save some unnecessary fmr instructions as well.

    Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

    Reviewed By: #powerpc, nemanjai

    Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68112

    llvm-svn: 373652

* [NFC] Fix unused variable in release builds (Jordan Rupprecht, 2019-10-03; 1 file, -1/+2)
    llvm-svn: 373646

* [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes (Craig Topper, 2019-10-03; 1 file, -0/+44)
    This patch recognizes the shuffle pattern we get from a v8i64->v8i8
    truncate when v8i64 isn't a legal type. With VLX we can use two
    VTRUNCs, an unpckldq, and an insert_subvector.

    Differential Revision: https://reviews.llvm.org/D68374

    llvm-svn: 373645

* [X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI. (Simon Pilgrim, 2019-10-03; 1 file, -7/+7)
    We can make use of the Zeroable mask to indicate which elements we can
    safely set to zero instead of creating a target shuffle mask on the
    fly.

    This only leaves one user of createTargetShuffleMask, which we can
    hopefully get rid of in a similar manner.

    This is part of the work to fix PR43024 and allow us to use
    SimplifyDemandedElts to simplify shuffle chains - we need to get to a
    point where the target shuffle mask isn't adjusted by its source
    inputs in setTargetShuffleZeroElements, but is instead cached in a
    parallel Zeroable mask.

    llvm-svn: 373641

* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT (Matt Arsenault, 2019-10-03; 1 file, -5/+77)
    llvm-svn: 373639

* AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect (Matt Arsenault, 2019-10-03; 2 files, -167/+271)
    Register indexing 64-bit elements is possible on the SALU, but not the
    VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
    loop handling to allow moving a range of instructions.

    llvm-svn: 373638

* AMDGPU/GlobalISel: Allow VGPR to index SGPR register (Matt Arsenault, 2019-10-03; 1 file, -4/+6)
    We can still do a waterfall loop over the index if using a VGPR to
    index an SGPR. The result will still be a VGPR, but we can avoid the
    wide copy of the source register to a VGPR.

    llvm-svn: 373637

* AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and (Matt Arsenault, 2019-10-03; 1 file, -2/+3)
    This would try to do FewerElements to v9s8.

    llvm-svn: 373635

* AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions (Tom Stellard, 2019-10-03; 1 file, -82/+185)
    Summary:
    This adds a pre-pass to this optimization that scans through the basic
    block and generates lists of mergeable instructions with one list per
    unique address.

    In the optimization phase, instead of scanning through the basic block
    for mergeable instructions, we now iterate over the lists generated by
    the pre-pass.

    The decision to re-optimize a block is now made per list, so if we
    fail to merge any instructions with the same address, then we do not
    attempt to optimize them in future passes over the block. This will
    help to reduce the time this pass spends re-optimizing instructions.

    In one pathological test case, this change reduces the time spent in
    the SILoadStoreOptimizer from 0.2s to 0.03s.

    This restructuring will also make it possible to implement further
    solutions in this pass, because we can now add less expensive checks
    to the pre-pass and filter instructions out early, which will avoid
    the need to do the expensive scanning during the optimization pass.
    For example, checking for adjacent offsets is an inexpensive test we
    can move to the pre-pass.

    Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D65961

    llvm-svn: 373630

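    A rough C++ sketch of the pre-pass bookkeeping described above; the
    helper names isMergeableLoadStore and computeAddressKey are
    illustrative stand-ins, not the actual pass code:

        #include "llvm/ADT/DenseMap.h"
        #include "llvm/ADT/SmallVector.h"
        #include "llvm/CodeGen/MachineBasicBlock.h"
        #include "llvm/CodeGen/MachineInstr.h"

        // One list of merge candidates per unique base address; scanning
        // the block once up front replaces the repeated rescans of the
        // old implementation.
        void collectMergeLists(llvm::MachineBasicBlock &MBB) {
          llvm::DenseMap<unsigned,
                         llvm::SmallVector<llvm::MachineInstr *, 8>> Lists;
          for (llvm::MachineInstr &MI : MBB)
            if (isMergeableLoadStore(MI))                  // hypothetical predicate
              Lists[computeAddressKey(MI)].push_back(&MI); // hypothetical key
        }
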
* [BPF] Handle offset reloc endpoint ending in the middle of chain properly (Yonghong Song, 2019-10-03; 1 file, -118/+100)
    While studying bitfield support, I found an issue for an example like
    the one in test offset-reloc-middle-chain.ll:

        struct t1 { int c; };
        struct s1 { struct t1 b; };
        struct r1 { struct s1 a; };
        #define _(x) __builtin_preserve_access_index(x)
        void test1(void *p1, void *p2, void *p3);
        void test(struct r1 *arg) {
          struct s1 *ps = _(&arg->a);
          struct t1 *pt = _(&arg->a.b);
          int *pi = _(&arg->a.b.c);
          test1(ps, pt, pi);
        }

    The IR looks like:

        %0 = llvm.preserve.struct.access(base, ...)
        %1 = llvm.preserve.struct.access(%0, ...)
        %2 = llvm.preserve.struct.access(%1, ...)
        using %0, %1 and %2

    In this case, we need to generate three relocations corresponding to
    the chains (%0), (%0, %1) and (%0, %1, %2).

    After collecting all the chains, the current implementation processes
    each chain (in a map) with code generation sequentially. For example,
    after (%0) is processed, the code may look like:

        %0 = base + special_global_variable
        // llvm.preserve.struct.access(base, ...) is delisted
        // from the instruction stream.
        %1 = llvm.preserve.struct.access(%0, ...)
        %2 = llvm.preserve.struct.access(%1, ...)
        using %0, %1 and %2

    When processing the chain (%0, %1), the current implementation tries
    to visit the intrinsic llvm.preserve.struct.access(base, ...) to get
    some of its properties, and this caused a segfault.

    This patch fixes the issue by remembering all necessary information
    (kind, metadata, access_index, base) during the analysis phase, so
    the code generation phase does not need to examine the intrinsic call
    instructions. This also simplifies the code.

    Differential Revision: https://reviews.llvm.org/D68389

    llvm-svn: 373621

* Revert "[Alignment][NFC] Allow constexpr Align"Guillaume Chatelet2019-10-031-1/+1
| | | | | | This reverts commit b3af236fb5fc6e50fcc1b54d868f0bff557f3fb1. llvm-svn: 373619
* [RISCV] Add obsolete aliases of fscsr, frcsr (fssr, frsr) (Edward Jones, 2019-10-03; 1 file, -0/+6)
    These old aliases were renamed, but are still used by some projects
    (e.g. newlib).

    Differential Revision: https://reviews.llvm.org/D68392

    llvm-svn: 373618

* [AArch64][SVE] Adding patterns for floating point SVE add instructions. (Ehsan Amiri, 2019-10-03; 2 files, -12/+14)
    llvm-svn: 373600

* [mips] Push `fixup_Mips_LO16` fixup for `jialc` and `jic` instructions (Simon Atanasyan, 2019-10-03; 1 file, -2/+5)
    llvm-svn: 373591

* [AArch64] Static (de)allocation of SVE stack objects. (Sander de Smalen, 2019-10-03; 6 files, -13/+173)
    Adds support to AArch64FrameLowering to allocate fixed-stack SVE
    objects.

    The focus of this patch is purely to allow the stack frame to
    allocate/deallocate space for scalable SVE objects. More dynamic
    allocation (at compile-time, i.e. determining placement of SVE
    objects on the stack), or resolving frame-index references that
    include scalable-sized offsets, are left for subsequent patches.

    SVE objects are allocated in the stack frame as a separate region
    below the callee-save area, and above the alignment gap. This is done
    so that the SVE objects can be accessed directly from the FP at
    (runtime) VL-based offsets to benefit from using the VL-scaled
    addressing modes.

    The layout looks as follows:

        +-------------+
        |  stack arg  |
        +-------------+
        | Callee Saves|
        |   X29, X30  | (if available)
        |-------------| <- FP (if available)
        |      :      |
        |  SVE area   |
        |      :      |
        +-------------+
        |/////////////| alignment gap.
        |      :      |
        | Stack objs  |
        |      :      |
        +-------------+ <- SP after call and frame-setup

    SVE and non-SVE stack objects are distinguished using different
    StackIDs. The offsets for objects with TargetStackID::SVEVector
    should be interpreted as purely scalable offsets within their
    respective SVE region.

    Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened

    Reviewed By: efriedma

    Differential Revision: https://reviews.llvm.org/D61437

    llvm-svn: 373585

* Fix uninitialized variable warning. NFCI (Simon Pilgrim, 2019-10-03; 1 file, -1/+1)
    llvm-svn: 373583

* Fix uninitialized variable warning. NFCI (Simon Pilgrim, 2019-10-03; 1 file, -1/+1)
    llvm-svn: 373582

* [Alignment][NFC] Allow constexpr Align (Guillaume Chatelet, 2019-10-03; 1 file, -1/+1)
    Summary:
    This patch is part of a series to introduce an Alignment type.
    See this thread for context:
    http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
    See this patch for the introduction of the type:
    https://reviews.llvm.org/D64790

    Reviewers: courbet

    Subscribers: hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68329

    llvm-svn: 373580

* AMDGPU/GlobalISel: Don't re-get subtarget (Matt Arsenault, 2019-10-03; 1 file, -6/+3)
    It's already available in the class.

    llvm-svn: 373568

* AMDGPU/GlobalISel: Expand G_BITCAST legality (Matt Arsenault, 2019-10-03; 1 file, -4/+1)
    llvm-svn: 373567

* [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same (Craig Topper, 2019-10-03; 2 files, -103/+17)
    This improves broadcast load folding of i64 elements on 32-bit
    targets where i64 isn't legal.

    Previously we had to represent these as vXf64 vbroadcast_loads and a
    bitcast to vXi64. But we didn't have any isel patterns looking for
    that.

    This also allows us to remove or simplify some isel patterns that
    were looking for bitcasted vbroadcast_loads.

    llvm-svn: 373566

* [X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns (Craig Topper, 2019-10-03; 1 file, -7/+31)
    More fixes for PR36191.

    llvm-svn: 373560

* [X86] Remove a couple redundant isel patterns that look to have been copy/pasted from right above them. NFC (Craig Topper, 2019-10-03; 1 file, -17/+0)
    llvm-svn: 373559

* [gicombiner] Fix windows issue where single quotes in the command are passed through to tablegen (Daniel Sanders, 2019-10-02; 1 file, -1/+1)
    llvm-svn: 373545

* [AMDGPU] Fix illegal agpr use by VALU (Stanislav Mekhanoshin, 2019-10-02; 1 file, -1/+10)
    When SIFixSGPRCopies attempts to fix an illegal copy from a vector to
    a scalar register it calls moveToVALU(). A copy from an agpr to an
    sgpr becomes a copy from an agpr to an agpr, which may result in an
    illegal register class at a use of this copy.

    The solution is to always copy it into a vgpr. This may result in a
    subsequent copy into an agpr if that is what is really needed;
    however, that should not happen too often and will likely be folded
    later.

    The opposite situation cannot happen because an sgpr is always
    illegal where an agpr is legal, so such user instructions may not
    exist.

    Differential Revision: https://reviews.llvm.org/D68358

    llvm-svn: 373544

* [gicombiner] Add the boring boilerplate for the declarative combiner (Daniel Sanders, 2019-10-02; 4 files, -0/+35)
    Summary:
    This is the first of a series of patches extracted from a much bigger
    WIP patch. It merely establishes the tblgen pass and the way empty
    combiner helpers are declared and integrated into a combiner info.

    The tablegen pass takes a -combiners option to select the combiner
    helper that will be generated. This can be given multiple values to
    generate multiple combiner helpers at once. Doing so helps to
    minimize parsing overhead.

    The reason for creating a GlobalISel subdirectory in utils/TableGen
    is that there will be quite a lot of non-pass files (~15) by the time
    the patch series is done.

    Reviewers: volkan

    Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68286

    llvm-svn: 373527

* [PowerPC] Fix SH field overflow issue (Yi-Hong Lyu, 2019-10-02; 1 file, -4/+8)
    Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
    rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise the SH field
    overflows and fails an assertion in the assembly printing stage.

    Differential Revision: https://reviews.llvm.org/D66991

    llvm-svn: 373519

* [ARM] Make helpers static. NFC. (Benjamin Kramer, 2019-10-02; 1 file, -3/+5)
    llvm-svn: 373503

* [X86] Rewrite the vXi1 subvector insertion code to not rely on the value of bits that might be undef (Craig Topper, 2019-10-02; 1 file, -14/+26)
    The previous code tried to do a trick where we would extract the
    subvector from the location we were inserting. Then xor that with the
    new value. Take the xored value and clear out the bits above the
    subvector size. Then shift that xored subvector to the insert
    location. And finally xor that with the original vector. Since the
    old subvector was used in both xors, this would leave just the new
    subvector at the inserted location. Since the surrounding bits had
    been zeroed, no other bits of the original vector would be modified.

    Unfortunately, if the old subvector came from undef we might
    aggressively propagate the undef. Then we end up with the XORs not
    cancelling because they aren't using the same value for the two uses
    of the old subvector. @bkramer gave me a case that demonstrated this,
    but we haven't reduced it enough to make it easily readable to see
    what's happening.

    This patch uses a safer, but more costly approach. It isolates the
    bits above the insertion and the bits below the insert point and ORs
    those together, leaving 0 at the insertion location. Then it widens
    the subvector with 0s in the upper bits and shifts it into position
    with 0s in the lower bits. Then we do another OR.

    Differential Revision: https://reviews.llvm.org/D68311

    llvm-svn: 373495

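    A scalar C++ model of the OR-based approach described above (the
    function and its bit manipulation are illustrative only; the actual
    patch operates on vXi1 mask values in SelectionDAG):

        #include <cstdint>

        // Insert the low Width bits of Sub into Vec at bit position Pos.
        // Isolate the bits below and above the insert window, OR them into
        // a value with a zeroed hole, widen the subvector with zero upper
        // bits, shift it into place, and OR it in. Assumes Pos + Width < 32.
        uint32_t insertBits(uint32_t Vec, uint32_t Sub,
                            unsigned Pos, unsigned Width) {
          uint32_t Low = Vec & ((1u << Pos) - 1);             // below the window
          uint32_t High = Vec & ~((1u << (Pos + Width)) - 1); // above the window
          uint32_t Hole = Low | High;                         // window zeroed out
          uint32_t Widened = Sub & ((1u << Width) - 1);       // clear upper bits
          return Hole | (Widened << Pos);                     // final OR
        }
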
* [WebAssembly] Error when using wasm64 for ISel (Thomas Lively, 2019-10-02; 1 file, -0/+6)
    Summary:
    64-bit WebAssembly (wasm64) is not specified and not supported in the
    WebAssembly backend. We do have support for it in clang, however, and
    we would like to keep that support because we expect wasm64 to be
    specified and supported in the future. For now, add an error when
    trying to use wasm64 from the backend to minimize user confusion from
    unexplained crashes.

    Reviewers: aheejin, dschuff, sunfish

    Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68254

    llvm-svn: 373493

* [AMDGPU] Extend buffer intrinsics with swizzling (Piotr Sobczak, 2019-10-02; 14 files, -155/+308)
    Summary:
    Extend the cachepolicy operand in the new VMEM buffer intrinsics to
    supply information on whether the buffer data is swizzled. Also,
    propagate this information to MIR.

    Intrinsics updated:

        int_amdgcn_raw_buffer_load
        int_amdgcn_raw_buffer_load_format
        int_amdgcn_raw_buffer_store
        int_amdgcn_raw_buffer_store_format
        int_amdgcn_raw_tbuffer_load
        int_amdgcn_raw_tbuffer_store
        int_amdgcn_struct_buffer_load
        int_amdgcn_struct_buffer_load_format
        int_amdgcn_struct_buffer_store
        int_amdgcn_struct_buffer_store_format
        int_amdgcn_struct_tbuffer_load
        int_amdgcn_struct_tbuffer_store

    Furthermore, disable merging of VMEM buffer instructions in the SI
    Load/Store optimizer if the "swizzled" bit on the instruction is on.

    The default value of the bit is 0, meaning that data in the buffer is
    linear and buffer instructions can be merged.

    There is no difference in the generated code with this commit.
    However, in the future front-ends will be expected to use buffer
    intrinsics with the correct "swizzled" bit set.

    Reviewers: arsenm, nhaehnle, tpr

    Reviewed By: nhaehnle

    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68200

    llvm-svn: 373491

* [AArch64][SVE] Implement int_aarch64_sve_cnt intrinsic (Kerry McLaughlin, 2019-10-02; 2 files, -6/+16)
    Summary:
    This patch includes tests for the VecOfBitcastsToInt type added by
    D68021.

    Reviewers: c-rhodes, sdesmalen, rovka

    Reviewed By: c-rhodes

    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68023

    llvm-svn: 373468

* [ARM] Identity shuffles are legal (David Green, 2019-10-02; 1 file, -0/+1)
    Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK
    under MVE (they essentially just become bitcasts). We were not
    catching that in the existing set of what we considered legal though.
    On NEON, they would be covered by vext's, but that is not generally
    available in MVE.

    This uses ShuffleVectorInst::isIdentityMask, which is a little odd to
    use here but does what we want and prevents us from just rewriting
    what is the same function.

    Differential Revision: https://reviews.llvm.org/D68241

    llvm-svn: 373446

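    A simplified C++ stand-in for the identity-mask test the patch reuses
    (ShuffleVectorInst::isIdentityMask); the standalone function here is
    illustrative, not the LLVM implementation:

        #include "llvm/ADT/ArrayRef.h"

        // A mask (0, 1, 2, ..., n-1) selects the first operand unchanged,
        // so under MVE the shuffle is effectively a bitcast.
        bool isIdentityMaskSketch(llvm::ArrayRef<int> Mask) {
          for (int I = 0, E = (int)Mask.size(); I != E; ++I)
            if (Mask[I] != I && Mask[I] != -1) // -1 marks an undef lane
              return false;
          return true;
        }
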
* [AMDGPU] Make printf lowering faster when there are no printfs (Jay Foad, 2019-10-02; 1 file, -16/+14)
    Summary:
    Printf lowering unconditionally visited every instruction in the
    module. To make it faster in the common case where there are no
    printfs, look up the printf function (if any) and iterate over its
    users instead.

    Reviewers: rampitec, kzhuravl, alex-t, arsenm

    Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68145

    llvm-svn: 373433

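    A minimal C++ sketch of the users-first strategy (handlePrintf is a
    hypothetical per-call handler, not the actual pass code):

        #include "llvm/IR/Function.h"
        #include "llvm/IR/Instructions.h"
        #include "llvm/IR/Module.h"

        // Visit only the call sites of printf instead of every instruction
        // in the module; when there are no printfs this reduces to a single
        // symbol-table lookup.
        void forEachPrintf(llvm::Module &M) {
          if (llvm::Function *Printf = M.getFunction("printf"))
            for (llvm::User *U : Printf->users())
              if (auto *CI = llvm::dyn_cast<llvm::CallInst>(U))
                handlePrintf(CI); // hypothetical per-call handler
        }
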
* [X86] Add broadcast load folding patterns to the NoVLX compare patterns. (Craig Topper, 2019-10-02; 1 file, -16/+138)
    These patterns use zmm registers for 128/256-bit compares when the
    VLX instructions aren't available. Previously we only supported
    registers, but as PR36191 notes we can fold broadcast loads, but not
    regular loads.

    llvm-svn: 373423

* AMDGPU/GlobalISel: Use getIntrinsicID helper (Matt Arsenault, 2019-10-02; 3 files, -7/+7)
    llvm-svn: 373417

* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX (Matt Arsenault, 2019-10-02; 1 file, -1/+7)
    In principle this should behave as any other constant. However,
    eliminateFrameIndex currently assumes a VALU use and uses a vector
    shift. Work around this by selecting to VGPR for now until
    eliminateFrameIndex is fixed.

    llvm-svn: 373415

* AMDGPU/GlobalISel: Private loads always use VGPRs (Matt Arsenault, 2019-10-02; 1 file, -4/+6)
    llvm-svn: 373414

* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR (Matt Arsenault, 2019-10-02; 1 file, -4/+6)
    This will be needed to support AGPR operations.

    llvm-svn: 373413

* AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values (Matt Arsenault, 2019-10-02; 1 file, -29/+35)
    llvm-svn: 373412

* [AMDGPU] separate accounting for agprs (Stanislav Mekhanoshin, 2019-10-02; 3 files, -7/+52)
    Account and report agprs separately on gfx908. Other targets do not
    change the reporting.

    Differential Revision: https://reviews.llvm.org/D68307

    llvm-svn: 373411

* [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 (Craig Topper, 2019-10-01; 1 file, -0/+34)
    The gather/scatter instructions can implicitly sign extend the
    indices. If we're operating on 32-bit data, a v16i64 index can force
    a v16i32 gather to be split in two since the index needs 2 registers.
    If we can shrink the index to i32, we can avoid the split.

    It should always be safe to shrink the index regardless of the number
    of elements. We have gather/scatter instructions that can use a v2i32
    index stored in a v4i32 register with a v2i64 data size.

    I've limited this to before legalize types to avoid creating a v2i32
    after type legalization. We could check for it, but we'd also need
    testing. I'm also only handling build_vectors with no bitcasts, to be
    sure the truncate will constant fold.

    Differential Revision: https://reviews.llvm.org/D68247

    llvm-svn: 373408

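    A hedged C++ sketch of the core legality test (the function name and
    surrounding plumbing are illustrative, not the actual DAGCombiner
    code): a vXi64 index can be narrowed when every element is known to
    fit in 32 sign bits, because the hardware sign-extends the index.

        #include "llvm/CodeGen/SelectionDAG.h"

        // Truncate a vXi64 index to vXi32 when ComputeNumSignBits proves
        // the top 33 bits of every element are copies of the sign bit.
        llvm::SDValue shrinkIndex(llvm::SelectionDAG &DAG,
                                  llvm::SDValue Index) {
          using namespace llvm;
          EVT VT = Index.getValueType();
          if (VT.getScalarType() == MVT::i64 &&
              DAG.ComputeNumSignBits(Index) > 32) {
            EVT NewVT = EVT::getVectorVT(*DAG.getContext(), MVT::i32,
                                         VT.getVectorNumElements());
            return DAG.getNode(ISD::TRUNCATE, SDLoc(Index), NewVT, Index);
          }
          return Index; // leave unchanged if narrowing is not known safe
        }
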
* AMDGPU: Fix an out of date assert in addressing FrameIndex (Changpeng Fang, 2019-10-01; 1 file, -3/+2)
    Reviewers: arsenm

    Differential Revision: https://reviews.llvm.org/D67574

    llvm-svn: 373404

* Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops." (Craig Topper, 2019-10-01; 1 file, -79/+1)
    This seems to be causing some performance regressions that I'm trying
    to investigate.

    One thing that stands out is that this transform can increase the
    live range of the operands of the earlier logic op. This can be bad
    for register allocation. If there are two logic op inputs, we should
    really combine with the one that is closest, but SelectionDAG doesn't
    have a good way to do that. Maybe we need to do this as a basic block
    transform in Machine IR.

    llvm-svn: 373401