bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs	Matt Arsenault	2019-10-10	8	-21/+25
\| \| \| \| \| \| \| \| \|	SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
*	AMDGPU: Don't fold copies to physregs	Matt Arsenault	2019-10-09	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \|	In a future patch, this will help clean up m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there ends up being no real copy. llvm-svn: 374257
*	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer	Matt Arsenault	2019-10-09	1	-4/+14
\| \| \| \| \| \| \| \| \| \|	This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
*	AMDGPU: Relax register classes used	Matt Arsenault	2019-10-09	1	-2/+2
\| \| \| \|	llvm-svn: 374254
*	AMDGPU: Fix typos	Matt Arsenault	2019-10-09	1	-2/+2
\| \| \| \|	llvm-svn: 374253
*	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR	Matt Arsenault	2019-10-09	1	-1/+10
\| \| \| \| \| \|	Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252
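The G_BUILD_VECTOR split can be pictured with a toy model. This is a sketch, not the LLVM API: `BuildVector`, `ConcatVectors` and `fewerElementsBuildVector` are hypothetical names, a build_vector is reduced to its list of scalar operands, and only evenly divisible splits are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not LLVM's MachineIRBuilder): a G_BUILD_VECTOR is its list of
// scalar source registers; a G_CONCAT_VECTORS is a list of narrower pieces.
using BuildVector = std::vector<int>;
using ConcatVectors = std::vector<BuildVector>;

// Split one wide build_vector into pieces of NarrowElts elements each.
// Assumption: the element count divides evenly into NarrowElts.
ConcatVectors fewerElementsBuildVector(const BuildVector &Wide,
                                       std::size_t NarrowElts) {
  assert(NarrowElts != 0 && Wide.size() % NarrowElts == 0);
  ConcatVectors Pieces;
  for (std::size_t I = 0; I < Wide.size(); I += NarrowElts)
    Pieces.emplace_back(Wide.begin() + I, Wide.begin() + I + NarrowElts);
  return Pieces;
}
```

For example, a 6-element build_vector with NarrowElts = 2 becomes a concat of three 2-element build_vectors whose operands, read in order, are the original six.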
*	[AMDGPU] Fixed dpp combine of VOP1	Stanislav Mekhanoshin	2019-10-09	1	-0/+8
\| \| \| \| \| \| \| \| \|	If the original instruction did not have source modifiers, they were not added to the new DPP instruction either, even when needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241
*	[System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLI	Vitaly Buka	2019-10-09	1	-2/+10
\| \| \| \| \| \|	To fix "infinite recursion" warning. llvm-svn: 374222
*	[AMDGPU] Use math constants defined in MathExtras (NFC)	Evandro Menezes	2019-10-09	3	-45/+19
\| \| \| \| \| \| \| \|	Use the new math constants in `MathExtras.h`. Differential revision: https://reviews.llvm.org/D68285 llvm-svn: 374208
*	AMDGPU: Fix i16 arithmetic pattern redundancy	Matt Arsenault	2019-10-08	1	-78/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the zext folding patterns serve no purpose on subtargets that do not zero the high bits. They should be skipped there instead of inserting the extension; the high-bit zeroing code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
*	AMDGPU: Add offsets to MMO when lowering buffer intrinsics	Tom Stellard	2019-10-08	2	-11/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087
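Why offsets on the MMOs matter can be shown with a simplified overlap test. This is a sketch under stated assumptions: `BufferAccess` and `mayAlias` here are hypothetical and far simpler than `MachineInstr::mayAlias()`; both accesses are assumed to use the same resource descriptor, and an empty offset stands for an MMO without offset information.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified memory operand: the byte range one buffer access
// touches within a single resource descriptor. No offset = unknown.
struct BufferAccess {
  std::optional<int64_t> Offset;
  uint64_t Size;
};

// Two accesses to the same descriptor may alias unless both offsets are known
// and their half-open byte ranges [Offset, Offset + Size) are disjoint.
bool mayAlias(const BufferAccess &A, const BufferAccess &B) {
  if (!A.Offset || !B.Offset)
    return true; // conservative: nothing to compare, so every pair aliases
  int64_t AEnd = *A.Offset + static_cast<int64_t>(A.Size);
  int64_t BEnd = *B.Offset + static_cast<int64_t>(B.Size);
  return *A.Offset < BEnd && *B.Offset < AEnd;
}
```

Without offsets every pair of accesses takes the conservative branch, which is what produces the quadratic number of dependency edges; with offsets, disjoint loads and stores drop out.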
*	[AMDGPU] Disable unused gfx10 dpp instructions	Stanislav Mekhanoshin	2019-10-08	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083
*	AMDGPU: Propagate undef flag during pre-RA exec mask optimizations	Nicolai Haehnle	2019-10-08	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041
*	AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources	Matt Arsenault	2019-10-07	1	-3/+6
\| \| \| \|	llvm-svn: 373989
*	AMDGPU/GlobalISel: Handle more G_INSERT cases	Matt Arsenault	2019-10-07	3	-57/+55
\| \| \| \| \| \| \| \| \|	Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947
*	GlobalISel: Partially implement lower for G_INSERT	Matt Arsenault	2019-10-07	1	-7/+3
\| \| \| \|	llvm-svn: 373946
*	AMDGPU/GlobalISel: Fix selection of 16-bit shifts	Matt Arsenault	2019-10-07	1	-3/+6
\| \| \| \|	llvm-svn: 373945
*	AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32	Matt Arsenault	2019-10-07	1	-1/+1
\| \| \| \|	llvm-svn: 373944
*	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants	Matt Arsenault	2019-10-07	1	-20/+27
\| \| \| \| \| \| \|	This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943
*	AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources	Matt Arsenault	2019-10-07	1	-18/+29
\| \| \| \| \| \|	Continue making a mess of merge/unmerge legality. llvm-svn: 373942
*	AMDGPU/GlobalISel: Select more G_INSERT cases	Matt Arsenault	2019-10-07	1	-20/+78
\| \| \| \| \| \| \| \| \| \|	At minimum, handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938
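The subreg lookup these two G_INSERT commits need can be approximated by computing a name from the insert's bit offset and width. This is a hypothetical sketch (`subRegForInsert` is not an LLVM function): it assumes 32-bit-aligned offsets and sizes and only produces AMDGPU-style names such as `sub0_sub1`, whereas the real code maps to TableGen-generated subregister indices, and not every combination is guaranteed to exist for every register class.

```cpp
#include <cassert>
#include <string>

// Hypothetical lookup: map an insert's bit offset and bit width onto an
// AMDGPU-style subregister name (sub0, sub1, sub0_sub1, ...). Returns an
// empty string when the range is not 32-bit aligned (fall back elsewhere).
std::string subRegForInsert(unsigned OffsetBits, unsigned SizeBits) {
  if (SizeBits == 0 || OffsetBits % 32 != 0 || SizeBits % 32 != 0)
    return "";
  unsigned First = OffsetBits / 32, Count = SizeBits / 32;
  std::string Name = "sub" + std::to_string(First);
  for (unsigned I = 1; I < Count; ++I)
    Name += "_sub" + std::to_string(First + I);
  return Name;
}
```

An s64 insert at bit offset 64 would map to `sub2_sub3`; a 16-bit-offset insert returns an empty string, modeling the cases that still need another path.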
*	GlobalISel: Add target pre-isel instructions	Matt Arsenault	2019-10-07	5	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937
*	Second attempt to add iterator_range::empty()	Jordan Rose	2019-10-07	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935
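A member `empty()` on a begin/end range can be sketched as follows. `IterRange` is a hypothetical stand-in, not `llvm::iterator_range` itself; the point is that callers can write `R.empty()` instead of an unqualified free-function call that might resolve to either `std::empty` or `llvm::empty`.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a begin/end iterator pair (not llvm::iterator_range).
template <typename IteratorT> class IterRange {
  IteratorT BeginIt, EndIt;

public:
  IterRange(IteratorT B, IteratorT E) : BeginIt(B), EndIt(E) {}
  IteratorT begin() const { return BeginIt; }
  IteratorT end() const { return EndIt; }
  // Member form: R.empty() involves no free-function overload resolution.
  bool empty() const { return BeginIt == EndIt; }
};
```

Having the member also means C++17's `std::empty(R)`, which is defined in terms of `R.empty()`, works on such a range.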
*	AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets	Matt Arsenault	2019-10-06	1	-2/+5
\| \| \| \|	llvm-svn: 373842
*	AMDGPU/GlobalISel: RegBankSelect mul24 intrinsics	Matt Arsenault	2019-10-06	1	-0/+2
\| \| \| \|	llvm-svn: 373841
*	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics	Matt Arsenault	2019-10-06	1	-0/+35
\| \| \| \|	llvm-svn: 373840
*	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS	Matt Arsenault	2019-10-06	1	-0/+3
\| \| \| \|	llvm-svn: 373839
*	GlobalISel: Partially implement lower for G_EXTRACT	Matt Arsenault	2019-10-06	1	-1/+13
\| \| \| \| \| \|	Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838
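The "shift and truncate" lowering has a direct scalar equivalent. This sketch (the hypothetical `lowerExtract`) works on plain 64-bit integers rather than MIR and, like the commit, ignores pointer types.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of G_EXTRACT lowered to shift + truncate: take Width bits
// starting at bit Offset of a 64-bit source. Requires Offset + Width <= 64.
uint64_t lowerExtract(uint64_t Src, unsigned Offset, unsigned Width) {
  assert(Width > 0 && Offset + Width <= 64);
  uint64_t Shifted = Src >> Offset;               // like G_LSHR by Offset
  if (Width == 64)
    return Shifted;                               // nothing to truncate
  return Shifted & ((uint64_t{1} << Width) - 1);  // like G_TRUNC to Width
}
```

For example, extracting 16 bits at offset 8 from `0x1122334455667788` yields `0x6677`.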
*	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics	Matt Arsenault	2019-10-06	1	-6/+5
\| \| \| \| \| \|	This wasn't updated for the immarg handling change. llvm-svn: 373837
*	[NFC] Add { } to silence compiler warning [-Wmissing-braces].	Huihui Zhang	2019-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \|	../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} llvm-svn: 373784
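The brace fix concerns `std::array`, which is an aggregate wrapping a built-in array, so its fully braced initializer has two brace levels. `sumOperands` below is a hypothetical helper that only mirrors the shape of the `addMappingFromTable<1>(..., { 0 }, ...)` call from the warning; it assumes the parameter is a `std::array` passed by value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper taking a std::array of operand indices by value.
template <std::size_t N> unsigned sumOperands(std::array<unsigned, N> Ops) {
  unsigned Sum = 0;
  for (unsigned Op : Ops)
    Sum += Op;
  return Sum;
}
```

Calling `sumOperands<1>({0})` relies on brace elision, which `-Wmissing-braces` flags; `sumOperands<1>({{0}})` spells out the inner braces for the array member and behaves identically.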
*	[AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes	Dmitry Preobrazhensky	2019-10-04	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68349 llvm-svn: 373745
*	[AMDGPU][MC][GFX10] Enabled decoding of 'null' operand	Dmitry Preobrazhensky	2019-10-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68348 llvm-svn: 373740
*	[AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions	Dmitry Preobrazhensky	2019-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68347 llvm-svn: 373736
*	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture	Matt Arsenault	2019-10-04	1	-1/+3
\| \| \| \| \| \| \|	This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716
*	AMDGPU/GlobalISel: Select G_PTRTOINT	Matt Arsenault	2019-10-04	1	-0/+1
\| \| \| \|	llvm-svn: 373715
*	AMDGPU/GlobalISel: Support wave32 waterfall loops	Matt Arsenault	2019-10-04	1	-22/+30
\| \| \| \|	llvm-svn: 373714
*	[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code	Piotr Sobczak	2019-10-04	1	-120/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes. Introducing InstSubclass will keep them separate and help extend InstClassEnum with other instruction types (e.g. MIMG) in the future. This patch also makes getSubRegIdxs() more concise. Reviewers: nhaehnle, arsenm, tstellar Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68384 llvm-svn: 373699
*	[NFC] Fix unused variable in release builds	Jordan Rupprecht	2019-10-03	1	-1/+2
\| \| \| \|	llvm-svn: 373646
*	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT	Matt Arsenault	2019-10-03	1	-5/+77
\| \| \| \|	llvm-svn: 373639
*	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect	Matt Arsenault	2019-10-03	2	-167/+271
\| \| \| \| \| \| \| \|	Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638
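Splitting a 64-bit element index into two 32-bit indexes amounts to scaling the index by two. In this sketch `splitIndex64` and `extract64Via32` are hypothetical names; the vector is viewed as twice as many 32-bit lanes, low half first (little-endian), which is an assumption of the model.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// 64-bit element Idx occupies 32-bit lanes 2*Idx (low) and 2*Idx+1 (high).
std::pair<unsigned, unsigned> splitIndex64(unsigned Idx) {
  return {2 * Idx, 2 * Idx + 1};
}

// Two 32-bit indexed reads, recombined into the 64-bit element.
uint64_t extract64Via32(const std::vector<uint32_t> &Lanes32, unsigned Idx) {
  auto [Lo, Hi] = splitIndex64(Idx);
  assert(Hi < Lanes32.size());
  return (uint64_t{Lanes32[Hi]} << 32) | Lanes32[Lo];
}
```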
*	AMDGPU/GlobalISel: Allow VGPR to index SGPR register	Matt Arsenault	2019-10-03	1	-4/+6
\| \| \| \| \| \| \| \|	We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637
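A waterfall loop over a divergent (VGPR) index can be simulated on the CPU. This is a hedged model, not the code RegBankSelect emits: the hypothetical `waterfallIndex` picks the first active lane's index (what a readfirstlane would produce), services every active lane that wants the same index with one uniform read, and clears those lanes from an exec-style mask. The lane count may be up to 64, so both wave32 and wave64 fit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the number of loop iterations; Result receives each lane's value.
unsigned waterfallIndex(const std::vector<uint32_t> &SgprVec,
                        const std::vector<unsigned> &LaneIdx,
                        std::vector<uint32_t> &Result) {
  assert(LaneIdx.size() <= 64);
  uint64_t Exec = LaneIdx.size() == 64 ? ~uint64_t{0}
                                       : (uint64_t{1} << LaneIdx.size()) - 1;
  Result.assign(LaneIdx.size(), 0);
  unsigned Iterations = 0;
  while (Exec != 0) {
    unsigned First = 0;
    while (!(Exec & (uint64_t{1} << First)))
      ++First;
    unsigned Uniform = LaneIdx[First];            // readfirstlane
    for (unsigned L = 0; L < LaneIdx.size(); ++L) {
      if ((Exec & (uint64_t{1} << L)) && LaneIdx[L] == Uniform) {
        Result[L] = SgprVec.at(Uniform);          // uniform indexed read
        Exec &= ~(uint64_t{1} << L);              // lane is done
      }
    }
    ++Iterations;
  }
  return Iterations;
}
```

The loop runs once per distinct index value among the lanes, and the indexed source stays in SGPRs throughout; only the per-lane results are vector values.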
*	AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and	Matt Arsenault	2019-10-03	1	-2/+3
\| \| \| \| \| \|	This would try to do FewerElements to v9s8 llvm-svn: 373635
*	AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions	Tom Stellard	2019-10-03	1	-82/+185
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instructions with one list per unique address. In the optimization phase instead of scanning through the basic block for mergeable instructions, we now iterate over the lists generated by the pre-pass. The decision to re-optimize a block is now made per list, so if we fail to merge any instructions with the same address, then we do not attempt to optimize them in future passes over the block. This will help to reduce the time this pass spends re-optimizing instructions. In one pathological test case, this change reduces the time spent in the SILoadStoreOptimizer from 0.2s to 0.03s. This restructuring will also make it possible to implement further solutions in this pass, because we can now add less expensive checks to the pre-pass and filter instructions out early which will avoid the need to do the expensive scanning during the optimization pass. For example, checking for adjacent offsets is an inexpensive test we can move to the pre-pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65961 llvm-svn: 373630
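The pre-pass idea reduces to bucketing the block's memory instructions by address in one scan. In this sketch `MemInst` and `collectMergeableLists` are hypothetical, and the base address is flattened to a string key; the real pass compares register and descriptor operands.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified memory instruction: base address key + offset.
struct MemInst {
  std::string Base;
  int Offset;
};

// One scan of the block builds one list per unique base address; the merge
// phase then only pairs instructions within a list instead of rescanning
// the whole block for every candidate.
std::map<std::string, std::vector<const MemInst *>>
collectMergeableLists(const std::vector<MemInst> &Block) {
  std::map<std::string, std::vector<const MemInst *>> Lists;
  for (const MemInst &MI : Block)
    Lists[MI.Base].push_back(&MI);
  return Lists;
}
```

A cheap filter such as the adjacent-offset check mentioned above could be applied inside this loop before an instruction is added to its list.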
*	[AArch64] Static (de)allocation of SVE stack objects.	Sander de Smalen	2019-10-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects. The focus of this patch is purely to allow the stack frame to allocate/deallocate space for scalable SVE objects. More dynamic allocation (at compile-time, i.e. determining placement of SVE objects on the stack), or resolving frame-index references that include scalable-sized offsets, are left for subsequent patches. SVE objects are allocated in the stack frame as a separate region below the callee-save area, and above the alignment gap. This is done so that the SVE objects can be accessed directly from the FP at (runtime) VL-based offsets to benefit from using the VL-scaled addressing modes. The layout looks as follows:

    +-------------+
    | stack arg   |
    +-------------+
    | Callee Saves|
    |   X29, X30  |       (if available)
    |-------------| <- FP (if available)
    |      :      |
    |  SVE area   |
    |      :      |
    +-------------+
    |/////////////| alignment gap.
    |      :      |
    | Stack objs  |
    |      :      |
    +-------------+ <- SP after call and frame-setup

SVE and non-SVE stack objects are distinguished using different StackIDs. The offsets for objects with TargetStackID::SVEVector should be interpreted as purely scalable offsets within their respective SVE region. Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61437 llvm-svn: 373585
*	AMDGPU/GlobalISel: Don't re-get subtarget	Matt Arsenault	2019-10-03	1	-6/+3
\| \| \| \| \| \|	It's already available in the class. llvm-svn: 373568
*	AMDGPU/GlobalISel: Expand G_BITCAST legality	Matt Arsenault	2019-10-03	1	-4/+1
\| \| \| \|	llvm-svn: 373567
*	[AMDGPU] Fix illegal agpr use by VALU	Stanislav Mekhanoshin	2019-10-02	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When SIFixSGPRCopies attempts to fix an illegal copy from a vector to a scalar register, it calls moveToVALU(). A copy from an agpr to an sgpr becomes a copy from an agpr to an agpr, which may result in an illegal register class at a use of this copy. The solution is to always copy it into a vgpr. This may result in a subsequent copy into an agpr if that is what is really needed; however, that should not happen often and will likely be folded later. The opposite situation cannot happen, because an sgpr is always illegal where an agpr is legal, so such user instructions cannot exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544
*	[AMDGPU] Extend buffer intrinsics with swizzling	Piotr Sobczak	2019-10-02	14	-155/+308
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
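The "swizzled" flag rides in the cachepolicy immediate. The sketch below assumes the bit layout bit 0 = glc, bit 1 = slc, bit 2 = dlc (gfx10), bit 3 = swz; check the AMDGPU intrinsic definitions for the authoritative layout. `CachePolicy`, `decodeCachePolicy` and `mayMergeBufferAccess` are hypothetical names, and the merge check models only the swizzle rule described above, not the optimizer's other conditions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the cachepolicy immediate (see lead-in).
struct CachePolicy {
  bool GLC, SLC, DLC, SWZ;
};

CachePolicy decodeCachePolicy(uint32_t Imm) {
  return {(Imm & 1u) != 0, (Imm & 2u) != 0, (Imm & 4u) != 0, (Imm & 8u) != 0};
}

// Swizzled buffer accesses are excluded from merging; the default (0) means
// linear data, which stays mergeable.
bool mayMergeBufferAccess(uint32_t CachePolicyImm) {
  return !decodeCachePolicy(CachePolicyImm).SWZ;
}
```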
*	[AMDGPU] Make printf lowering faster when there are no printfs	Jay Foad	2019-10-02	1	-16/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead. Reviewers: rampitec, kzhuravl, alex-t, arsenm Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68145 llvm-svn: 373433
*	AMDGPU/GlobalISel: Use getIntrinsicID helper	Matt Arsenault	2019-10-02	3	-7/+7
\| \| \| \|	llvm-svn: 373417