path: root/llvm/lib/Target/AMDGPU
...
* AMDGPU/GlobalISel: Fix regbankselect for uniform extloads (Matt Arsenault, 2019-09-09; 1 file, -4/+4)
  There are no scalar extloads.
  llvm-svn: 371414
* AMDGPU: Remove code address space predicates (Matt Arsenault, 2019-09-09; 3 files, -25/+57)
  Fixes 8-byte, 8-byte aligned LDS loads. The 16-byte case is still broken due to not being reported as legal.
  llvm-svn: 371413
* AMDGPU/GlobalISel: Select G_PTR_MASK (Matt Arsenault, 2019-09-09; 3 files, -0/+70)
  llvm-svn: 371412
* AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads (Matt Arsenault, 2019-09-09; 1 file, -8/+14)
  The pointer is always a VGPR. Also fix hardcoding the pointer size to 64.
  llvm-svn: 371411
* AMDGPU/GlobalISel: Use known bits for selection (Matt Arsenault, 2019-09-09; 1 file, -8/+3)
  llvm-svn: 371409
* AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic (Matt Arsenault, 2019-09-09; 1 file, -0/+6)
  llvm-svn: 371407
* AMDGPU/GlobalISel: Try generated matcher before add/sub code (Matt Arsenault, 2019-09-09; 1 file, -4/+4)
  This will allow optimization patterns which fold adds away to work.
  llvm-svn: 371406
* AMDGPU/GlobalISel: Remove dead patterns (Matt Arsenault, 2019-09-09; 1 file, -5/+0)
  llvm-svn: 371404
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics (Matt Arsenault, 2019-09-09; 5 files, -28/+6)
  llvm-svn: 371364
* Change TargetLibraryInfo analysis passes to always require Function (Teresa Johnson, 2019-09-07; 1 file, -8/+12)
  Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See the discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example.
  This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration.
  Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works.
  There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome.
  Reviewers: chandlerc, hfinkel
  Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66428
  llvm-svn: 371284
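  As a sketch of the mechanical update the message describes, a legacy pass now hands its Function to getTLI. This assumes the post-D66428 getTLI(const Function &) overload; the pass and its simplifyUsingTLI helper are hypothetical.

```cpp
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

using namespace llvm;

// Hedged sketch of the per-function TLI query in a legacy FunctionPass.
struct ExamplePass : FunctionPass {
  static char ID;
  ExamplePass() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetLibraryInfoWrapperPass>();
  }

  bool runOnFunction(Function &F) override {
    // Previously: getTLI() took no argument. Passing F is what enables a
    // per-function TLI (e.g. honoring -fno-builtin* attributes) later.
    const TargetLibraryInfo &TLI =
        getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
    return simplifyUsingTLI(F, TLI); // hypothetical transformation
  }

  bool simplifyUsingTLI(Function &, const TargetLibraryInfo &) {
    return false; // placeholder
  }
};
```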
* GlobalISel: Support physical register inputs in patterns (Matt Arsenault, 2019-09-06; 1 file, -5/+7)
  llvm-svn: 371253
* AMDGPU: Fix typo (Matt Arsenault, 2019-09-06; 1 file, -4/+4)
  llvm-svn: 371249
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores (Valery Pykhtin, 2019-09-06; 1 file, -4/+5)
  Differential Revision: https://reviews.llvm.org/D66958
  llvm-svn: 371214
* [AMDGPU] Mark s_barrier as having side effects but not accessing memory (Jay Foad, 2019-09-06; 1 file, -2/+0)
  Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions. Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles, and so it does not think that any of the load instructions are on the critical path; it therefore schedules them with no regard for their (80 cycle) latency, which gives poor results.
  Reviewers: arsenm, dstuttard, tpr, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67218
  llvm-svn: 371192
* AMDGPU/GlobalISel: Avoid repeating 32-bit type lists (Matt Arsenault, 2019-09-06; 4 files, -6/+14)
  llvm-svn: 371156
* AMDGPU/GlobalISel: Fix load/store of types in other address spaces (Matt Arsenault, 2019-09-06; 2 files, -5/+26)
  There should probably be a size-only matcher.
  llvm-svn: 371155
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses (Matt Arsenault, 2019-09-05; 1 file, -2/+19)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Fix emitting multiple stack loads for stack passed workitems (Matt Arsenault, 2019-09-05; 1 file, -1/+15)
  The same stack slot is loaded for each workitem ID, and for each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable.
  llvm-svn: 371148
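  An illustrative sketch of the de-duplication idea (the class and its names are hypothetical, not the actual lowering code): remember the frame index created for each offset so repeated queries return the same index and the loads can CSE.

```cpp
#include "llvm/ADT/DenseMap.h"
#include <cstdint>

// Hedged sketch: one fixed stack object per offset. Creating a fresh frame
// index for every use yields distinct loads the compiler cannot CSE, even
// though they all read the same stack slot.
class FixedObjectCache {
  llvm::DenseMap<int64_t, int> FrameIndexForOffset;

public:
  // CreateFixedObjectFn stands in for MachineFrameInfo::CreateFixedObject.
  template <typename CreateFixedObjectFn>
  int getOrCreate(int64_t Offset, CreateFixedObjectFn Create) {
    auto It = FrameIndexForOffset.find(Offset);
    if (It != FrameIndexForOffset.end())
      return It->second;     // reuse: loads from this index are CSEable
    int FI = Create(Offset); // create the fixed object once per offset
    FrameIndexForOffset.try_emplace(Offset, FI);
    return FI;
  }
};
```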
* AMDGPU: Fix Register copy-paste error (Matt Arsenault, 2019-09-05; 1 file, -2/+2)
  llvm-svn: 371141
* AMDGPU: Avoid constructing new std::vector in initCandidate (Matt Arsenault, 2019-09-05; 2 files, -2/+5)
  Approximately 30% of the time was spent in the std::vector constructor. In one testcase this pushes the scheduler to being the second slowest pass.
  I'm not sure I understand why these vectors are necessary. The default scheduler's initCandidate seems to use some pre-existing vectors for the pressure.
  llvm-svn: 371136
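  A minimal sketch of the general remedy, with hypothetical names (the real change is in the GCN scheduler's initCandidate): keep the pressure vectors as reusable scratch storage instead of constructing fresh std::vectors on every call.

```cpp
#include <vector>

// Hedged sketch: reusing a scratch buffer avoids paying an allocation in a
// hot, per-candidate path; clear() keeps the previously grown capacity.
struct SchedCandidateScratch {
  std::vector<unsigned> Pressure;    // reused across initCandidate calls
  std::vector<unsigned> MaxPressure; // ditto

  void initCandidate() {
    Pressure.clear();    // no reallocation if capacity already suffices
    MaxPressure.clear();
    // ... fill Pressure/MaxPressure and evaluate the candidate ...
  }
};
```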
* [LLVM][Alignment] Make functions using log of alignment explicit (Guillaume Chatelet, 2019-09-05; 4 files, -16/+15)
  Summary: This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power-of-two alignment.
  A few renames uncovered dubious assignments:
  - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using log2(align). This patch fixes it and updates the documentation.
  - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
  - `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
  Reviewers: lattner, thegameg, courbet
  Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65945
  llvm-svn: 371045
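  To make the renamed distinction concrete, a small self-contained example of the two readings of the same number (illustrative only):

```cpp
#include <cassert>
#include <cstdint>

// '4' read as log2(align) means a 16-byte alignment; read as a power-of-two
// alignment it means 4 bytes. The renaming makes the log2 reading explicit.
uint64_t alignmentFromLog2(unsigned LogAlign) {
  return uint64_t(1) << LogAlign;
}

int main() {
  assert(alignmentFromLog2(4) == 16);
  assert(alignmentFromLog2(0) == 1); // log2(align) == 0 is 1-byte alignment
  return 0;
}
```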
* AMDGPU: Add intrinsics for address space identification (Matt Arsenault, 2019-09-05; 5 files, -1/+50)
  The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture.
  llvm-svn: 371009
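  A hedged usage sketch of what such intrinsics enable in device code, assuming clang exposes them as __builtin_amdgcn_is_shared / __builtin_amdgcn_is_private (treat those builtin names as an assumption):

```cpp
// Compiled for an amdgcn target. Unlike the ptrtoint + queue-pointer
// comparison the message describes, querying the builtin does not count
// as capturing the pointer.
extern "C" int classify_ptr(const void *P) {
  if (__builtin_amdgcn_is_shared(P))
    return 1; // LDS (local) memory
  if (__builtin_amdgcn_is_private(P))
    return 2; // scratch (private) memory
  return 0;   // global or other
}
```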
* AMDGPU/GlobalISel: Restore insert point when getting aperture (Matt Arsenault, 2019-09-05; 1 file, -0/+6)
  Avoids SSA violations in a future patch.
  llvm-svn: 371008
* AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast (Matt Arsenault, 2019-09-05; 1 file, -4/+6)
  llvm-svn: 371007
* AMDGPU/GlobalISel: Fix assert on load from constant address (Matt Arsenault, 2019-09-05; 1 file, -4/+4)
  llvm-svn: 371006
* AMDGPU/GlobalISel: Select G_BITREVERSE (Matt Arsenault, 2019-09-04; 2 files, -1/+2)
  llvm-svn: 370980
* GlobalISel: Add basic legalization for G_BITREVERSE (Matt Arsenault, 2019-09-04; 1 file, -1/+1)
  llvm-svn: 370979
* AMDGPU: Handle frame index expansion with no free SGPRs pre-gfx9 (Matt Arsenault, 2019-09-04; 2 files, -26/+58)
  Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed.
  This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR.
  llvm-svn: 370929
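  The "done in place and reversed" fallback can be illustrated with plain scalar arithmetic (a sketch of the idea, not the emitted MIR): mutate the frame register, read the result, then undo the mutation.

```cpp
#include <cstdint>

// Hedged sketch: with no SGPR free to hold a copy, the offset computation
// happens in the frame register itself and is then reversed, so the
// register's value is unchanged afterwards.
uint32_t materializeFrameAddress(uint32_t &FrameReg, uint32_t Offset) {
  FrameReg += Offset;         // compute in place (e.g. s_add_u32)
  uint32_t Result = FrameReg; // the VGPR operand reads the sum (v_mov)
  FrameReg -= Offset;         // reverse the computation (e.g. s_sub_u32)
  return Result;
}
```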
* AMDGPU/GlobalISel: Make 16-bit constants legal (Matt Arsenault, 2019-09-04; 1 file, -11/+5)
  This is mostly for the benefit of patterns which use 16-bit constants.
  llvm-svn: 370921
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions (Amara Emerson, 2019-09-03; 1 file, -1/+2)
  On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting return types is not yet implemented.
  Differential Revision: https://reviews.llvm.org/D66180
  llvm-svn: 370822
* Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" (Jay Foad, 2019-09-02; 2 files, -5/+2)
  Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet.
  Reviewers: nhaehnle, arsenm, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65813
  llvm-svn: 370667
* [AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null (Dmitry Preobrazhensky, 2019-09-02; 1 file, -3/+6)
  See AMD SWDEV-157286.
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65229
  llvm-svn: 370665
* [AMDGPU][MC][GFX10] Enabled null with 64-bit operands (Dmitry Preobrazhensky, 2019-09-02; 1 file, -0/+2)
  See bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65231
  llvm-svn: 370660
* [AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions (Dmitry Preobrazhensky, 2019-09-02; 1 file, -4/+23)
  See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65228
  llvm-svn: 370652
* AMDGPU: Remove unused custom node definition (Matt Arsenault, 2019-09-01; 3 files, -12/+0)
  llvm-svn: 370603
* [NFC] Fixed -Wdocumentation warning (David Bolvansky, 2019-08-31; 1 file, -8/+8)
  /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation]
  1 warning generated.
  llvm-svn: 370596
* Fix the build for MSVC builds using M_PI (Reid Kleckner, 2019-08-29; 1 file, -0/+7)
  llvm-svn: 370405
* AMDGPU/GlobalISel: Legalize sin/cos (Matt Arsenault, 2019-08-29; 2 files, -0/+43)
  llvm-svn: 370402
* AMDGPU: Don't use frame virtual registers (Matt Arsenault, 2019-08-29; 3 files, -41/+66)
  SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway, since the frame register can be incremented and restored.
  This does present another possible issue if spilling is needed for the unused carry out when an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway).
  llvm-svn: 370281
* GlobalISel/TableGen: Handle setcc patterns (Matt Arsenault, 2019-08-29; 2 files, -5/+4)
  This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually selected for now, since it has the SALU and VALU complication to deal with.
  llvm-svn: 370280
* [AMDGPU] Fix bug when calculating user_sgpr_count for Code Object V3 assembler (Scott Linder, 2019-08-28; 1 file, -7/+14)
  Stop counting explicitly disabled user_sgprs in the user_sgpr_count field of the kernel descriptor.
  Differential Revision: https://reviews.llvm.org/D66900
  llvm-svn: 370250
* [AMDGPU] Adjust number of SGPRs available in Calling Convention (Ryan Taylor, 2019-08-28; 1 file, -18/+2)
  This reduces the number of SGPRs available to the calling convention, due to concerns about running out of SGPRs if all non-reserved SGPRs are made available.
  Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
  llvm-svn: 370215
* AMDGPU/GlobalISel: Fix constraining scalar and/or/xor (Matt Arsenault, 2019-08-28; 1 file, -8/+1)
  If the result register already had a register class assigned, the sources may not have been properly constrained.
  llvm-svn: 370150
* AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace (Matt Arsenault, 2019-08-28; 1 file, -8/+31)
  llvm-svn: 370140
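  A hedged GlobalISel-flavored sketch of the widening such a cast performs (not the actual AMDGPULegalizerInfo code; the helper name, the destination address-space numbering, and where HighBits comes from are assumptions — the amdgpu-32bit-address-high-bits attribute serialized in the commit below suggests the source): combine the 32-bit pointer with the known high 32 bits to form the 64-bit pointer.

```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"

using namespace llvm;

// Sketch: cast a pointer in a 32-bit constant address space to a 64-bit one
// when the high half of the address window is a known constant (HighBits).
Register cast32BitConstPtr(MachineIRBuilder &B, Register Src,
                           uint32_t HighBits) {
  const LLT S64 = LLT::scalar(64);
  const LLT DstPtr = LLT::pointer(0, 64); // assumed destination space
  Register Low = B.buildPtrToInt(LLT::scalar(32), Src).getReg(0);
  Register Wide = B.buildZExt(S64, Low).getReg(0);
  Register Hi = B.buildConstant(S64, uint64_t(HighBits) << 32).getReg(0);
  Register Full = B.buildOr(S64, Wide, Hi).getReg(0); // splice in high half
  return B.buildIntToPtr(DstPtr, Full).getReg(0);
}
```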
* AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization (Matt Arsenault, 2019-08-27; 2 files, -1/+6)
  llvm-svn: 370089
* AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 (Matt Arsenault, 2019-08-27; 1 file, -3/+3)
  This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing.
  llvm-svn: 370086
* AMDGPU: Combine directly on mul24 intrinsics (Matt Arsenault, 2019-08-27; 2 files, -3/+28)
  The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out.
  llvm-svn: 369994
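  For context, a hedged sketch of the eligibility check a mul24 combine depends on (illustrative, not the actual AMDGPU combine code): both 32-bit operands must be provably representable in 24 bits.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// Sketch: an unsigned 24-bit multiply is safe when the top 8 bits of each
// 32-bit operand are known zero, so the hardware mul24 sees the full value.
static bool fitsUnsignedMul24(SelectionDAG &DAG, SDValue A, SDValue B) {
  KnownBits KA = DAG.computeKnownBits(A);
  KnownBits KB = DAG.computeKnownBits(B);
  return KA.countMinLeadingZeros() >= 8 && KB.countMinLeadingZeros() >= 8;
}
```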
* AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts (Matt Arsenault, 2019-08-27; 1 file, -6/+5)
  The mul24 matching could interfere with SLSR and the other addressing-mode-related passes. This is probably not the optimal placement, but it is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR.
  Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new LSR tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them.
  This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the `and` inside the loop will be folded away.
  llvm-svn: 369991
* [AMDGPU] Downgrade from StringLiteral to const char* in an attempt to make GCC 5 happy (Benjamin Kramer, 2019-08-25; 1 file, -3/+3)
  llvm-svn: 369867
* AMDGPU: Preserve value name when inserting mul24 intrinsic (Matt Arsenault, 2019-08-24; 1 file, -1/+3)
  llvm-svn: 369857