bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import	Matt Arsenault	2020-01-07	1	-4/+4
\| \| \| \| \|	Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.
*	Fix "use of uninitialized variable" static analyzer warning. NFCI.	Simon Pilgrim	2020-01-07	1	-1/+1
\|
*	[MC] Add parameter `Address` to MCInstrPrinter::printInstruction	Fangrui Song	2020-01-06	2	-5/+5
\| \| \| \| \| \| \| \|	Follow-up of D72172. Reviewed By: jhenderson, rnk Differential Revision: https://reviews.llvm.org/D72180
*	[MC] Add parameter `Address` to MCInstPrinter::printInst	Fangrui Song	2020-01-06	3	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
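To illustrate the offset-versus-address distinction described in the message above, here is a minimal Python sketch. It is not LLVM's API: `format_branch` and its parameters are hypothetical, and it assumes the simplest convention, where the branch target is the instruction address plus the encoded offset (real targets may add a pipeline or instruction-size bias).

```python
def format_branch(mnemonic, offset, address=0, print_address=False):
    """Render a branch operand as a relative offset or an absolute target.

    Hypothetical helper for illustration only.
    Assumption: target = address + offset, wrapped to 64 bits.
    """
    if print_address:
        target = (address + offset) & 0xFFFFFFFFFFFFFFFF  # wrap like uint64_t
        return f"{mnemonic} {target:#x}"
    # Offset form: keep the sign so forward and backward branches are distinct.
    return f"{mnemonic} {offset:+#x}"
```

A caller that does not know the instruction address can leave `address` at 0, which mirrors the message's advice for downstream projects.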
*	AMDGPU/GlobalISel: Fix unused variable warning in release	Matt Arsenault	2020-01-06	1	-2/+1
\|
*	AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly	Matt Arsenault	2020-01-06	2	-26/+12
\| \| \| \|	This will enable automatic GlobalISel support in a future commit.
*	AMDGPU: Use default operands for clamp/omod	Matt Arsenault	2020-01-06	1	-10/+28
\| \| \| \| \| \| \|	We have a lot of complex pattern variants that just set the source modifiers that are really handled, and then set the output modifiers to 0. We're unlikely to ever match output modifiers from the use instruction side, and we already match clamp/omod in a separate pass.
*	AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER	Matt Arsenault	2020-01-06	2	-1/+5
\|
*	AMDGPU/GlobalISel: Select G_UADDE/G_USUBE	Matt Arsenault	2020-01-06	2	-11/+31
\|
*	AMDGPU/GlobalISel: Replace handling of boolean values	Matt Arsenault	2020-01-06	8	-249/+427
\|	This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Use instruction selection would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC.
\|	Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an output in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection.
\|	Summary of how this should now work:
\|	- G_TRUNC is always a no-op, and never should use a vcc bank result.
\|	- SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping.
\|	- An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect.
\|	- Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection.
\|	- SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins.
\|	There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifest themselves as invalid copy instructions much later.
\|	Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs.
\|	There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me.
\|	Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this.
\|	This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.
*	GlobalISel: Implement lower for G_INTRINSIC_ROUND	Matt Arsenault	2020-01-06	2	-3/+2
\| \| \| \| \|	Mostly copied from AMDGPU lowering implementation, except used G_SITOFP instead of directly creating a select on -1.0, 0.0.
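For readers unfamiliar with this kind of lowering, the arithmetic behind rounding half away from zero can be sketched in Python as a truncate-and-adjust sequence. This models only the math for finite inputs; it is an assumption that the commit follows this general shape, and it does not show which generic opcodes are emitted. The function name is hypothetical.

```python
import math


def round_half_away(x):
    """Round a finite float to the nearest integer, ties away from zero.

    Sketch of a truncate-and-adjust lowering (assumed shape, not the
    commit's exact instruction sequence).
    """
    t = float(math.trunc(x))          # integer part, rounded toward zero
    diff = abs(x - t)                 # magnitude of the discarded fraction
    # Add +/-1.0 (carrying the sign of x) when the fraction is at least one half.
    adjust = math.copysign(1.0, x) if diff >= 0.5 else 0.0
    return t + adjust
```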
*	GlobalISel: Fix unsupported legalize action	Matt Arsenault	2020-01-06	1	-0/+5
\| \| \| \| \| \| \| \|	This would complain about invalid legalizer rules otherwise. Mark some operations as unsupported for AMDGPU. This currently seems to produce the same legalize error as when no rules are defined, but eventually this should produce a proper user facing error.
*	AMDGPU: Fix legalizing f16 fpow	Matt Arsenault	2020-01-06	2	-0/+2
\| \| \| \| \| \|	The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.
*	AMDGPU: Use ImmLeaf	Matt Arsenault	2020-01-06	1	-2/+2
\| \| \| \| \|	This solves one GlobalISel importer error, but the pattern still fails for another reason.
*	AMDGPU: Use ImmLeaf for inline immediate predicates	Matt Arsenault	2020-01-06	6	-9/+63
\|
*	[AMDGPU] Fix "use of uninitialized variable" static analyzer warning. NFCI.	Simon Pilgrim	2020-01-06	1	-0/+1
\| \| \| \|	Add "unreachable" default case to AMDGPUTargetStreamer::getArchNameFromElfMach
*	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	Matt Arsenault	2020-01-06	3	-25/+40
\|
*	AMDGPU/GlobalISel: Select more G_EXTRACTs correctly	Matt Arsenault	2020-01-06	1	-5/+19
\| \| \| \| \| \| \| \|	This assumed a 32-bit extract size, which would produce invalid copies with 64-bit extracts. Handle the easy case. Ideally we would have a way to get the proper subreg index for any 32-bit offset, but there should probably be a tablegenerated way of getting the subreg index for any size and offset.
*	[NFC] Fix trivial typos in comments	James Henderson	2020-01-06	2	-2/+2
\| \| \| \| \| \| \| \|	Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.
*	[APFloat] Add recoverable string parsing errors to APFloat	Ehud Katz	2020-01-06	1	-2/+2
\| \| \| \| \| \|	Implementing the APFloat part in PR4745. Differential Revision: https://reviews.llvm.org/D69770
*	GlobalISel: Scalarize all division operations	Matt Arsenault	2020-01-04	1	-0/+7
\| \| \| \| \| \|	This only handled G_SDIV, but they all are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.
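Scalarizing means splitting one vector operation into independent per-element operations. A small Python sketch of the idea follows; `scalarize` and `sdiv` are hypothetical names, and lists stand in for vector registers.

```python
def sdiv(a, b):
    """Signed division that truncates toward zero, unlike Python's // which floors."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def scalarize(op, lhs, rhs):
    """Apply a scalar operation to each pair of elements of two equal-length vectors."""
    if len(lhs) != len(rhs):
        raise ValueError("vector operands must have the same number of elements")
    return [op(x, y) for x, y in zip(lhs, rhs)]
```

Because each element is computed independently, the same wrapper works for any elementwise division or remainder operation passed as `op`.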
*	AMDGPU/GlobalISel: Refine SMRD selection rules	Matt Arsenault	2020-01-04	1	-4/+22
\| \| \| \| \|	Fix selecting these for volatile global loads, and ensure the loads are constant enough.
*	AMDGPU/GlobalISel: Legalize more odd sized loads	Matt Arsenault	2020-01-04	1	-5/+9
\| \| \| \| \|	The attempt to widen sufficiently aligned, odd-sized loads wasn't consistently applied.
*	AMDGPU/GlobalISel: Assume vcc phis for any vcc input	Matt Arsenault	2020-01-04	1	-2/+3
\| \| \| \| \| \| \|	This produces more intelligible looking results, more comparable to the DAG output in the simplest cases. This is probably wrong in complex control flow, but RegBankSelect doesn't attempt analyzing if this is on a masked path for selecting the bank yet.
*	AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectly	Matt Arsenault	2020-01-04	1	-13/+23
\| \| \| \| \| \| \| \| \| \| \|	We're checking the current register bank of the registers in the instruction, but the mapping may have inserted cross bank copies and is expecting to replace the registers. We mostly get away with this currently, because VGPR->SGPR copies are illegal, and we assume this won't happen. In a future change, we'll start relying on more cross register bank copies being inserted, and this starts to break down.
*	[AMDGPU] need to insert wait between the scalar load and vector store to the ↵	alex-t	2020-01-04	1	-0/+21
\| \| \| \| \| \| \| \| \| \|	same address to avoid WAR conflict. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D71934
*	[AMDGPU] Revert scheduling to reduce spilling	Stanislav Mekhanoshin	2020-01-03	1	-2/+11
\| \| \| \| \| \| \| \| \| \|	We can revert region schedule if new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it blows up register pressure. Such schedule may result in quite heavy spilling which can be avoided if we reject this new schedule. Differential Revision: https://reviews.llvm.org/D72181
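The decision described above can be sketched as a small predicate. This is a simplified model, not the scheduler's code: `should_revert`, its parameters, and the single `reg_budget` threshold are all hypothetical, and the real heuristic may weigh pressure differently.

```python
def should_revert(old_occupancy, new_occupancy, new_pressure, reg_budget):
    """Decide whether to discard a new region schedule (illustrative model).

    Assumption: reg_budget is the number of registers available before
    spilling starts; the actual threshold used by the scheduler is not
    given in the message.
    """
    # Existing rule: a schedule that lowers occupancy is rejected.
    if new_occupancy < old_occupancy:
        return True
    # Added rule: with only one wave, occupancy cannot drop further, so
    # also reject a schedule whose register pressure would force spilling.
    if new_occupancy == 1 and new_pressure > reg_budget:
        return True
    return False
```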
*	GlobalISel: Add type argument to getRegBankFromRegClass	Matt Arsenault	2020-01-03	2	-4/+5
\| \| \| \| \| \|	AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
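The ambiguity can be shown with a toy model in Python. It assumes that a scalar register class can hold either an ordinary scalar value or a boolean lane mask, so the class alone does not identify the bank; the function name, parameters, and the fallback for a missing type are hypothetical and are not the signature of `getRegBankFromRegClass`.

```python
SGPR, VGPR, VCC = "SGPR", "VGPR", "VCC"


def reg_bank_from_reg_class(is_scalar_class, type_bits=None):
    """Map a register class plus an optional value type to a register bank.

    Toy model: type_bits is the scalar bit width of the value, or None
    when the caller has no type information.
    """
    if not is_scalar_class:
        return VGPR
    if type_bits is None:
        # Assumed fallback: without a type, treat the value as a plain scalar.
        return SGPR
    # A 1-bit value in a scalar class is a boolean, which lives in the VCC bank.
    return VCC if type_bits == 1 else SGPR
```

The same scalar class yields two different banks depending on the type, which is why the extra argument is needed.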
*	[amdgpu] Skip non-instruction values in CF user tracing.	Michael Liao	2020-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - CF users won't be non-instruction values. Skip them to save compilation time. It's especially true when there are multiple functions in that module, where, say, a constant may be used in most functions. The current CF user tracing adds significant overhead. Reviewers: alex-t, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72174
*	AMDGPU/GlobalISel: Add new utils file	Matt Arsenault	2020-01-03	4	-33/+77
\| \| \| \| \| \|	There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.
*	AMDGPU: Only allow regs for s_movrel_{b32\|b64}	Matt Arsenault	2020-01-03	1	-2/+13
\| \| \| \| \|	This would incorrectly allow folding immediates. These currently aren't selectable, but will be from GlobalISel soon.
*	Move tail call disabling code to target independent code	Reid Kleckner	2020-01-03	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
*	AMDGPU/GlobalISel: Fix off by one in operand index	Matt Arsenault	2020-01-03	1	-4/+4
\| \| \| \|	This should be looking at the RHS of the add for a constant.
*	AMDGPU/GlobalISel: Remove manual G_FENCE selection	Matt Arsenault	2020-01-02	1	-5/+0
\| \| \| \| \|	The tablegen emitter now handles the immediate operand correctly, so let the generated matcher work.
*	DAG: Use TargetConstant for FENCE operands	Matt Arsenault	2020-01-02	1	-1/+1
\|
*	Remove unneeded extra variable realArgIdx. NFC.	Jay Foad	2020-01-02	1	-4/+3
\|
*	[amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`.	Michael Liao	2019-12-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866
*	[TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues ↵	Craig Topper	2019-12-30	3	-9/+19
\| \| \| \| \| \| \| \| \| \| \|	instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
*	AMDGPU/GlobalISel: Select mul24 intrinsics	Matt Arsenault	2019-12-30	2	-4/+12
\|
*	AMDGPU/GlobalISel: Re-use MRI available in selector	Matt Arsenault	2019-12-30	1	-9/+7
\|
*	AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftz	Matt Arsenault	2019-12-30	2	-4/+9
\|
*	GlobalISel: moreElementsVector for FP min/max	Matt Arsenault	2019-12-30	1	-0/+1
\|
*	AMDGPU: Improve llvm.round.f64 lowering for CI+	Matt Arsenault	2019-12-30	2	-4/+5
\| \| \| \| \|	The path already used for f16/f32 works a lot better when v_trunc_f64 is available.
*	AMDGPU/GlobalISel: Account for G_PHI result bank	Matt Arsenault	2019-12-30	1	-13/+23
\| \| \| \| \| \| \| \| \|	Sometimes the result bank of the phi is already assigned to something, and should not be ignored. This is in preparation for additional boolean phi handling changes. Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.
*	AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constraining	Matt Arsenault	2019-12-27	1	-1/+1
\| \| \| \| \|	This matches the DAG behavior where we don't use SReg_32_XM0 everywhere anymore, and fixes not coalescing the copies into m0.
*	TII: Fix using Register for a subregister index argument	Matt Arsenault	2019-12-27	2	-2/+2
\|
*	AMDGPU: Use Register	Matt Arsenault	2019-12-27	1	-9/+9
\|
*	AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering	Matt Arsenault	2019-12-27	1	-2/+1
\| \| \| \| \| \|	There ended up being two result registers, which would fail on select. It was really defining a new temp register in the correct def position, instead of the correct result register.
*	AMDGPU/GlobalISel: Select some 128-bit load/stores	Matt Arsenault	2019-12-27	1	-4/+10
\|
*	AMDGPU: Use correct DebugLoc	Matt Arsenault	2019-12-27	1	-1/+1
\|