bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Fixed memory leak with inliner replaced	Stanislav Mekhanoshin	2017-09-20	1	-1/+3
	Delete inliner before replacing it. llvm-svn: 313723
*	AMDGPU: Move r600 only code into r600 only td file	Matt Arsenault	2017-09-20	2	-53/+54
	llvm-svn: 313719
*	[AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll	Stanislav Mekhanoshin	2017-09-20	1	-1/+2
	llvm-svn: 313718
*	AMDGPU: Match load d16 hi instructions	Matt Arsenault	2017-09-20	5	-50/+161
	Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
*	[AMDGPU] Port of HSAIL inliner	Stanislav Mekhanoshin	2017-09-20	5	-1/+218
	Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
*	AMDGPU: Cleanup load/store PatFrags	Matt Arsenault	2017-09-20	7	-271/+244
	Try to use a consistent naming scheme. llvm-svn: 313713
*	AMDGPU: Match store d16_hi instructions	Matt Arsenault	2017-09-20	4	-18/+77
	llvm-svn: 313712
*	[AMDGPU] Prevent post-RA scheduler from breaking memory clauses	Stanislav Mekhanoshin	2017-09-19	2	-0/+58
	The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
*	AMDGPU: Run internalize symbols at -O0	Matt Arsenault	2017-09-19	1	-21/+21
	The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616
*	AMDGPU: Start selecting s_xnor_{b32, b64}	Konstantin Zhuravlyov	2017-09-18	3	-2/+48
	Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
*	Fix warnings in r313297.	Jan Sjodin	2017-09-14	2	-5/+3
	llvm-svn: 313302
*	AMDGPU: Fix violating constant bus restriction	Matt Arsenault	2017-09-14	1	-4/+5
	You can't use madak/madmk if it already uses an SGPR input. llvm-svn: 313298
*	Add AddressSpace to PseudoSourceValue.	Jan Sjodin	2017-09-14	4	-4/+29
	Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
*	AMDGPU: Fix assert on alloca of array of struct	Matt Arsenault	2017-09-14	1	-6/+5
	llvm-svn: 313282
*	AMDGPU: Stop modifying SP in call sequences	Matt Arsenault	2017-09-14	1	-3/+3
	Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279
*	AMDGPU: Make frame register caller preserved	Matt Arsenault	2017-09-14	2	-10/+16
	Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274
*	AMDGPU: Don't spill SP reg like a normal CSR	Matt Arsenault	2017-09-13	3	-0/+16
	llvm-svn: 313217
*	Allow target to decide when to cluster loads/stores in misched	Stanislav Mekhanoshin	2017-09-13	2	-1/+40
	When clustering loads or stores, MachineScheduler checks whether the base pointers point to the same memory. This check is done by comparing the base registers of the two memory instructions. This works fine when instructions have a separate offset operand. If they require a fully calculated pointer, such instructions can never be clustered according to this logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
*	AMDGPU: Handle coldcc in more places	Matt Arsenault	2017-09-13	1	-0/+2
	Missed in r312936 llvm-svn: 313205
*	AMDGPU: Allow coldcc calls	Matt Arsenault	2017-09-11	1	-0/+2
	llvm-svn: 312936
*	[AMDGPU] Produce madak and madmk from the two-address pass	Stanislav Mekhanoshin	2017-09-11	1	-0/+42
	These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
*	[AMDGPU] exp should not be in WQM mode	Tim Renouf	2017-09-11	1	-1/+1
	A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915
*	AMDGPU: trivial comment change	Tim Renouf	2017-09-11	1	-1/+1
	... to check commit access for new committer. llvm-svn: 312900
*	[AMDGPU] Remove unused function. NFCI.	Davide Italiano	2017-09-08	1	-9/+0
	llvm-svn: 312836
*	AMDGPU: Start using !con operator	Matt Arsenault	2017-09-08	1	-14/+12
	We have a lot of operand definition work essentially producing every valid permutation of operands to work around building operand lists based on the instruction features. Apparently tablegen already has a mostly undocumented operator to concat dags which simplifies this. Convert one simple place to use this. The BUF instruction definitions have much more complicated logic that can be totally rewritten now. llvm-svn: 312822
*	AMDGPU: Recompute scc liveness	Matt Arsenault	2017-09-08	1	-1/+7
	The various scalar bit operations set SCC, so if one is erased or moved, SCC liveness needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819
*	AMDGPU: Start selecting v_mad_mix_f32	Matt Arsenault	2017-09-07	4	-5/+105
	llvm-svn: 312732
*	AMDGPU: Handle non-temporal loads and stores	Konstantin Zhuravlyov	2017-09-07	1	-23/+59
	Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729
*	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer	Konstantin Zhuravlyov	2017-09-07	2	-58/+145
	Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725
*	AMDGPU: Don't legalize i16 extloads to i32 with legal i16	Matt Arsenault	2017-09-07	3	-1/+8
	Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699
*	[AMDGPU] Use v_pk_max_f16 for fcanonicalize	Stanislav Mekhanoshin	2017-09-06	1	-5/+10
	Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676
*	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize	Stanislav Mekhanoshin	2017-09-06	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660
*	[AMDGPU] Fix shouldClusterMemOps to process flat loads	Stanislav Mekhanoshin	2017-09-06	1	-0/+4
	Flat loads do not have a vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
*	AMDGPU: Make worst-case assumption about the wait states in inline assembly	Nicolai Haehnle	2017-09-06	1	-1/+2
	Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635
*	[AMDGPU] Transform __read_pipe_* and __write_pipe_*	Yaxun Liu	2017-09-06	3	-74/+377
	When packet size equals packet align and is a power of 2, transform __read_pipe* and __write_pipe* to a specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598
*	AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]:	Konstantin Zhuravlyov	2017-09-05	1	-143/+157
	- Refactor SIMemOpInfo's constructors - Allow construction of NotAtomic SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37396 llvm-svn: 312563
*	AMDGPU: Fix not accounting for tail call resource usage	Matt Arsenault	2017-09-05	1	-1/+2
	If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561
*	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]:	Konstantin Zhuravlyov	2017-09-05	1	-151/+174
	- Make SIMemOpInfo a class - Add accessor methods to SIMemOpInfo - Move get*Info methods to SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37395 llvm-svn: 312541
*	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]:	Konstantin Zhuravlyov	2017-09-05	1	-46/+50
	- Rename MemOpInfo -> SIMemOpInfo - Move SIMemOpInfo class out of SIMemoryLegalizer class Differential Revision: https://reviews.llvm.org/D37394 llvm-svn: 312540
*	[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()	Stanislav Mekhanoshin	2017-09-01	1	-2/+2
	Differential Revision: https://reviews.llvm.org/D37392 llvm-svn: 312364
*	AMDGPU: Add ds_{read|write}_addtid_b32 definitions	Matt Arsenault	2017-09-01	2	-0/+13
	llvm-svn: 312349
*	AMDGPU: Add most d16 load/store instruction definitions	Matt Arsenault	2017-09-01	5	-15/+147
	Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347
*	AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states	Nicolai Haehnle	2017-09-01	1	-4/+9
	Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337
*	AMDGPU: Fold clamp modifier for packed instructions	Matt Arsenault	2017-08-31	6	-20/+73
	llvm-svn: 312297
*	[Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes.	Eugene Zelenko	2017-08-31	1	-4/+19
	Also affected in files (NFC). llvm-svn: 312289
*	AMDGPU: Turn int pack pattern into build_vector	Matt Arsenault	2017-08-31	2	-1/+18
	build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282
*	AMDGPU: Don't assert in TTI with fp32 denorms enabled	Matt Arsenault	2017-08-31	1	-3/+25
	Also refine for f16 and rcp cases. llvm-svn: 312213
*	AMDGPU: Use set for tracked registers	Matt Arsenault	2017-08-31	1	-20/+23
	The majority of the time in the pass is spent checking for register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction. This process still is algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range. llvm-svn: 312206
*	AMDGPU: Correct operand types for v_mad_mix*	Matt Arsenault	2017-08-30	4	-13/+37
	These aren't really packed instructions, so the default op_sel_hi should be 0 since this indicates a conversion. The operand types are scalar values that behave similar to an f16 scalar that may be converted to f32. Doesn't change the default printing for op_sel_hi, just the parsing. llvm-svn: 312179
*	AMDGPU: Don't look for DS merge candidates with one use address	Matt Arsenault	2017-08-30	1	-3/+10
	The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate. This gives a significant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example. llvm-svn: 312096