bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow	Stanislav Mekhanoshin	2016-11-22	2	-8/+8
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608
*	DAG: Ignore call site attributes when emitting target intrinsic	Matt Arsenault	2016-11-21	1	-7/+18
\| \| \| \| \| \| \| \| \| \|	A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. llvm-svn: 287593
*	[AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input	Konstantin Zhuravlyov	2016-11-18	2	-13/+44
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26862 llvm-svn: 287389
*	AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts	Tom Stellard	2016-11-18	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342
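The summary above can be illustrated with a small model. This is only a sketch under stated assumptions: the helper names are hypothetical, Python integers stand in for 32-bit registers, and nothing here is taken from the patch itself. It shows why a zero_extend cannot be folded away once an i16 operation is selected to a 32-bit instruction.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add_16bit_inst(a: int, b: int) -> int:
    """Model a 16-bit add that leaves the high 16 bits of the result zeroed."""
    return (a + b) & MASK16


def add_32bit_inst(a: int, b: int) -> int:
    """Model a 32-bit add: a carry out of bit 15 lands in bit 16."""
    return (a + b) & MASK32


def zero_extend_i16(value: int) -> int:
    """Explicit zero extension of the low 16 bits to 32 bits."""
    return value & MASK16


if __name__ == "__main__":
    a, b = 0xFFFF, 0x0001
    print(hex(add_16bit_inst(a, b)))                   # high half already clear
    print(hex(add_32bit_inst(a, b)))                   # high half holds the carry
    print(hex(zero_extend_i16(add_32bit_inst(a, b))))  # explicit zext still needed
```

In this model the 32-bit result only matches the 16-bit one after the explicit mask, which is the case the removed patterns had assumed away.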
*	AMDGPU: Fix legalization of MUBUF instructions in shaders	Nicolai Haehnle	2016-11-18	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339
*	AMDGPU: Fix crash on illegal type for inlineasm	Matt Arsenault	2016-11-18	1	-0/+83
\| \| \| \| \| \| \|	There are still crashes on non-MVT types in other places. llvm-svn: 287310
*	Revert "AMDGPU: Enable ConstrainCopy DAG mutation"	Konstantin Zhuravlyov	2016-11-17	7	-30/+28
\| \| \| \| \| \| \| \|	This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
*	[AMDGPU] Add missing test for rL287203	Konstantin Zhuravlyov	2016-11-17	1	-3/+3
\| \| \| \|	llvm-svn: 287204
*	[AMDGPU] Promote f16/i16 conversions to f32/i32	Konstantin Zhuravlyov	2016-11-17	6	-84/+67
\| \| \| \|	llvm-svn: 287201
*	[AMDGPU] Expand `br_cc` for f16	Konstantin Zhuravlyov	2016-11-17	1	-0/+112
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199
*	AMDGPU: Enable ConstrainCopy DAG mutation	Matt Arsenault	2016-11-16	7	-28/+30
\| \| \| \| \| \| \|	This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
*	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass	Tom Stellard	2016-11-16	4	-10/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers holding immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
*	[AMDGPU] Handle f16 select{_cc}	Konstantin Zhuravlyov	2016-11-16	1	-0/+322
\| \| \| \| \| \| \| \| \| \|	- Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074
*	AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument	Jan Vesely	2016-11-15	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056
*	AMDGPU/SI: Fix pattern for i16 = sign_extend i1	Tom Stellard	2016-11-15	1	-0/+33
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035
*	AMDGPU: Enable store clustering	Matt Arsenault	2016-11-15	1	-5/+13
\| \| \| \| \| \| \|	Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
*	AMDGPU: Analyze mubuf with immediate soffset	Matt Arsenault	2016-11-15	1	-0/+34
\| \| \| \| \| \| \|	Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018
*	[AMDGPU] Add wave barrier builtin	Stanislav Mekhanoshin	2016-11-15	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	The wave barrier represents a discardable barrier. Its main purpose is to carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to the convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007
*	AMDGPU: Fix f16 fabs/fneg	Matt Arsenault	2016-11-15	4	-11/+294
\| \| \| \|	llvm-svn: 286931
*	AMDGPU: Fix formatting of 1/2pi immediate	Matt Arsenault	2016-11-15	3	-8/+8
\| \| \| \|	llvm-svn: 286912
*	AMDGPU/SI: Support data types other than V4f32 in image intrinsics	Changpeng Fang	2016-11-14	3	-1/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860
*	AMDGPU: Implement SGPR spilling with scalar stores	Matt Arsenault	2016-11-13	4	-28/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766
*	[AMDGPU] Add f16 support (VI+)	Konstantin Zhuravlyov	2016-11-13	41	-36/+4122
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
*	AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI	Tom Stellard	2016-11-12	2	-15/+37
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687
*	AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies	Tom Stellard	2016-11-11	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction, the defining instruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676
*	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()	Matthias Braun	2016-11-11	5	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for calls/barrier instruction and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544
*	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies"	Stanislav Mekhanoshin	2016-11-11	2	-49/+3
\| \| \| \| \| \| \| \|	This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530
*	AMDGPU: Emit runtime metadata as a note element in .note section	Yaxun Liu	2016-11-10	4	-72/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However, there is a standard way to convey vendor-specific information about how to run an ELF binary, which is called a vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch makes the AMDGPU backend emit runtime metadata as a note element in the .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502
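For readers unfamiliar with the note element layout mentioned above, here is a minimal sketch of the generic ELF note record: three 32-bit words (name size, descriptor size, type) followed by the NUL-terminated name and the descriptor, each padded to a 4-byte boundary. The vendor name, type value, and payload used below are placeholders chosen for illustration, not the values the AMDGPU backend emits, and little-endian encoding is assumed.

```python
import struct


def _pad4(data: bytes) -> bytes:
    """Pad with NUL bytes up to the next 4-byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Encode one ELF note record (little-endian, 4-byte alignment)."""
    name_bytes = name.encode("ascii") + b"\0"  # namesz counts the terminator
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + _pad4(name_bytes) + _pad4(desc)


def parse_note(blob: bytes):
    """Decode one note record produced by build_note."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name_start = 12
    desc_start = name_start + namesz + (-namesz % 4)
    name = blob[name_start:name_start + namesz - 1].decode("ascii")
    desc = blob[desc_start:desc_start + descsz]
    return name, note_type, desc


if __name__ == "__main__":
    # "AMD" and type 1 are hypothetical placeholders.
    note = build_note("AMD", 1, b"\x01\x02\x03\x04\x05")
    print(len(note), parse_note(note))
```

A consumer can walk a .note section by parsing one record, skipping the padded name and descriptor, and repeating, without knowing any vendor-specific section name.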
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-10	27	-263/+1081
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
*	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies	Stanislav Mekhanoshin	2016-11-07	2	-3/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way the IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done, a condition is calculated in one block and used in another. Current behavior is to store the workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue, a forward propagation of a v_cmp 64 bit result to a user is implemented. An additional side effect of this is that we may consume fewer VGPRs at the cost of more SGPRs when multiple conditions need to be held, and that is a clear win in most cases. llvm-svn: 286171
*	AMDGPU: Remove unnecessary and on conditional branch	Matt Arsenault	2016-11-07	9	-34/+16
\| \| \| \| \| \| \|	The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134
*	Revert "AMDGPU: Add VI i16 support"	Tom Stellard	2016-11-04	27	-1081/+263
\| \| \| \| \| \|	This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-03	27	-263/+1081
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
*	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.	Alexander Timofeev	2016-11-03	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm stopped as soon as any memory access was encountered between loads that are expected to be merged together, although a Read-After-Read conflict cannot affect execution correctness. Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%. Improvement is also expected on any massive sequences of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919
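The reasoning in this message can be sketched as a toy legality check. This is an assumption-laden simplification, not the pass's actual code: the `Op` type, the `can_merge` helper, and the flag name are hypothetical, and real alias and register-dependency checks are ignored. It contrasts stopping at any intervening memory access with stopping only at intervening writes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    kind: str       # "load", "store", or "alu" (toy classification)
    addr: int = 0   # LDS address, unused for "alu"


def can_merge(ops: List[Op], first: int, second: int,
              allow_intervening_loads: bool) -> bool:
    """Return True if the loads at indices first and second may be merged.

    With allow_intervening_loads=False, any memory access in between blocks
    the merge. With True, only stores block it, because a read-after-read
    ordering cannot change the values observed.
    """
    for op in ops[first + 1:second]:
        if op.kind == "store":
            return False
        if op.kind == "load" and not allow_intervening_loads:
            return False
    return True


if __name__ == "__main__":
    seq = [Op("load", 0), Op("load", 64), Op("alu"), Op("load", 4)]
    print(can_merge(seq, 0, 3, allow_intervening_loads=False))
    print(can_merge(seq, 0, 3, allow_intervening_loads=True))
```

In the example sequence the unrelated load at address 64 blocks the merge only under the stricter rule.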
*	AMDGPU: Cleanup some xfailed tests	Matt Arsenault	2016-11-02	3	-52/+10
\| \| \| \| \| \|	Some of these are already fixed or tested somewhere else. llvm-svn: 285840
*	BranchRelaxation: Fix computing indirect branch block size	Matt Arsenault	2016-11-02	1	-0/+54
\| \| \| \|	llvm-svn: 285828
*	AMDGPU: Use brev for materializing SGPR constants	Matt Arsenault	2016-11-01	6	-11/+73
\| \| \| \| \| \|	This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765
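A rough model of where the 4 bytes come from, assuming the usual encoding in which a scalar move is 4 bytes and a 32-bit literal operand adds 4 more, while small integers in the range -16..64 are inline constants that need no literal. The function names are hypothetical, floating-point inline constants are ignored for brevity, and the selection logic is a simplification rather than the backend's code.

```python
def bitreverse32(value: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)


def to_signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_inline_int(value: int) -> bool:
    """Integer inline-constant range (float inline constants not modeled)."""
    return -16 <= to_signed32(value) <= 64


def materialize(imm: int):
    """Return (opcode, operand, encoded size in bytes) for a 32-bit constant."""
    imm &= 0xFFFFFFFF
    if is_inline_int(imm):
        return ("s_mov_b32", to_signed32(imm), 4)
    reversed_imm = bitreverse32(imm)
    if is_inline_int(reversed_imm):
        # The bit-reversed value is an inline constant: no literal needed.
        return ("s_brev_b32", to_signed32(reversed_imm), 4)
    return ("s_mov_b32", imm, 8)  # 4-byte instruction plus 4-byte literal


if __name__ == "__main__":
    for constant in (64, 0x80000000, 0x12345678):
        print(hex(constant), materialize(constant))
```

Here 0x80000000 is the bit reversal of 1, so it fits in a single 4-byte instruction instead of a move with a trailing literal.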
*	AMDGPU: Default to using scalar mov to materialize immediate	Matt Arsenault	2016-11-01	3	-12/+47
\| \| \| \| \| \| \| \| \| \| \| \|	This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm with an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762
*	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32	Konstantin Zhuravlyov	2016-11-01	2	-87/+81
\| \| \| \| \| \| \| \| \| \| \|	This will prevent the following regressions when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
*	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64	Tom Stellard	2016-11-01	2	-19/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I wanted to implement this as a target-independent expansion; however, when targets say they want to expand FP_TO_FP16, what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target-independent expansion would be to add logic to each target's TargetLowering construction to mark these nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex, as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704
*	[AMDGPU] Expand vector mulhu/mulhs	Valery Pykhtin	2016-11-01	2	-0/+26
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684
*	AMDGPU: Use 1/2pi inline imm on VI	Matt Arsenault	2016-10-29	1	-0/+60
\| \| \| \| \| \|	I'm guessing at how it is supposed to be printed llvm-svn: 285490
*	AMDGPU: Add definitions for scalar store instructions	Matt Arsenault	2016-10-28	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463
*	AMDGPU: Change check prefix in test	Matt Arsenault	2016-10-28	1	-219/+219
\| \| \| \|	llvm-svn: 285449
*	AMDGPU: Diagnose using too many SGPRs	Matt Arsenault	2016-10-28	1	-0/+102
\| \| \| \| \| \|	This is possible when using inline asm. llvm-svn: 285447
*	AMDGPU: Fix using incorrect private resource with no allocation	Matt Arsenault	2016-10-28	6	-9/+139
\| \| \| \| \| \| \| \| \| \| \|	It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435
*	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies	Nicolai Haehnle	2016-10-27	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 llvm-svn: 285273
*	AMDGPU: Refactor processor definition to use ISA version features	Yaxun Liu	2016-10-26	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 llvm-svn: 285210
*	Reapply "AMDGPU: Don't use offen if it is 0"	Matt Arsenault	2016-10-26	10	-74/+91
\| \| \| \| \| \|	This reverts r283003 llvm-svn: 285203
*	AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch	Tom Stellard	2016-10-26	1	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A single flat memory operation that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 llvm-svn: 285198