bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Added v5i32 and v5f32 register classes	Tim Renouf	2019-03-22	1	-0/+2
	They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
*	[AMDGPU] Support for v3i32/v3f32	Tim Renouf	2019-03-21	1	-0/+2
	Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
*	[AMDGPU] Fix clamp bit DAG operand	Michael Liao	2019-03-20	1	-5/+8
	Summary: - Should use `targetconstant` instead of `constant` operand for the clamp bit, which is expected to be an immediate operand. Under certain conditions, such as when a common `i1 false` constant is used elsewhere and selected before the instruction with the clamp bit, a register operand may be added instead of an immediate one. Use `targetconstant` to enforce that. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59608 llvm-svn: 356608
*	[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic	Tim Renouf	2019-03-18	1	-10/+22
	Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399
*	[AMDGPU] Silence gcc 7 warnings	Stanislav Mekhanoshin	2019-03-13	1	-30/+0
	Differential Revision: https://reviews.llvm.org/D59330 llvm-svn: 356100
*	AMDGPU: Move d16 load matching to preprocess step	Matt Arsenault	2019-03-08	1	-35/+174
	When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
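The cycle hazard described above can be illustrated with a small reachability check. This is only a sketch: the dictionary-based DAG, the node names, and the helper `can_fold_load` are hypothetical and stand in for the real SelectionDAG data structures; only the idea of an isPredecessorOf-style query comes from the commit message.

```python
def is_predecessor_of(operands, a, b):
    """Return True if node `a` is reachable from `b` by following operand edges."""
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(operands.get(node, ()))
    return False


def can_fold_load(operands, load, other_half):
    # Folding merges `load` and the build_vector into one node that also
    # uses `other_half`. If `other_half` already depends on `load`, the
    # merged node would depend on itself, i.e. a cycle.
    return not is_predecessor_of(operands, load, other_half)


# Hypothetical DAGs: node -> list of operand nodes.
safe = {"build_vector": ["load", "other"], "load": ["chain"], "other": ["chain"]}
cyclic = {"build_vector": ["load", "other"], "load": ["chain"], "other": ["load"]}
```

The walk visits every node reachable from `other_half` in the worst case, which is in line with the message's remark that the check is expensive.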
*	AMDGPU: Add DS append/consume intrinsics	Matt Arsenault	2019-01-28	1	-15/+72
	Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. llvm-svn: 352422
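A rough model of what readfirstlane does to a per-lane value may help here. The list-of-lanes representation and the function names below are illustrative assumptions, and the sketch does not model the case where no lane is active.

```python
def readfirstlane(vgpr, exec_mask):
    """Return the value held by the lowest-numbered active lane."""
    for lane, value in enumerate(vgpr):
        if (exec_mask >> lane) & 1:
            return value
    raise ValueError("no active lanes (not modelled in this sketch)")


def is_dynamically_uniform(vgpr, exec_mask):
    """True if every active lane holds the same value."""
    active = {v for lane, v in enumerate(vgpr) if (exec_mask >> lane) & 1}
    return len(active) <= 1
```

If the address is dynamically uniform, the scalar copy matches every active lane. If it is not, the lanes other than the first active one silently use the wrong address, which is why the commit states the uniformity assumption explicitly.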
*	Codegen support for atomicrmw fadd/fsub	Matt Arsenault	2019-01-22	1	-1/+1
	llvm-svn: 351851
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing.	Changpeng Fang	2018-12-21	1	-3/+7
	Summary: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is unsigned. This patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps the optimization if the base is known to be unsigned. Reviewers: arsemn Differential Revision: https://reviews.llvm.org/D55568 llvm-svn: 349951
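A small numeric sketch shows why a possibly negative base is a problem. The hardware model in `hw_index` (M0 read as an unsigned 32-bit value and added to an immediate offset) and the helper `select_index` are simplifying assumptions for illustration, not the backend's actual code.

```python
MASK32 = 0xFFFFFFFF


def hw_index(m0, imm_offset):
    """Assumed hardware model: M0 is read as an unsigned 32-bit value."""
    return (m0 & MASK32) + imm_offset


def select_index(base, const_offset, base_known_non_negative):
    """Return (m0, imm_offset) for the index expression base + const_offset."""
    if base_known_non_negative:
        # Safe to peel the constant into the instruction's offset field.
        return base, const_offset
    # Otherwise keep the full sum in M0 and use no immediate offset.
    return base + const_offset, 0
```

With `base = -1` and `const_offset = 1` the intended index is 0, but peeling would put `0xFFFFFFFF` in M0 and the modelled hardware would compute 4294967296 instead.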
*	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI	Nicolai Haehnle	2018-10-17	1	-2/+0
	Summary: To work around a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698
*	[AMDGPU] Rename pass "isel" to "amdgpu-isel"	Fangrui Song	2018-10-03	1	-2/+2
	Summary: The AMDGPU target specific pass "isel" is a misleading name. Reviewers: tstellar, echristo, javed.absar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52759 llvm-svn: 343659
*	[AMDGPU] Removed unused method	Tim Renouf	2018-09-13	1	-22/+0
	Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52022 Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c llvm-svn: 342189
*	[AMDGPU] Load divergence predicate refactoring	Alexander Timofeev	2018-09-13	1	-0/+25
	Differential revision: https://reviews.llvm.org/D51931 Reviewers: rampitec llvm-svn: 342120
*	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed	Alexander Timofeev	2018-09-11	1	-1/+51
	Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928
*	AMDGPU: Remove remnants of old address space mapping	Matt Arsenault	2018-08-31	1	-11/+6
	llvm-svn: 341165
*	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis	Nicolai Haehnle	2018-08-30	1	-3/+3
	Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071
*	AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes	Marek Olsak	2018-08-29	1	-1/+5
	Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51203 llvm-svn: 340959
*	[AMDGPU] New buffer intrinsics	Tim Renouf	2018-08-21	1	-82/+2
	Summary: This commit adds new intrinsics
	  llvm.amdgcn.raw.buffer.load
	  llvm.amdgcn.raw.buffer.load.format
	  llvm.amdgcn.raw.buffer.load.format.d16
	  llvm.amdgcn.struct.buffer.load
	  llvm.amdgcn.struct.buffer.load.format
	  llvm.amdgcn.struct.buffer.load.format.d16
	  llvm.amdgcn.raw.buffer.store
	  llvm.amdgcn.raw.buffer.store.format
	  llvm.amdgcn.raw.buffer.store.format.d16
	  llvm.amdgcn.struct.buffer.store
	  llvm.amdgcn.struct.buffer.store.format
	  llvm.amdgcn.struct.buffer.store.format.d16
	  llvm.amdgcn.raw.buffer.atomic.*
	  llvm.amdgcn.struct.buffer.atomic.*
	with the following changes from the llvm.amdgcn.buffer.* intrinsics:
	* there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set;
	* there is a combined cachepolicy arg (glc+slc)
	* there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field.
	The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
	Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269
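Two of the points above, the split of the checked offset between voffset and immoffset and the combined cachepolicy argument, can be sketched as follows. The 12-bit immediate limit, the bit layout (glc in bit 0, slc in bit 1), and both function names are assumptions made for illustration. The sketch handles only a compile-time-constant offset and is not the backend's actual splitting logic.

```python
MAX_IMM_OFFSET = 4095  # assumption: 12-bit unsigned immoffset field


def split_offset(offset):
    """Split a non-negative constant byte offset into (voffset, immoffset)."""
    if offset < 0:
        raise ValueError("sketch only handles non-negative offsets")
    immoffset = offset & MAX_IMM_OFFSET
    voffset = offset - immoffset
    return voffset, immoffset


def cachepolicy(glc, slc):
    """Pack glc into bit 0 and slc into bit 1 (assumed layout)."""
    return int(glc) | (int(slc) << 1)
```

For example, under these assumptions an offset of 5000 would be carried as a voffset of 4096 plus an immoffset of 904, and both parts still take part in bounds checking and swizzling.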
*	[SDAG] Remove the reliance on MI's allocation strategy for	Chandler Carruth	2018-08-14	1	-3/+2
	`MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
*	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor	Tim Renouf	2018-08-02	1	-50/+73
	Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779
*	AMDGPU: Improve hack for packing conversion ops	Matt Arsenault	2018-08-01	1	-0/+14
	Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619
*	AMDGPU: Refactor Subtarget classes	Tom Stellard	2018-07-11	1	-5/+5
	Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
*	AMDGPU: Separate R600 and GCN TableGen files	Tom Stellard	2018-06-28	1	-45/+59
	Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
*	AMDGPU: Add patterns for i32/i64 local atomic load/store	Matt Arsenault	2018-06-22	1	-1/+3
	Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
*	AMDGPU: Fix scalar_to_vector for v4i16/v4f16	Matt Arsenault	2018-06-20	1	-3/+2
	llvm-svn: 335161
*	[AMDGPU] Add perf hints to functions	Stanislav Mekhanoshin	2018-05-25	1	-0/+3
	This is an adoption of the HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289
*	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers	Tom Stellard	2018-05-22	1	-0/+1
	Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
*	[AMDGPU] Add divergence analysis as a dependency for ISel	Stanislav Mekhanoshin	2018-05-21	1	-0/+1
	AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies, which may lead to a crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862
*	Fix MSVC unused variable warning. NFCI.	Simon Pilgrim	2018-05-19	1	-5/+4
	AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807
*	Remove \brief commands from doxygen comments.	Adrian Prantl	2018-05-01	1	-4/+4
	We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
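For readers less familiar with perl, the substitution in the one-liner above can be sketched in Python. The function name `strip_brief` is hypothetical, and the sketch works on a string rather than rewriting files in place.

```python
import re


def strip_brief(text):
    # Same substitution as the perl one-liner: delete every "\brief "
    # (backslash, the word "brief", and one trailing space).
    return re.sub(r"\\brief ", "", text)
```

A comment such as `/// \brief Returns the id.` becomes `/// Returns the id.`, which doxygen's autobrief option treats the same way.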
*	AMDGPU: Add Vega12 and Vega20	Matt Arsenault	2018-04-30	1	-8/+15
	Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215
*	AMDGPU: Remove some dead code	Tom Stellard	2018-04-30	1	-4/+0
	llvm-svn: 331196
*	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.	Craig Topper	2018-03-29	1	-1/+1
	Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806
*	Fix layering by moving ValueTypes.h from CodeGen to IR	David Blaikie	2018-03-23	1	-1/+1
	ValueTypes.h is implemented in IR already. llvm-svn: 328397
*	Fix layering of MachineValueType.h by moving it from CodeGen to Support	David Blaikie	2018-03-23	1	-1/+1
	This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395
*	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"	Nirav Dave	2018-03-19	1	-3/+2
	Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898
*	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""	Nirav Dave	2018-03-17	1	-2/+3
	as it times out building test-suite on PPC. llvm-svn: 327778
*	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"	Nirav Dave	2018-03-17	1	-3/+2
	Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777
*	Revert: r327172 "Correct load-op-store cycle detection analysis"	Nirav Dave	2018-03-10	1	-2/+3
	r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197
*	[DAG] Enforce stricter NodeId invariant during Instruction selection	Nirav Dave	2018-03-09	1	-3/+2
	Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor, this may cause us to miss a cycle when detecting an invalid selection. This patch fixes this by giving all unselected successors of a selected node a negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170
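The idea of a negated but recoverable node id can be sketched with plain integers. The encoding below (negate after adding one, so that id 0 also becomes negative) and the function names are one possible illustration and are not claimed to be exactly what the patch does.

```python
def invalidate(node_id):
    """Mark a non-negative id as belonging to an unselected successor."""
    return -(node_id + 1)  # offset by one so id 0 also becomes negative


def is_invalidated(node_id):
    return node_id < 0


def original_id(node_id):
    """Recover the topological id whether or not it was invalidated."""
    return -(node_id + 1) if node_id < 0 else node_id


def may_prune(node_id, limit_id):
    # Pruning by "id smaller than the limit" is only trusted for ids that
    # have not been invalidated; negated ids are always explored.
    return (not is_invalidated(node_id)) and node_id < limit_id
```

Because the mapping is its own inverse on the two ranges, the traversal can still compare original ids where that is safe while refusing to prune at nodes whose ordering may have been disturbed by selection.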
*	Pass Divergence Analysis data to Selection DAG to drive divergence	Alexander Timofeev	2018-03-05	1	-0/+2
	dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703
*	Reapply "AMDGPU: Add 32-bit constant address space"	Matt Arsenault	2018-02-09	1	-3/+31
	This reverts r324494 and reapplies r324487. llvm-svn: 324747
*	Revert "AMDGPU: Add 32-bit constant address space"	Rafael Espindola	2018-02-07	1	-31/+3
	This reverts commit r324487. It broke clang tests. llvm-svn: 324494
*	AMDGPU: Add 32-bit constant address space	Marek Olsak	2018-02-07	1	-3/+31
	Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all use cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
*	[AMDGPU] add LDS f32 intrinsics	Daniil Fukalov	2018-01-17	1	-1/+4
	Added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generating the ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656
*	[AMDGPU] Fixed incorrect uniform branch condition	Tim Renouf	2018-01-09	1	-0/+20
	Summary: I had a case where multiple nested uniform ifs resulted in code that did v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first ensuring that bits for inactive lanes were clear. There was already code for inserting an "s_and_b64 vcc, exec, vcc" to clear bits for inactive lanes in the case that the branch is instruction selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in SIFixSGPRCopies. I have added the same code into SILowerControlFlow for the case that the branch is instruction selected as s_cbranch_vccnz. This de-optimizes the code in some cases where the s_and is not needed, because vcc is the result of a v_cmp, or multiple v_cmp instructions combined by s_and/s_or. We should add a pass to re-optimize those cases. Reviewers: arsenm, kzhuravl Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle Differential Revision: https://reviews.llvm.org/D41292 llvm-svn: 322119
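The effect of the inserted "s_and_b64 vcc, exec, vcc" can be modelled with integer bitmasks. The two helper functions are hypothetical stand-ins for the instructions named in the message, and masks are shown with four lanes for brevity.

```python
def branch_taken_vccnz(vcc):
    """Model of s_cbranch_vccnz: branch if any bit of vcc is set."""
    return vcc != 0


def mask_inactive(vcc, exec_mask):
    """Model of "s_and_b64 vcc, exec, vcc": clear bits of inactive lanes."""
    return vcc & exec_mask


exec_mask = 0b0011   # only lanes 0 and 1 are active
stale_vcc = 0b0100   # a leftover bit in inactive lane 2

wrong = branch_taken_vccnz(stale_vcc)
right = branch_taken_vccnz(mask_inactive(stale_vcc, exec_mask))
```

Without the mask the branch is taken on the strength of an inactive lane's stale bit. With it, only active lanes contribute to the decision.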
*	AMDGPU: Fix creating invalid copy when adjusting dmask	Matt Arsenault	2017-12-04	1	-4/+8
	Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding. llvm-svn: 319705
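The coupling between dmask and the size of the result can be sketched as follows. The assumptions here are that dmask has four channel bits, that the i-th result lane corresponds to the i-th set bit, and that the result width in dwords equals the number of set bits. The function names are hypothetical.

```python
def result_dwords(dmask):
    """Number of dwords returned, assumed equal to the set bits in dmask."""
    return bin(dmask & 0xF).count("1")


def shrink_dmask(dmask, used_lanes):
    """Drop unused channels; return the new dmask together with its width."""
    new_dmask, lane = 0, 0
    for bit in range(4):
        if (dmask >> bit) & 1:
            if lane in used_lanes:
                new_dmask |= 1 << bit
            lane += 1
    # Returning both values together mirrors the point of the commit:
    # the mask and the result width must be updated in the same place.
    return new_dmask, result_dwords(new_dmask)
```

Updating the mask in one place and the register width in another is what allowed the two to disagree and produce an invalid copy.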
*	AMDGPU: Use return value of MorphNodeTo	Matt Arsenault	2017-12-04	1	-3/+1
	llvm-svn: 319704
*	AMDGPU: Use gfx9 carry-less add/sub instructions	Matt Arsenault	2017-11-30	1	-5/+13
	llvm-svn: 319491