bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	AMDGPU: Invert frame index offset interpretation	Matt Arsenault	2019-06-05	1	-68/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661
*	TTI: Improve default costs for addrspacecast	Matt Arsenault	2019-06-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436
*	AMDGPU: Return address lowering	Aakanksha Patil	2019-05-29	1	-0/+26
\| \| \| \| \| \| \| \|	The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001
*	[AMDGPU] Correct the handling of inlineasm output registers.	Michael Liao	2019-05-28	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868
*	[AMDGPU] Divergence driven ISel. Assign register class for cross block ↵	Alexander Timofeev	2019-05-26	1	-1/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741
*	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for ↵	Peter Collingbourne	2019-05-25	1	-90/+1
\| \| \| \| \| \| \| \| \| \|	cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688
*	[AMDGPU] Divergence driven ISel. Assign register class for cross block ↵	Alexander Timofeev	2019-05-24	1	-1/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644
*	AMDGPU: Correct maximum possible private allocation size	Matt Arsenault	2019-05-23	1	-15/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
*	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence	Matt Arsenault	2019-05-16	1	-0/+7
\| \| \| \| \| \| \|	The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906
*	[AMDGPU] gfx1010: use fmac instructions	Stanislav Mekhanoshin	2019-05-04	1	-3/+5
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959
*	[AMDGPU] gfx1010 loop alignment	Stanislav Mekhanoshin	2019-05-03	1	-0/+76
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935
*	[SelectionDAG] remove constant folding limitations based on FP exceptions	Sanjay Patel	2019-05-02	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791
*	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub	Stanislav Mekhanoshin	2019-05-02	1	-0/+27
\| \| \| \| \| \| \| \|	Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757
*	[AMDGPU] gfx1010 MIMG implementation	Stanislav Mekhanoshin	2019-05-01	1	-15/+64
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698
*	[AMDGPU] gfx1010 DS implementation	Stanislav Mekhanoshin	2019-05-01	1	-0/+11
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696
*	[TargetLowering] Change getOptimalMemOpType to take a function attribute list	Sjoerd Meijer	2019-04-30	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \|	The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537
*	AMDGPU: Remove dx10-clamp from subtarget features	Matt Arsenault	2019-03-29	1	-5/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
*	[AMDGPU] Use three- and five-dword result type in image ops	Tim Renouf	2019-03-22	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately. This commit allows such an image op to have a result with a three or five dword result, avoiding the above de-optimization. Differential Revision: https://reviews.llvm.org/D58905 Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b llvm-svn: 356757
*	[AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics	Tim Renouf	2019-03-22	1	-15/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine. Differential Revision: https://reviews.llvm.org/D58904 Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e llvm-svn: 356755
*	[AMDGPU] Added v5i32 and v5f32 register classes	Tim Renouf	2019-03-22	1	-0/+17
\| \| \| \| \| \| \| \| \| \|	They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
*	[AMDGPU] Support for v3i32/v3f32	Tim Renouf	2019-03-21	1	-8/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
*	[AMDGPU] Allow MIMG with no uses in adjustWritemask in isel	David Stuttard	2019-03-20	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540
*	[AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics	Ryan Taylor	2019-03-19	1	-1/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465
*	[AMDGPU] Ban i8 min3 promotion.	Neil Henning	2019-03-19	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support. The new min3_i8.ll test case would previously spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Before the simple fix to our isel lowering to not do it for i8 MVT's. Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464
*	[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers	Tim Renouf	2019-03-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
*	[AMDGPU] Add an experimental buffer fat pointer address space.	Neil Henning	2019-03-18	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373
*	MIR: Allow targets to serialize MachineFunctionInfo	Matt Arsenault	2019-03-14	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215
*	IR: Add immarg attribute	Matt Arsenault	2019-03-12	1	-27/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981
*	DAG: Don't try to cluster loads with tied inputs	Matt Arsenault	2019-03-08	1	-45/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728
*	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of ↵	Dmitry Preobrazhensky	2019-02-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969
*	[AMDGPU] Fixed hang during DAG combine	Stanislav Mekhanoshin	2019-02-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926
*	AMDGPU: Remove debugger related subtarget features	Matt Arsenault	2019-02-21	1	-32/+0
\| \| \| \| \| \|	As far as I know these aren't needed anymore. llvm-svn: 354634
*	[AMDGPU] fix commuted case of sub combine	Stanislav Mekhanoshin	2019-02-21	1	-5/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D58481 llvm-svn: 354543
*	[AMDGPU] Ressociate 'add (add x, y), z' to use SALU	Stanislav Mekhanoshin	2019-02-14	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \|	Reassociate adds to collect scalar operands in a single instruction when possible. That will result in a scalar add followed by vector instead of two vector adds, thus better utilizing SALU. Differential Revision: https://reviews.llvm.org/D58220 llvm-svn: 354066
*	[AMDGPU] Split dot-insts feature	Stanislav Mekhanoshin	2019-02-09	1	-1/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D57971 llvm-svn: 353587
*	Implementation of asm-goto support in LLVM	Craig Topper	2019-02-08	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
*	AMDGPU/GlobalISel: Legalize addrspacecast	Matt Arsenault	2019-02-08	1	-1/+2
\| \| \| \| \| \| \|	Use a placeholder constant for now on targets that need the load from the queue ptr. llvm-svn: 353497
*	[AMDGPU] Consider XOR in waterfall loop as a terminator	Scott Linder	2019-02-05	1	-1/+1
\| \| \| \| \| \| \| \|	Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 llvm-svn: 353207
*	[AMDGPU] Support emitting GOT relocations for function calls	Scott Linder	2019-02-04	1	-23/+13
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083
*	[AMDGPU] Fix for vector element insertion	Tim Corringham	2019-02-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word and so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately. before the mask & or into the intended Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Tags: #llvm Differential Revision: https://reviews.llvm.org/D57588 llvm-svn: 352885
*	AMDGPU: Add DS append/consume intrinsics	Matt Arsenault	2019-01-28	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. llvm-svn: 352422
*	[AMDGPU] Add intrinsics for 16 bit interpolation	Tim Corringham	2019-01-28	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357
*	Codegen support for atomicrmw fadd/fsub	Matt Arsenault	2019-01-22	1	-6/+42
\| \| \| \|	llvm-svn: 351851
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	AMDGPU: Remove llvm.SI.load.const	Matt Arsenault	2019-01-18	1	-7/+0
\| \| \| \| \| \| \|	It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone llvm-svn: 351586
*	AMDGPU: Adjust the chain for loads writing to the HI part of a register.	Changpeng Fang	2019-01-16	1	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D56454 llvm-svn: 351379
*	AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap	Marek Olsak	2019-01-16	1	-0/+61
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351
*	AMDGPU: Add a fast path for icmp.i1(src, false, NE)	Marek Olsak	2019-01-15	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150
*	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try	David Stuttard	2019-01-14	1	-53/+289
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
*	[AMDGPU] Separate feature dot-insts	Stanislav Mekhanoshin	2019-01-10	1	-1/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D56524 llvm-svn: 350793