path: root/llvm/test/CodeGen/AMDGPU
Commit message (Author, Date, Files changed, Lines -/+)
...
* AMDGPU: Add baseline test for call waitcnt insertion (Matt Arsenault, 2019-06-14, 1 file, -0/+161)
  llvm-svn: 363453
* [AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check (Valery Pykhtin, 2019-06-14, 1 file, -0/+35)
  Summary: Function bodies marked inline in an OpenCL source are eliminated, but the MaxBB check may prevent inlining them, leaving undefined references.
  Reviewers: rampitec, arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63337
  llvm-svn: 363418
* GlobalISel: Avoid producing illegal copies in RegBankSelect (Matt Arsenault, 2019-06-14, 3 files, -28/+160)
  Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR.
  The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check.
  For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region.
  llvm-svn: 363410
* [AMDGPU] gfx1011/gfx1012 targets (Stanislav Mekhanoshin, 2019-06-14, 11 files, -24/+69)
  Differential Revision: https://reviews.llvm.org/D63307
  llvm-svn: 363344
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 (Stanislav Mekhanoshin, 2019-06-13, 7 files, -81/+81)
  Differential Revision: https://reviews.llvm.org/D63301
  llvm-svn: 363339
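  Purely for orientation (this sketch is mine, not from the change): the ballot-style compare intrinsics return a lane mask whose width matches the wave size, so under wave32 the overloaded return type becomes i32 instead of i64. The mangled names and the predicate value below are my assumptions about the convention, not taken from this log.

```llvm
; Assumed overloaded forms: the first mangling suffix is the mask (return)
; type, the second is the compared operand type.
declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32)   ; wave64 lane mask
declare i32 @llvm.amdgcn.icmp.i32.i32(i32, i32, i32)   ; wave32 lane mask

define amdgpu_kernel void @cmp_mask_w64(i64 addrspace(1)* %out, i32 %a, i32 %b) {
  ; 32 is the integer 'eq' predicate in the CmpInst encoding (assumption)
  %mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 32)
  store i64 %mask, i64 addrspace(1)* %out
  ret void
}
```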
* [AMDGPU] gfx1010: small test change for wave32. NFC (Stanislav Mekhanoshin, 2019-06-13, 1 file, -1/+1)
  llvm-svn: 363297
* [AMDGPU] more gfx1010 tests. NFC. (Stanislav Mekhanoshin, 2019-06-12, 9 files, -136/+165)
  llvm-svn: 363190
* [AMDGPU] gfx1010 dpp16 and dpp8 (Stanislav Mekhanoshin, 2019-06-12, 3 files, -35/+73)
  Differential Revision: https://reviews.llvm.org/D63203
  llvm-svn: 363186
* [AMDGPU] gfx1010 permlane instructions (Stanislav Mekhanoshin, 2019-06-12, 2 files, -0/+456)
  Differential Revision: https://reviews.llvm.org/D63202
  llvm-svn: 363185
* AMDGPU/GlobalISel: Fix using illegal situations in tests (Matt Arsenault, 2019-06-12, 2 files, -28/+25)
  These were using illegal copies as the side-effecting use, so make them legal.
  llvm-svn: 363168
* [DAGCombine] GetNegatedExpression - constant float vector support (PR42105) (Simon Pilgrim, 2019-06-11, 1 file, -9/+8)
  Add support for negation of constant build vectors.
  Differential Revision: https://reviews.llvm.org/D62963
  llvm-svn: 363040
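  For context, a minimal IR sketch (mine, not from the patch) of the kind of pattern GetNegatedExpression targets: with constant build-vector support, the negation of the multiply below can be absorbed by negating the vector constant instead of emitting a separate negate.

```llvm
; Sketch: -(%x * <4.0, -8.0>) can be rewritten as %x * <-4.0, 8.0> once the
; combiner knows how to negate a constant build vector.
define <2 x float> @neg_of_mul(<2 x float> %x) {
  %mul = fmul <2 x float> %x, <float 4.0, float -8.0>
  %neg = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %mul
  ret <2 x float> %neg
}
```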
* [FastISel] Skip creating unnecessary vregs for arguments (Francis Visoiu Mistrih, 2019-06-10, 1 file, -2/+6)
  This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it.
  FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done.
  In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins.
  The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg).
  A few tests are affected by this:
  * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore.
  * llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
  * llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used.
  * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
  * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
  * llvm/test/CodeGen/Mips/*: get rid of spills and copies
  * llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
  * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
  * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
  * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
  Differential Revision: https://reviews.llvm.org/D62361
  llvm-svn: 362963
* [AMDGPU] Optimize image_[load|store]_mip (Piotr Sobczak, 2019-06-10, 1 file, -0/+132)
  Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63073
  llvm-svn: 362957
* AMDGPU: Force skips around traps (Matt Arsenault, 2019-06-07, 1 file, -0/+58)
  llvm-svn: 362852
* AMDGPU: Fix MIR test verifier error (Matt Arsenault, 2019-06-07, 1 file, -5/+5)
  llvm-svn: 362817
* [AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) (Valery Pykhtin, 2019-06-07, 1 file, -0/+33)
  Differential revision: https://reviews.llvm.org/D62917
  llvm-svn: 362789
* AMDGPU: Don't count mask branch pseudo towards skip threshold (Matt Arsenault, 2019-06-07, 1 file, -0/+54)
  llvm-svn: 362761
* AMDGPU: Insert skips for blocks with FLAT (Matt Arsenault, 2019-06-07, 1 file, -0/+58)
  This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though.
  llvm-svn: 362760
* AMDGPU: Insert skip branches over return blocks (Matt Arsenault, 2019-06-06, 1 file, -0/+194)
  SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption.
  llvm-svn: 362754
* [AMDGPU] Partial revert for ba447bae7448435c9986eece0811da1423972fdd (Alexander Timofeev, 2019-06-06, 33 files, -179/+165)
  Partially reverts "Divergence driven ISel. Assign register class for cross block values according to the divergence.", which exposed a design flaw leading to several issues that need to be solved first. This change reverts the AMDGPU-specific changes and keeps the common part unaffected.
  llvm-svn: 362749
* AMDGPU: Don't fix emergency stack slot at offset 0 (Matt Arsenault, 2019-06-05, 17 files, -416/+485)
  This forced the caller to be aware of this, which is an ugly ABI feature.
  Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels.
  Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset.
  Restrict to callable functions for now to split this into 2 steps, to limit the number of test updates and in case anything breaks.
  llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretation (Matt Arsenault, 2019-06-05, 20 files, -283/+374)
  Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
  Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
  The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
  Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
  llvm-svn: 362661
* AMDGPU: Remove amdgpu-max-work-group-size attribute (Matt Arsenault, 2019-06-05, 2 files, -2/+2)
  This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size.
  llvm-svn: 362641
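  For illustration only, a minimal sketch of the replacement attribute as it appears in IR; the "1,256" min,max bounds are an arbitrary example, not something stated in this log.

```llvm
; Kernel using the replacement attribute; the value is a "min,max" pair of
; flat work-group sizes.
define amdgpu_kernel void @kernel() #0 {
  ret void
}

attributes #0 = { "amdgpu-flat-work-group-size"="1,256" }
```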
* [NFC][Codegen][AMDGPU] Autogenerate commute-shifts.ll test (Roman Lebedev, 2019-06-04, 1 file, -3/+42)
  Being affected by upcoming patch.
  llvm-svn: 362528
* AMDGPU: Disable stack realignment for kernels (Matt Arsenault, 2019-06-03, 2 files, -0/+320)
  This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel.
  TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels.
  Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP.
  llvm-svn: 362447
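  A hypothetical kernel that would otherwise ask for stack realignment (the names and the 64-byte alignment are illustrative assumptions, not taken from the tests):

```llvm
; An alloca over-aligned relative to the default private-stack alignment;
; with realignment disabled for kernels, this relies on the kernel stack
; already being sufficiently aligned on entry.
define amdgpu_kernel void @overaligned_alloca() {
  %buf = alloca [4 x i32], align 64, addrspace(5)
  %p = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %buf, i32 0, i32 0
  store volatile i32 0, i32 addrspace(5)* %p
  ret void
}
```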
* [AMDGPU] Regenerate SDIV tests for an upcoming patch (Simon Pilgrim, 2019-06-01, 1 file, -37/+2353)
  llvm-svn: 362303
* AMDGPU: Fix not adding ImplicitBufferPtr as a live-in (Matt Arsenault, 2019-05-31, 1 file, -0/+14)
  Fixes missing test from r293000.
  llvm-svn: 362275
* [AMDGPU] Regenerate add/sub shrink constant tests for an upcoming patch (Simon Pilgrim, 2019-05-31, 1 file, -45/+390)
  llvm-svn: 362230
* [AMDGPU] Regenerate CTLZ tests for an upcoming patch (Simon Pilgrim, 2019-05-31, 1 file, -128/+1006)
  llvm-svn: 362229
* [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 (Roman Lebedev, 2019-05-30, 1 file, -11/+8)
  Summary: This prevents regressions in the next patch, and somewhat recovers from the regression to the AMDGPU test in D62223. It is indeed not great that we leave the vector decrement alone and don't transform it into a vector add of all-ones. https://rise4fun.com/Alive/ZRl
  This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted again in rL362109 to fix missing constant folds that were causing endless combine loops.
  Reviewers: RKSimon, craig.topper, spatel, arsenm
  Reviewed By: RKSimon, arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62263
  llvm-svn: 362144
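  A scalar IR sketch of the fold itself (mine, not part of the commit); it relies on the two's-complement identity x - y - 1 == x + ~y, since ~y == -y - 1.

```llvm
; Before the fold: (x - y) + -1
define i32 @before_fold(i32 %x, i32 %y) {
  %sub = sub i32 %x, %y
  %r = add i32 %sub, -1
  ret i32 %r
}

; After the fold: add (xor y, -1), x
define i32 @after_fold(i32 %x, i32 %y) {
  %not = xor i32 %y, -1
  %r = add i32 %not, %x
  ret i32 %r
}
```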
* [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 (Roman Lebedev, 2019-05-30, 1 file, -10/+11)
  Summary: The main motivation is shown by all these `neg` instructions that are now created, in particular the `@reg32_lshr_by_negated_unfolded_sub_b` test.
  AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`: it looks like the spill is now hoisted into the preheader (which should still be good?), and 2 4-byte reloads become 1 8-byte reload elsewhere, but I'm not sure how that affects that loop. I'm unable to interpret the AMDGPU change; it looks neutral-ish.
  This is hopefully a step towards solving PR41952 (https://bugs.llvm.org/show_bug.cgi?id=41952). https://rise4fun.com/Alive/pkdq (we are missing more patterns, I'll submit them later)
  This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted again in rL362109 to fix missing constant folds that were causing endless combine loops.
  Reviewers: craig.topper, RKSimon, spatel, arsenm
  Reviewed By: RKSimon
  Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62223
  llvm-svn: 362142
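  The underlying rewrite is plain reassociation; a scalar IR sketch (mine, with C = 42 as an arbitrary constant):

```llvm
; Before: (x + C) - y
define i32 @before_reassoc(i32 %x, i32 %y) {
  %add = add i32 %x, 42
  %r = sub i32 %add, %y
  ret i32 %r
}

; After: (x - y) + C -- the constant now sits on the outer add, where later
; combines can fold or negate it more easily.
define i32 @after_reassoc(i32 %x, i32 %y) {
  %sub = sub i32 %x, %y
  %r = add i32 %sub, 42
  ret i32 %r
}
```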
* AMDGPU/GlobalISel: Add wave scratch offset argument (Matt Arsenault, 2019-05-30, 1 file, -0/+10)
  Avoids crashing in PEI in a future change.
  llvm-svn: 362136
* [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause (Tim Renouf, 2019-05-30, 1 file, -0/+65)
  With LLPC, previous investigation has suggested that si-scheduler interacts badly with SIFormMemoryClauses on an XNACK target in some games. That needs further investigation in the future. In the meantime, this commit adds a target-specific attribute to allow us to disable SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC to use.
  Differential Revision: https://reviews.llvm.org/D62572
  Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
  llvm-svn: 362127
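  A sketch of how such a per-function string attribute is attached in IR; the value 1 follows the description above, but the exact set of accepted values is an assumption on my part.

```llvm
; Setting the attribute to 1 is described above as effectively disabling
; SIFormMemoryClauses for this function.
define amdgpu_kernel void @no_memory_clauses() #0 {
  ret void
}

attributes #0 = { "amdgpu-max-memory-clause"="1" }
```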
* [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches (Roman Lebedev, 2019-05-30, 1 file, -6/+8)
  I was looking into an endless combine loop the uncommitted follow-up patch was causing, and it appears even these patches can exhibit such an endless loop. The root cause is that we try to hoist one binop (add/sub) with a constant operand, and if we get two such binops, both of which are eligible for this hoisting, we get stuck. Some cases may highlight missing constant folds.
  Reverts r361871, r361872, r361873, r361874.
  llvm-svn: 362109
* AMDGPU: Return address lowering (Aakanksha Patil, 2019-05-29, 1 file, -0/+65)
  The patch computes the return address for the current function.
  Differential revision: https://reviews.llvm.org/D59666
  llvm-svn: 362001
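  Presumably this lowering is reached through the generic return-address intrinsic; a minimal sketch (mine, not the commit's test):

```llvm
declare i8* @llvm.returnaddress(i32)

; Returns the address this function will return to (frame level 0).
define i8* @get_return_address() {
  %ra = call i8* @llvm.returnaddress(i32 0)
  ret i8* %ra
}
```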
* AMDGPU/GlobalISel: Remove unnecessary REQUIREs (Matt Arsenault, 2019-05-29, 9 files, -18/+3)
  This has been a mandatory part of the build for a while.
  llvm-svn: 361956
* AMDGPU: Temporarily drop s_mul_hi_i/u32 patterns (Konstantin Zhuravlyov, 2019-05-28, 1 file, -5/+0)
  It introduces performance regressions in several applications. This has already been submitted downstream.
  llvm-svn: 361879
* [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 (Roman Lebedev, 2019-05-28, 1 file, -11/+8)
  Summary: This prevents regressions in the next patch, and somewhat recovers from the regression to the AMDGPU test in D62223. It is indeed not great that we leave the vector decrement alone and don't transform it into a vector add of all-ones. https://rise4fun.com/Alive/ZRl
  This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs.
  Reviewers: RKSimon, craig.topper, spatel, arsenm
  Reviewed By: RKSimon, arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62263
  llvm-svn: 361873
* [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 (Roman Lebedev, 2019-05-28, 1 file, -10/+11)
  Summary: The main motivation is shown by all these `neg` instructions that are now created, in particular the `@reg32_lshr_by_negated_unfolded_sub_b` test.
  AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`: it looks like the spill is now hoisted into the preheader (which should still be good?), and 2 4-byte reloads become 1 8-byte reload elsewhere, but I'm not sure how that affects that loop. I'm unable to interpret the AMDGPU change; it looks neutral-ish.
  This is hopefully a step towards solving PR41952 (https://bugs.llvm.org/show_bug.cgi?id=41952). https://rise4fun.com/Alive/pkdq (we are missing more patterns, I'll submit them later)
  This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs.
  Reviewers: craig.topper, RKSimon, spatel, arsenm
  Reviewed By: RKSimon
  Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62223
  llvm-svn: 361871
* [AMDGPU] Correct the handling of inlineasm output registers. (Michael Liao, 2019-05-28, 1 file, -0/+20)
  Summary:
  - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm.
  Reviewers: rampitec, alex-t
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62537
  llvm-svn: 361868
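  An illustrative, hypothetical inline-asm use with a VGPR output constraint, the kind of pattern where the output register class has to be derived correctly:

```llvm
; "=v" requests a VGPR for the asm output; the backend must derive the right
; register class for %v from the constraint, not from surrounding values.
define amdgpu_kernel void @asm_vgpr_out(i32 addrspace(1)* %out) {
  %v = call i32 asm sideeffect "v_mov_b32 $0, 7", "=v"()
  store i32 %v, i32 addrspace(1)* %out
  ret void
}
```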
* Revert DAGCombine "hoist binop with const" folds (Roman Lebedev, 2019-05-28, 1 file, -6/+8)
  Appears to introduce a test-suite compile-time hang.
  http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825
  This reverts r361852, r361853, r361854, r361855, r361856.
  llvm-svn: 361865
* [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold (Roman Lebedev, 2019-05-28, 1 file, -11/+8)
  Summary: This prevents regressions in the next patch, and somewhat recovers from the regression to the AMDGPU test in D62223. It is indeed not great that we leave the vector decrement alone and don't transform it into a vector add of all-ones. https://rise4fun.com/Alive/ZRl
  Reviewers: RKSimon, craig.topper, spatel, arsenm
  Reviewed By: RKSimon, arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62263
  llvm-svn: 361855
* [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold (Roman Lebedev, 2019-05-28, 1 file, -10/+11)
  Summary: The main motivation is shown by all these `neg` instructions that are now created, in particular the `@reg32_lshr_by_negated_unfolded_sub_b` test.
  AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`: it looks like the spill is now hoisted into the preheader (which should still be good?), and 2 4-byte reloads become 1 8-byte reload elsewhere, but I'm not sure how that affects that loop. I'm unable to interpret the AMDGPU change; it looks neutral-ish.
  This is hopefully a step towards solving PR41952 (https://bugs.llvm.org/show_bug.cgi?id=41952). https://rise4fun.com/Alive/pkdq (we are missing more patterns, I'll submit them later)
  Reviewers: craig.topper, RKSimon, spatel, arsenm
  Reviewed By: RKSimon
  Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62223
  llvm-svn: 361852
* AMDGPU: Don't enable all lanes with non-CSR VGPR spills (Matt Arsenault, 2019-05-28, 1 file, -0/+16)
  If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all lanes and immediately restoring exec. This is the usual situation in leaf functions.
  llvm-svn: 361848
* [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. (Michael Liao, 2019-05-28, 1 file, -0/+44)
  Summary:
  - Don't treat the use of a scalar register as `vreg_1` as a VGPR usage. Otherwise, that promotes the scalar register into a vector one, which breaks the assumption that the scalar register holds the lane mask.
  - The issue is triggered in a complicated case, where the uses of that (lane mask) scalar register are legalized before its definition, e.g., due to a mismatch between block placement and its topological order, or a loop. In such cases, the legalization of the PHI introduces the use of that scalar register as `vreg_1`.
  Reviewers: rampitec, nhaehnle, arsenm, alex-t
  Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62492
  llvm-svn: 361847
* MIR: Fix printer crashing on dead CSR frame indexes (Matt Arsenault, 2019-05-28, 1 file, -0/+28)
  llvm-svn: 361819
* [SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`. (Michael Liao, 2019-05-27, 2 files, -0/+41)
  Summary:
  - The current implementation simplifies the case where the source of `copyto` is `implicit-def`ed. However, it only works when that `implicit-def` is single-used, since it detects that from the `implicit-def` and cannot determine which destination vreg should be used if there are multiple uses.
  - This patch changes that detection to when the `copyto` is being emitted. If that `copyto`'s source is defined by an `implicit-def`, it simplifies it. Hence, it works even when that `implicit-def` is multi-used.
  - Apart from simplifying the internal IR, it won't improve the quality of code generation. However, it helps to detect `implicit-def` in a straightforward manner in some passes, such as `si-i1-copies`. A test case is added.
  Reviewers: sunfish, nhaehnle
  Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62342
  llvm-svn: 361777
* [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. (Alexander Timofeev, 2019-05-26, 32 files, -163/+177)
  Details: To make instruction selection really divergence driven, it is necessary to assign the correct register classes to cross-block values beforehand. For divergent targets, the same value type requires different register classes depending on the value's divergence.
  Reviewers: rampitec, nhaehnle
  Differential Revision: https://reviews.llvm.org/D59990
  This commit was previously reverted because of a build failure caused by a malformed patch. The build failure is fixed.
  llvm-svn: 361741
* Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." (Peter Collingbourne, 2019-05-25, 32 files, -177/+163)
  Broke sanitizer bots:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio
  llvm-svn: 361688
* AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills (Matt Arsenault, 2019-05-24, 7 files, -28/+69)
  If some lanes weren't active on entry to the function, this could clobber their VGPR values.
  llvm-svn: 361655