bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	AMDGPU: Don't use struct type for argument layout	Matt Arsenault	2018-06-29	6	-701/+700
\| \| \| \| \| \| \| \| \| \|	This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999
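The padding effect described in the commit above can be sketched with a small layout calculation. This is only an illustration: it assumes the usual rule that a struct's size is rounded up to its largest member alignment, and the helper names and the example argument list are hypothetical, not taken from the patch.

```python
def align_to(offset, align):
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def explicit_args_end(args):
    """Lay out (size, align) arguments one by one; return the end offset."""
    offset = 0
    for size, align in args:
        offset = align_to(offset, align) + size
    return offset


def struct_type_size(args):
    """Size of the same arguments wrapped in a struct (assumed rule:
    total size is rounded up to the largest member alignment)."""
    max_align = max((align for _, align in args), default=1)
    return align_to(explicit_args_end(args), max_align)


# Hypothetical kernel signature: (i64, i32)
args = [(8, 8), (4, 4)]
print(explicit_args_end(args))  # 12 bytes of explicit arguments
print(struct_type_size(args))   # 16 bytes once struct tail padding is added
```

Under these assumptions the struct view reports four extra bytes after the explicit arguments, which is the kind of padding the commit message refers to.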
*	[AMDGPU] Enable LICM in the BE pipeline	Stanislav Mekhanoshin	2018-06-29	8	-32/+289
\| \| \| \| \| \| \| \| \| \|	This allows hoisting the code that computes the reciprocal of a loop-invariant denominator in integer division after the CodeGenPrepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988
*	[AMDGPU] Early expansion of 32 bit udiv/urem	Stanislav Mekhanoshin	2018-06-28	3	-128/+2431
\| \| \| \| \| \| \| \| \| \| \| \|	This allows hoisting of common code, for instance if the denominator is loop invariant. The current change is expansion only; adding LICM to the target pass list is going to be a separate patch. Given this patch, changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868
*	[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16	Stanislav Mekhanoshin	2018-06-28	2	-8/+124
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866
*	AMDGPU: Error on calls from graphics shaders	Matt Arsenault	2018-06-28	1	-0/+7
\| \| \| \| \| \| \| \|	In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829
*	AMDGPU: Fix assert on aggregate type kernel arguments	Matt Arsenault	2018-06-28	1	-0/+171
\| \| \| \| \| \| \| \| \| \|	Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827
*	[AMDGPU] Convert rcp to rcp_iflag	Stanislav Mekhanoshin	2018-06-27	5	-22/+43
\| \| \| \| \| \| \| \| \| \| \|	If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
*	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic	Stanislav Mekhanoshin	2018-06-26	1	-0/+114
\| \| \| \| \| \| \| \|	This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654
*	AMDGPU: Add pass to lower kernel arguments to loads	Matt Arsenault	2018-06-26	126	-1539/+2828
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lower the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. 
Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
*	Account for undef values from predecessors in extendSegmentsToUses	Krzysztof Parzyszek	2018-06-26	1	-0/+273
\| \| \| \| \| \| \| \|	It is legal for a PHI node not to have a live value in a predecessor as long as the end of the predecessor is jointly dominated by an undef value. llvm-svn: 335607
*	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr	Matt Arsenault	2018-06-25	2	-0/+33
\| \| \| \| \| \| \| \| \|	Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490
*	StackSlotColoring: Decide colors per stack ID	Matt Arsenault	2018-06-25	1	-0/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I thought I fixed this in r308673, but that fix was very broken. The assumption that any frame index can be used in place of another was more widespread than I realized. Even when stack slot sharing was disabled, this was still replacing frame index uses with a different ID with a different stack slot. Really fix this by doing the coloring per-stack ID, so all of the coloring is logically done in a separate namespace. This is a lot simpler than trying to figure out how to change the color if the stack ID is different. llvm-svn: 335488
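The idea of coloring per stack ID, as described in the commit above, can be sketched as a greedy interval coloring that is run separately for each ID. This is a simplified model, not the pass itself: the tuple layout, the half-open live ranges, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def color_slots(slots):
    """slots: list of (name, stack_id, start, end) with half-open live ranges.
    Returns {name: (stack_id, color)}; colors are only shared within one ID."""
    by_id = defaultdict(list)
    for slot in slots:
        by_id[slot[1]].append(slot)

    assignment = {}
    for stack_id, group in by_id.items():
        colors = []  # colors[i] holds the live ranges already assigned color i
        for name, _, start, end in sorted(group, key=lambda s: s[2]):
            for color, ranges in enumerate(colors):
                # Reuse a color only if this range overlaps none of its users.
                if all(end <= s or start >= e for s, e in ranges):
                    ranges.append((start, end))
                    break
            else:
                color = len(colors)
                colors.append([(start, end)])
            assignment[name] = (stack_id, color)
    return assignment


slots = [("a", 0, 0, 4), ("b", 0, 5, 9), ("c", 1, 5, 9), ("d", 0, 2, 6)]
result = color_slots(slots)
print(result)
```

Because the color is keyed by `(stack_id, color)`, slot `c` can never be merged with a slot from stack ID 0, even though its live range does not overlap `a`.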
*	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers	Matt Arsenault	2018-06-25	1	-7/+81
\| \| \| \|	llvm-svn: 335485
*	AMDGPU: Respect align argument parameter	Matt Arsenault	2018-06-25	1	-1/+114
\| \| \| \| \| \| \| \| \| \|	This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478
*	Improve handling of COPY instructions with identical value numbers	Krzysztof Parzyszek	2018-06-25	7	-0/+1005
\| \| \| \| \| \| \| \|	Testcases provided by Tim Renouf. Differential Revision: https://reviews.llvm.org/D48102 llvm-svn: 335472
*	AMDGPU: Add patterns for i32/i64 local atomic load/store	Matt Arsenault	2018-06-22	2	-0/+105
\| \| \| \| \| \| \| \|	Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
*	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR	Tom Stellard	2018-06-22	2	-0/+108
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318
*	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP	Tom Stellard	2018-06-22	2	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316
*	AMDGPU/GlobalISel: Implement select() for COPY	Tom Stellard	2018-06-22	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315
*	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF	Tom Stellard	2018-06-21	1	-0/+25
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307
*	AMDGPU: Remove ability to reserve VGPRs for debugger	Konstantin Zhuravlyov	2018-06-21	2	-68/+0
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288
*	[AMDGPU] Update assembler for HSA Code Object v3	Scott Linder	2018-06-21	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281
*	DAG combine "and\|or (select c, -1, 0), x" -> "select c, x, 0\|-1"	Stanislav Mekhanoshin	2018-06-21	2	-17/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allowed folding for "and/or" binops with non-constant operand if arguments of select are 0/-1 values. Normally this code with "and" opcode does not get to a DAG combiner and simplified yet in the InstCombine. However AMDGPU produces it during lowering and InstCombine has no chance to optimize it out. In turn the same pattern with "or" opcode can reach DAG. Differential Revision: https://reviews.llvm.org/D48301 llvm-svn: 335250
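The identity behind this fold can be checked with a few lines of 32-bit arithmetic. This is a sketch of the algebra only, not of the DAG combiner code; the function names are hypothetical, and the operand order shown for the "or" case is simply what the arithmetic gives, since the log does not spell it out.

```python
MASK32 = 0xFFFFFFFF  # -1 as an unsigned 32-bit value


def select(c, t, f):
    return t if c else f


def and_before(c, x):
    return select(c, MASK32, 0) & x


def and_after(c, x):
    return select(c, x, 0)


def or_before(c, x):
    return select(c, MASK32, 0) | x


def or_after(c, x):
    return select(c, MASK32, x)


# The binop disappears: only a select on the original condition remains.
for c in (False, True):
    for x in (0, 1, 0x1234, MASK32):
        assert and_before(c, x) == and_after(c, x)
        assert or_before(c, x) == or_after(c, x)
print("and/or folds agree")
```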
*	AMDGPU: Remove old-style image intrinsics	Nicolai Haehnle	2018-06-21	11	-2104/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231
*	AMDGPU: Convert test cases to the dimension-aware intrinsics	Nicolai Haehnle	2018-06-21	24	-361/+772
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that were missing. Some tests are removed because they no longer apply (i.e. explicitly testing building an address vector via insertelement). This is in preparation for the eventual removal of the old-style intrinsics. Some additional notes: - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the instruction schedule was subtly altered - insert_vector_elt.ll: the old test didn't actually test anything, because %tmp1 was not used; remove the load, because it doesn't work (Because of the amdgpu_ps calling convention? In any case, it's orthogonal to what the test claims to be testing.) Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf Reviewers: arsenm, rampitec Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D48018 llvm-svn: 335229
*	AMDGPU: Select MIMG instructions manually in SITargetLowering	Nicolai Haehnle	2018-06-21	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228
*	AMDGPU: Add implicit def of SCC to kill and indirect pseudos	Nicolai Haehnle	2018-06-21	2	-1/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223
*	AMDGPU: Turn D16 for MIMG instructions into a regular operand	Nicolai Haehnle	2018-06-21	2	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instruction loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which chooses the correct variant for other MIMG instructions is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222
*	Generalize MergeBlockIntoPredecessor. Replace uses of ↵	Alina Sbirlea	2018-06-20	3	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. 
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183
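The post-patch behaviour listed above (move the instructions from BB into Pred, delete BB, bail out on multiple predecessors or a taken address) can be modelled on a toy CFG. This is a sketch in Python rather than the LLVM C++ API: the `Block` class and the helper names are hypothetical, analysis updates (DomTree, LoopInfo) are omitted, and the extra check that Pred has BB as its only successor is an assumption added so that the toy merge stays well defined.

```python
class Block:
    def __init__(self, name, insts, address_taken=False):
        self.name = name
        self.insts = list(insts)
        self.preds = []
        self.succs = []
        self.address_taken = address_taken


def link(a, b):
    a.succs.append(b)
    b.preds.append(a)


def merge_block_into_predecessor(bb):
    """Merge bb into its single predecessor; return False if not applicable."""
    if len(bb.preds) != 1 or bb.address_taken:
        return False
    pred = bb.preds[0]
    if pred is bb or pred.succs != [bb]:  # assumption, see lead-in
        return False
    pred.insts.extend(bb.insts)           # instructions move from BB to Pred
    pred.succs = bb.succs
    for succ in bb.succs:
        succ.preds = [pred if p is bb else p for p in succ.preds]
    bb.insts, bb.preds, bb.succs = [], [], []
    return True


entry, mid, exit_ = Block("entry", ["a"]), Block("mid", ["b"]), Block("exit", ["c"])
link(entry, mid)
link(mid, exit_)
blocks = [entry, mid, exit_]

# Iterate over a temporary copy so that removing merged blocks does not
# invalidate the iteration, mirroring the temporary array mentioned above.
for bb in list(blocks):
    if merge_block_into_predecessor(bb):
        blocks.remove(bb)

print([b.name for b in blocks], entry.insts)
```

Since the predecessor survives the merge, the entry block never needs to be replaced, which matches the note in the commit message.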
*	Allow binop C1, (select cc, CF, CT) -> select folding	Stanislav Mekhanoshin	2018-06-20	1	-0/+182
\| \| \| \| \| \| \| \| \| \|	Previously this folding was done only if select is a first operand. However, for non-commutative operations constant may go before select. Differential Revision: https://reviews.llvm.org/D48223 llvm-svn: 335167
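A small model of this fold, using a non-commutative operation with the constant on the left. It is a sketch only: the helper names are hypothetical, 32-bit wraparound subtraction stands in for an arbitrary binop, and the select arms are named neutrally rather than following the CF/CT order in the title.

```python
MASK32 = 0xFFFFFFFF


def sub32(a, b):
    """32-bit wraparound subtraction (non-commutative)."""
    return (a - b) & MASK32


def before(cc, c1, on_true, on_false):
    # binop C1, (select cc, on_true, on_false)
    return sub32(c1, on_true if cc else on_false)


def fold(c1, on_true, on_false):
    """Fold the binop into both arms ahead of time, keeping C1 on the left."""
    return sub32(c1, on_true), sub32(c1, on_false)


def after(cc, folded):
    # select cc, (C1 op on_true), (C1 op on_false)
    folded_true, folded_false = folded
    return folded_true if cc else folded_false


folded = fold(10, 3, 20)
for cc in (False, True):
    assert before(cc, 10, 3, 20) == after(cc, folded)
print(folded)
```

Keeping the constant as the first operand inside `fold` is what matters for non-commutative operations; swapping the operands would compute `on_true - C1` instead.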
*	AMDGPU: Fix scalar_to_vector for v4i16/v4f16	Matt Arsenault	2018-06-20	1	-0/+33
\| \| \| \|	llvm-svn: 335161
*	Utilize new SDNode flag functionality to expand current support for fadd	Michael Berg	2018-06-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47909 llvm-svn: 334996
*	Shrink interval after moving copy in removePartialRedundancy	Krzysztof Parzyszek	2018-06-18	1	-0/+239
\| \| \| \|	llvm-svn: 334963
*	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc	Stanislav Mekhanoshin	2018-06-16	1	-0/+104
\| \| \| \| \| \| \| \| \|	This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882
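The rewrite in the title can be checked on a boolean model. This is a sketch of the logic only, with hypothetical function names; it assumes CT and CF are distinct constants, since otherwise the comparison result would not depend on cc at all.

```python
def select(cc, ct, cf):
    return ct if cc else cf


def setcc_eq_before(cc, ct, cf):
    return select(cc, ct, cf) == cf


def setcc_ne_before(cc, ct, cf):
    return select(cc, ct, cf) != cf


def setcc_eq_after(cc):
    return not cc  # xor cc, -1


def setcc_ne_after(cc):
    return cc      # the original condition


CT, CF = 1, 0  # any two distinct constants
for cc in (False, True):
    assert setcc_eq_before(cc, CT, CF) == setcc_eq_after(cc)
    assert setcc_ne_before(cc, CT, CF) == setcc_ne_after(cc)
print("setcc folds agree")
```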
*	Utilize new SDNode flag functionality to expand current support for fdiv	Michael Berg	2018-06-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47954 llvm-svn: 334862
*	AMDGPU: Add combine for short vector extract_vector_elts	Matt Arsenault	2018-06-15	3	-0/+134
\| \| \| \| \| \| \| \| \| \|	Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836
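Accessing a short vector "4 bytes at a time" can be pictured as picking the 32-bit word that holds the element and then shifting and masking within it. The sketch below is an illustration under assumptions, not the combine itself: the helper names are hypothetical, and it assumes element 0 sits in the low bits of word 0.

```python
def pack_words(elts, elt_bits=16):
    """Pack small integer elements into 32-bit words, lowest element first."""
    per_word = 32 // elt_bits
    mask = (1 << elt_bits) - 1
    words = []
    for i in range(0, len(elts), per_word):
        word = 0
        for lane, value in enumerate(elts[i:i + per_word]):
            word |= (value & mask) << (elt_bits * lane)
        words.append(word)
    return words


def extract_elt(words, idx, elt_bits=16):
    """Read element idx by touching only the one 32-bit word that holds it."""
    per_word = 32 // elt_bits
    word = words[idx // per_word]
    return (word >> (elt_bits * (idx % per_word))) & ((1 << elt_bits) - 1)


v4i16 = [0x1111, 0x2222, 0x3333, 0x4444]
words = pack_words(v4i16)
print([hex(w) for w in words])
print(hex(extract_elt(words, 2)))
```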
*	AMDGPU: Make v4i16/v4f16 legal	Matt Arsenault	2018-06-15	16	-175/+392
\| \| \| \| \| \| \|	Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
*	DAG: Fix creating concat_vectors with illegal type	Matt Arsenault	2018-06-15	1	-99/+110
\| \| \| \| \| \| \|	Test passes as is, but fails with future patch to make v4i16/v4f16 legal. llvm-svn: 334823
*	[AMDGPU] Recognize x & ~(-1 << y) pattern.	Roman Lebedev	2018-06-15	1	-45/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817
*	[AMDGPU] Recognize x & ((1 << y) - 1) pattern.	Roman Lebedev	2018-06-15	1	-39/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816
*	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.	Roman Lebedev	2018-06-15	1	-30/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815
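The mask forms named in these three commits all describe "keep the low y bits of x". A quick check in emulated 32-bit arithmetic, as a sketch with hypothetical function names: the loop is restricted to y in 1..31 because, as the messages above note, some of the forms are undefined at the edge cases.

```python
MASK32 = 0xFFFFFFFF  # -1 as an unsigned 32-bit value


def mask_not_shl(y):
    """~(-1 << y)"""
    return ~(MASK32 << y) & MASK32


def mask_shl_sub(y):
    """(1 << y) - 1"""
    return ((1 << y) - 1) & MASK32


def mask_lshr(y):
    """-1 >> (32 - y), using a logical shift"""
    return MASK32 >> (32 - y)


def shift_pair(x, y):
    """x << (32 - y) >> (32 - y), using a logical right shift"""
    return ((x << (32 - y)) & MASK32) >> (32 - y)


x = 0xDEADBEEF
for y in range(1, 32):
    m = mask_not_shl(y)
    assert m == mask_shl_sub(y) == mask_lshr(y)
    assert x & m == shift_pair(x, y)
print(hex(x & mask_lshr(8)))
```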
*	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz	Tom Stellard	2018-06-14	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757
*	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL	Tom Stellard	2018-06-13	2	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665
*	[AMDGPU] Corrected computeKnownBits for V_PERM_B32	Stanislav Mekhanoshin	2018-06-13	1	-0/+22
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640
*	[AMDGPU] Change enqueue kernel handle type	Yaxun Liu	2018-06-13	1	-5/+5
\| \| \| \| \| \| \| \| \| \|	Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which holds 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625
*	Revert "Improve handling of COPY instructions with identical value numbers"	Krzysztof Parzyszek	2018-06-13	1	-215/+0
\| \| \| \| \| \|	This reverts r334594, it breaks buildbots and fails with expensive checks. llvm-svn: 334598
*	Improve handling of COPY instructions with identical value numbers	Krzysztof Parzyszek	2018-06-13	1	-0/+215
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D48102 llvm-svn: 334594
*	[AMDGPU] DAG combine to produce V_PERM_B32	Stanislav Mekhanoshin	2018-06-12	1	-0/+199
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
*	AMDHSA/NFC: Code object v3 updates (additional):	Konstantin Zhuravlyov	2018-06-12	1	-0/+10
\| \| \| \| \| \|	- Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521
*	AMDHSA: Code object v3 updates	Konstantin Zhuravlyov	2018-06-12	2	-22/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519