bcm5719-llvm - Project Ortega BCM5719 LLVM

Commit message	Author	Age	Files	Lines
*	[AMDGPU] Update includes for intrinsic changes :(	Reid Kleckner	2018-06-23	2	-4/+4
	llvm-svn: 335409
*	[IR] Split Intrinsics.inc into enums and implementations	Reid Kleckner	2018-06-23	1	-1/+2
	Implements PR34259. Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets preprocessed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target-independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407
*	AMDGPU: Add patterns for i32/i64 local atomic load/store	Matt Arsenault	2018-06-22	4	-1/+54
	Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
*	AMDGPU/GlobalISel: Default to using TableGen'd instruction selector	Tom Stellard	2018-06-22	1	-7/+0
	Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector the default for all opcodes. This is NFC for a full piglit run, which is why there are no tests. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48198 llvm-svn: 335319
*	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR	Tom Stellard	2018-06-22	4	-0/+47
	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318
*	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP	Tom Stellard	2018-06-22	4	-0/+18
	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316
*	AMDGPU/GlobalISel: Implement select() for COPY	Tom Stellard	2018-06-22	1	-1/+4
	Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315
*	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF	Tom Stellard	2018-06-21	2	-0/+16
	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307
*	AMDGPU: Remove ability to reserve VGPRs for debugger	Konstantin Zhuravlyov	2018-06-21	6	-50/+2
	Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288
*	[AMDGPU] Update assembler for HSA Code Object v3	Scott Linder	2018-06-21	6	-75/+698
	Update AMDGPU assembler syntax behind the code-object-v3 feature:
	* Replace/rename most AMDGPU assembler directives/symbols and document them.
	* Provide more diagnostics (e.g. values out of range, missing values, repeated values).
	* Provide a path for backwards compatibility, even with underlying descriptor changes.
	Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281
*	[AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts	Scott Linder	2018-06-21	1	-0/+1
	BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls. Differential Revision: https://reviews.llvm.org/D48391 llvm-svn: 335268
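	The failure mode can be illustrated with a small, self-contained C++ sketch. It does not use the real SIInsertWaitcnts code: `Block`, `WaitcntLikePass`, and `run` are hypothetical names, and running the same block list twice stands in for MBB addresses that happen to coincide across two functions. Without the reset, the second run sees every block as already processed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a MachineBasicBlock; only its address is used.
struct Block {
  int Id;
};

// Hypothetical pass object that, like an LLVM MachineFunctionPass instance,
// is reused for every function it processes.
class WaitcntLikePass {
  // Blocks already visited, keyed by address (as with MBB pointers).
  std::set<const Block *> ProcessedSet;

public:
  // Visits Blocks and returns how many were treated as first visits.
  // ClearFirst models the fix: reset per-function state before each run.
  std::size_t run(const std::vector<const Block *> &Blocks, bool ClearFirst) {
    if (ClearFirst)
      ProcessedSet.clear();
    std::size_t FirstVisits = 0;
    for (const Block *B : Blocks)
      if (ProcessedSet.insert(B).second) // .second is true on a fresh insert
        ++FirstVisits;
    return FirstVisits;
  }
};
```

	With ClearFirst set to false, a second run over the same addresses reports zero first visits, which is the stale-state behavior the commit fixes.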
*	AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z	Konstantin Zhuravlyov	2018-06-21	5	-51/+0
	... and everything that comes with it from the implementation and v3 header files. Leave the definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267
*	AMDGPU: Remove redundant MIMG instruction variants	Nicolai Haehnle	2018-06-21	1	-20/+67
	Summary: For sample and gather ops, we can accurately determine the set of vaddr-size instruction variants that are required. This reduces the size of instruction tables by ~5%. The number of machine instruction opcodes is reduced from 10002 to 9476. Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48168 llvm-svn: 335232
*	AMDGPU: Remove old-style image intrinsics	Nicolai Haehnle	2018-06-21	6	-995/+1
	Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231
*	AMDGPU: Select MIMG instructions manually in SITargetLowering	Nicolai Haehnle	2018-06-21	8	-230/+345
	Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228
*	AMDGPU: Refactor MIMG instruction TableGen using generic tables	Nicolai Haehnle	2018-06-21	10	-442/+298
	Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227
*	AMDGPU: Use generic tables instead of SearchableTable	Nicolai Haehnle	2018-06-21	5	-22/+37
	Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226
*	AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM)	Nicolai Haehnle	2018-06-21	1	-69/+73
	Summary: This will allow us to provide rich metadata about the instructions in tables that are accessible by custom C++ code. Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48011 llvm-svn: 335224
*	AMDGPU: Add implicit def of SCC to kill and indirect pseudos	Nicolai Haehnle	2018-06-21	1	-2/+10
	Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223
*	AMDGPU: Turn D16 for MIMG instructions into a regular operand	Nicolai Haehnle	2018-06-21	13	-396/+297
	Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instruction loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which chooses the correct variant for other MIMG instructions is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222
*	AMDGPU: Fix scalar_to_vector for v4i16/v4f16	Matt Arsenault	2018-06-20	2	-3/+12
	llvm-svn: 335161
*	AMDGPU: Fix missing C++ mode comment	Matt Arsenault	2018-06-20	1	-1/+1
	llvm-svn: 335160
*	[AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc	Stanislav Mekhanoshin	2018-06-16	1	-17/+43
	This is a common case in the backend when we serialize a condition and then rematerialize it. Use either the original or the inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882
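	The fold above can be checked by brute force over both values of the condition. The sketch below is plain C++ rather than SelectionDAG code, and the function names are hypothetical. It also shows the precondition the fold relies on: CT and CF must differ, since when they are equal the select always yields CF and the eq form is always true.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// setcc (select cc, CT, CF), CF, eq  evaluated literally.
bool setccEqOfSelect(bool CC, int32_t CT, int32_t CF) {
  return (CC ? CT : CF) == CF;
}

// setcc (select cc, CT, CF), CF, ne  evaluated literally.
bool setccNeOfSelect(bool CC, int32_t CT, int32_t CF) {
  return (CC ? CT : CF) != CF;
}

// Folded forms. On an i1, "xor cc, -1" flips the single bit.
bool foldedEq(bool CC) { return !CC; }
bool foldedNe(bool CC) { return CC; }

// True if both folds agree with the literal evaluation for every cc.
bool foldHoldsFor(int32_t CT, int32_t CF) {
  for (bool CC : {false, true})
    if (setccEqOfSelect(CC, CT, CF) != foldedEq(CC) ||
        setccNeOfSelect(CC, CT, CF) != foldedNe(CC))
      return false;
  return true;
}
```

	For example, foldHoldsFor(1, 0) is true, while foldHoldsFor(3, 3) is false because the select no longer carries any information about cc.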
*	AMDGPU: Add combine for short vector extract_vector_elts	Matt Arsenault	2018-06-15	1	-1/+42
	Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836
*	AMDGPU: Make v4i16/v4f16 legal	Matt Arsenault	2018-06-15	8	-92/+235
	Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
*	[AMDGPU] Recognize x & ~(-1 << y) pattern.	Roman Lebedev	2018-06-15	1	-0/+6
	Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817
*	[AMDGPU] Recognize x & ((1 << y) - 1) pattern.	Roman Lebedev	2018-06-15	1	-0/+7
	Summary: As a followup for D48007. Since we already handle the `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have UB for either of the edge cases (`y == 0`, `y == bitwidth`), I think also handling a pattern that is UB for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816
*	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.	Roman Lebedev	2018-06-15	1	-0/+7
	Summary: D47980 will canonicalize `x << (32 - y) >> (32 - y)`, which is the pattern AMDGPU expects, to `x & (-1 >> (32 - y))`, which AMDGPU does not recognize. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815
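	The three patterns recognized above, plus the shift pair they stand in for, all keep the low y bits of x. The C++ sketch below checks that equivalence for 32-bit values against a 64-bit reference; it does not model the AMDGPU instruction these patterns are selected to. The IR constant -1 is written as `~0u`, and the function names are hypothetical. Only 1 <= y <= 31 is tested, because in C++ a 32-bit shift by 32 is undefined, which is exactly where the y == 0 and y == bitwidth edge cases discussed in these commits live.

```cpp
#include <cassert>
#include <cstdint>

// Reference: keep the low Y bits of X, defined for 0 <= Y <= 32.
uint32_t lowBitsRef(uint32_t X, unsigned Y) {
  return static_cast<uint32_t>(X & ((uint64_t{1} << Y) - 1));
}

// The source patterns with 32-bit shifts; each is defined for 1 <= Y <= 31.
uint32_t shlThenShr(uint32_t X, unsigned Y) { return (X << (32 - Y)) >> (32 - Y); }
uint32_t andMaskMinusOne(uint32_t X, unsigned Y) { return X & ((1u << Y) - 1); }
uint32_t andNotShl(uint32_t X, unsigned Y) { return X & ~(~0u << Y); }
uint32_t andLshrAllOnes(uint32_t X, unsigned Y) { return X & (~0u >> (32 - Y)); }

// True if all four forms match the reference for every Y in [1, 31].
bool allFormsAgree(uint32_t X) {
  for (unsigned Y = 1; Y <= 31; ++Y) {
    const uint32_t Ref = lowBitsRef(X, Y);
    if (shlThenShr(X, Y) != Ref || andMaskMinusOne(X, Y) != Ref ||
        andNotShl(X, Y) != Ref || andLshrAllOnes(X, Y) != Ref)
      return false;
  }
  return true;
}
```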
*	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz	Tom Stellard	2018-06-14	3	-0/+49
	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757
*	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL	Tom Stellard	2018-06-13	3	-0/+16
	Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665
*	[AMDGPU] Corrected computeKnownBits for V_PERM_B32	Stanislav Mekhanoshin	2018-06-13	1	-7/+8
	Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640
*	[AMDGPU] Change enqueue kernel handle type	Yaxun Liu	2018-06-13	1	-1/+2
	Currently the handle type is a global pointer, which holds 8 bytes. We need a larger type that holds 16 bytes, therefore change it to [2 x i64]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625
*	[AMDGPU][MC] Enabled parsing of relocations on VALU instructions	Dmitry Preobrazhensky	2018-06-13	1	-2/+2
	See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 llvm-svn: 334622
*	[AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4	Dmitry Preobrazhensky	2018-06-13	1	-3/+19
	See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 llvm-svn: 334609
*	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering	Tom Stellard	2018-06-13	4	-71/+69
	Summary: The code that handles ISD::Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway, so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607
*	[AMDGPU] DAG combine to produce V_PERM_B32	Stanislav Mekhanoshin	2018-06-12	5	-1/+214
	Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
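	To make the V_PERM_B32 entries above easier to follow, here is a simplified C++ model of the instruction's byte selection. It assumes the GCN ISA description: each selector byte indexes the 64-bit value {src0, src1} (src0 in the high half), 0x0c produces 0x00, and 0x0d or higher produces 0xff. Selector bytes 8..11 replicate sign bits and are deliberately not modeled; the function returns std::nullopt for them. The example that an OR of two halfword-masked values fits in one permute is illustrative only; which patterns the combine actually matches is in D48099 and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified model of V_PERM_B32 (assumption: GCN ISA byte-select rules).
// Returns std::nullopt for the unmodeled sign-replicating selectors 8..11.
std::optional<uint32_t> permB32(uint32_t Src0, uint32_t Src1, uint32_t Sel) {
  const uint64_t Data = (uint64_t{Src0} << 32) | Src1;
  uint32_t Result = 0;
  for (unsigned I = 0; I < 4; ++I) {
    const uint32_t S = (Sel >> (8 * I)) & 0xff; // selector for result byte I
    uint32_t Byte;
    if (S < 8)
      Byte = static_cast<uint32_t>((Data >> (8 * S)) & 0xff); // byte S of Data
    else if (S == 0x0c)
      Byte = 0x00; // constant zero
    else if (S >= 0x0d)
      Byte = 0xff; // constant 0xff
    else
      return std::nullopt; // 8..11: sign replication, not modeled
    Result |= Byte << (8 * I);
  }
  return Result;
}
```

	With this model, (a & 0x0000ffff) | (b & 0xffff0000) equals permB32(b, a, 0x07060100): selectors 0 and 1 take the low bytes of a (src1), and 6 and 7 take the high bytes of b (src0).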
*	AMDHSA/NFC: Code object v3 updates (additional):	Konstantin Zhuravlyov	2018-06-12	2	-13/+16
	- Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521
*	AMDHSA: Code object v3 updates	Konstantin Zhuravlyov	2018-06-12	6	-10/+184
	- Do not emit the following assembler directives:
	  - .hsa_code_object_version
	  - .hsa_code_object_isa
	  - .amd_amdgpu_isa
	  - .amd_amdgpu_hsa_metadata
	  - .amd_amdgpu_pal_metadata
	- Do not emit .note entries
	- Clean up and bring in sync the kernel descriptor header file
	- Emit the kernel descriptor into .rodata with appropriate relocations and alignments
	llvm-svn: 334519
*	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'	Mark Searles	2018-06-12	1	-2/+2
	The use iterator used within findMaskOperands() can return any operand that is not a def, not just registers. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459
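	The guard order matters because the accessor asserts on non-register operands. The sketch below uses a simplified, hypothetical `Operand` struct in place of llvm::MachineOperand, with an isUse() that asserts the same message as the commit title; it is not the actual findMaskOperands() code. Short-circuiting && ensures isUse() is only reached for registers.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical stand-in for llvm::MachineOperand: only register
// operands carry def/use information.
struct Operand {
  enum Kind { Register, Immediate, MBB } K;
  bool IsDef = false;

  bool isReg() const { return K == Register; }
  // Only valid on register operands, hence the assertion.
  bool isUse() const {
    assert(isReg() && "Wrong MachineOperand accessor");
    return !IsDef;
  }
};

// Counts register uses; isReg() is checked first so isUse() never asserts.
int countRegisterUses(const std::vector<Operand> &Ops) {
  int N = 0;
  for (const Operand &Op : Ops)
    if (Op.isReg() && Op.isUse()) // && short-circuits on non-registers
      ++N;
  return N;
}
```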
*	Simplify; NFC	George Burgess IV	2018-06-11	1	-1/+1
	Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *`. llvm-svn: 334451
*	AMDGPU: Add 64-bit relative variant kind	Konstantin Zhuravlyov	2018-06-11	1	-0/+2
	Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443
*	[AMDGPU] Do not consider indirect access through phi for wave limiter	Stanislav Mekhanoshin	2018-06-11	1	-6/+0
	Rationale: indirect access is usually an issue because the load is not ready by the time of the use. However, if the use is inside a loop and the load is outside it, that is potentially an issue for the first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420
*	[AMDGPU] Inline asm - added i16, half and i128 types support	Daniil Fukalov	2018-06-08	1	-16/+32
	The AMDGPU inline assembler supports i16, half and i128 typed variables in constraints, but they were reported as an error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load into a 128-bit integer variable with global_load_dwordx4. Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301
*	AMDGPU: Error on LDS global address in functions	Matt Arsenault	2018-06-08	1	-1/+9
	These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269
*	[AMDGPU] Simplify memory legalizer (add missing virtual destructor)	Tony Tye	2018-06-08	1	-0/+4
	Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334257
*	[AMDGPU] Simplify memory legalizer	Tony Tye	2018-06-07	1	-234/+707
	- Make code easier to maintain.
	- Avoid generating waitcnts for VMEM if the address space does not involve VMEM.
	- Add support to generate waitcnts for LDS and GDS memory.
	Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241
*	AMDGPU: Fix not including v2f64 in SReg_128	Matt Arsenault	2018-06-07	1	-2/+2
	Fixes assertion with calls returning v2f64. llvm-svn: 334189
*	AMDGPU: Use scalar operations for f16 fabs/fneg patterns	Matt Arsenault	2018-06-07	1	-7/+7
	Fixes unnecessary differences between subtargets. llvm-svn: 334184
*	AMDGPU: Try a lot harder to emit scalar loads	Matt Arsenault	2018-06-07	3	-1/+136
	This has two main components. First, widen short constant loads in the DAG when they have the correct alignment. This is already done to some extent in AMDGPUCodeGenPrepare, since that pass has access to DivergenceAnalysis, but it can't help kernarg loads created in the DAG. Start using DAG divergence analysis to help this case.
	The second part is to avoid kernel argument lowering breaking the alignment of short vector elements, because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte-aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out.
	There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does: it only looks for fairly specific combines between pairs of loads, which no longer appear. In particular, this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, a lot more unnecessary vector packing and unpacking code is emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. In the cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doesn't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking.
	Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or by adding another version of the shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180
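	The "load the nearest 4-byte-aligned segment and shift" step can be illustrated with plain integer arithmetic. The sketch assumes little-endian byte order, as on AMDGPU, and uses hypothetical helper names; it is not the SelectionDAG lowering itself, only the address and shift computation it relies on for a 2-byte-aligned i16 element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the little-endian 32-bit word starting at byte offset Off.
uint32_t loadDword(const std::vector<uint8_t> &Mem, std::size_t Off) {
  return uint32_t{Mem[Off]} | uint32_t{Mem[Off + 1]} << 8 |
         uint32_t{Mem[Off + 2]} << 16 | uint32_t{Mem[Off + 3]} << 24;
}

// Loads an i16 at a 2-byte-aligned offset by loading the enclosing
// 4-byte-aligned dword and shifting the wanted half down.
uint16_t loadI16ViaAlignedDword(const std::vector<uint8_t> &Mem,
                                std::size_t Off) {
  const std::size_t AlignedOff = Off & ~std::size_t{3}; // round down to 4
  const unsigned Shift = static_cast<unsigned>(Off - AlignedOff) * 8; // 0 or 16
  return static_cast<uint16_t>(loadDword(Mem, AlignedOff) >> Shift);
}
```

	Two neighboring i16 elements at offsets 4 and 6 both load the same dword at offset 4, which is why the extra loads can merge.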
*	[AMDGPU] Improve reciprocal handling	Stanislav Mekhanoshin	2018-06-06	1	-7/+13
	When denormals are supported, we produce a full division for 1.0f / x. That can still be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x *= s; return s * v_rcp_f32(x) if the requested accuracy is 2.5ulp or less. When denormals are not supported, the same version is used for numerators other than 1.0, while plain v_rcp_f32 is used for a 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the sign bit of the result. OpenCL conformance passed with denormals both enabled and disabled. Differential Revision: https://reviews.llvm.org/D47805 llvm-svn: 334142
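	The scaled sequence from this commit message can be written as ordinary C++ to show why it is value-preserving: s * (1 / (x * s)) equals 1/x. In the sketch, `rcpStandIn` is a hypothetical placeholder that uses IEEE division in place of v_rcp_f32, so it demonstrates the scaling arithmetic, not the hardware instruction's accuracy. The motivation stated here is an assumption: presumably the 2^-32 pre-scale keeps the reciprocal of a very large |x| out of the denormal range inside the instruction, with the final multiply producing the small result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for v_rcp_f32: an exact IEEE division here.
float rcpStandIn(float X) { return 1.0f / X; }

// 1/x with pre-scaling for very large |x|: scale x by 2^-32 first, then
// multiply the reciprocal by the same factor, leaving 1/x unchanged.
float scaledRcp(float X) {
  const bool C = std::fabs(X) > 0x1.0p+96f;
  const float S = C ? 0x1.0p-32f : 1.0f;
  X *= S;
  return S * rcpStandIn(X);
}

// -1/x differs from 1/x only in the sign bit of the result.
float scaledNegRcp(float X) { return -scaledRcp(X); }
```

	Power-of-two inputs make the identity easy to see: for x = 2^100, the scaled input is 2^68, its reciprocal is 2^-68, and multiplying by 2^-32 gives 2^-100.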