bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	AMDGPU/R600: Emit rodata in text segment	Jan Vesely	2020-02-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
*	[IR] Split out target specific intrinsic enums into separate headers	Reid Kleckner	2019-12-11	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320
*	[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores	Piotr Sobczak	2019-11-20	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794
*	AMDGPU: Refactor treatment of denormal mode	Matt Arsenault	2019-11-19	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \|	Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
*	AMDGPU Reduce reported maximum group size to 1024	Matt Arsenault	2019-11-13	1	-1/+2
\| \| \| \| \|	While some targets allow encoding 2048, this was never tested or supported.
*	AMDGPU: Simplify getAddressSpace calls	Matt Arsenault	2019-10-31	1	-4/+5
\| \| \| \| \|	These can be directly taken from the GlobalValue instead of going through the type.
*	[AMDGPU] Extend buffer intrinsics with swizzling	Piotr Sobczak	2019-10-02	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
*	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in ↵	Jay Foad	2019-09-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667
*	AMDGPU: Disambiguate v3f16 format in load/store tables	Matt Arsenault	2019-08-18	1	-5/+5
\| \| \| \| \| \| \| \| \|	Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202
*	[AMDGPU] Fixed occupancy calculation for gfx10	Stanislav Mekhanoshin	2019-07-19	1	-7/+11
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616
*	[AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message	Dmitry Preobrazhensky	2019-07-15	1	-3/+8
\| \| \| \| \| \| \| \|	Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071
*	[AMDGPU] gfx908 mAI instructions, MC part	Stanislav Mekhanoshin	2019-07-09	1	-0/+14
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
*	[AMDGPU][MC] Corrected parsing of FLAT offset modifier	Dmitry Preobrazhensky	2019-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved errors handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321
*	[AMDGPU][MC] Fix 2 for sanitizer failure in 364645	Dmitry Preobrazhensky	2019-06-28	1	-3/+3
\| \| \| \|	llvm-svn: 364656
*	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg	Dmitry Preobrazhensky	2019-06-28	1	-0/+108
\| \| \| \| \| \| \| \| \| \|	See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645
*	[AMDGPU] gfx1010 wave32 metadata	Stanislav Mekhanoshin	2019-06-17	1	-0/+7
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577
*	[AMDGPU] gfx1010 base changes for wave32	Stanislav Mekhanoshin	2019-06-13	1	-8/+14
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
*	[AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg	Dmitry Preobrazhensky	2019-06-13	1	-0/+63
\| \| \| \| \| \| \| \| \| \|	See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255
*	[AMDGPU] Optimize image_[load\|store]_mip	Piotr Sobczak	2019-06-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
*	[AMDGPU] gfx1010 utility functions	Stanislav Mekhanoshin	2019-04-25	1	-16/+45
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224
*	[AMDGPU] Add gfx1010 target definitions	Stanislav Mekhanoshin	2019-04-24	1	-38/+60
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
*	AMDGPU: Remove dx10-clamp from subtarget features	Matt Arsenault	2019-03-29	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
*	[AMDGPU] Added v5i32 and v5f32 register classes	Tim Renouf	2019-03-22	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
*	[AMDGPU] Support for v3i32/v3f32	Tim Renouf	2019-03-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
*	[AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, ↵	Dmitry Preobrazhensky	2019-03-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312
*	AMDGPU: Remove GCN features and predicates	Matt Arsenault	2019-02-08	1	-1/+2
\| \| \| \| \| \| \|	These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[AMDGPU] Extend the SI Load/Store optimizer to combine more things.	Neil Henning	2018-12-12	1	-0/+43
\| \| \| \| \| \| \| \| \| \|	I've extended the load/store optimizer to be able to produce dwordx3 loads and stores, This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937
*	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics	Nicolai Haehnle	2018-11-30	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050
*	AMDGPU/InsertWaitcnts: Untangle some semi-global state	Nicolai Haehnle	2018-11-29	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848
*	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST	Ron Lieberman	2018-11-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008
*	AMDHSA: More code object v3 fixes:	Konstantin Zhuravlyov	2018-11-15	1	-1/+2
\| \| \| \| \| \| \| \|	- Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA - Update assembler metadata tests to use v2 by default llvm-svn: 347001
*	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"	Nicolai Haehnle	2018-11-07	1	-5/+2
\| \| \| \| \| \| \| \|	This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364
*	AMDGPU: Add sram-ecc feature	Konstantin Zhuravlyov	2018-11-05	1	-0/+6
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
*	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics	Nicolai Haehnle	2018-10-17	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696
*	AMDGPU: Re-apply r341982 after fixing the layering issue	Konstantin Zhuravlyov	2018-09-12	1	-138/+88
\| \| \| \| \| \| \| \| \| \| \| \|	Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069
*	Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into ↵	Ilya Biryukov	2018-09-12	1	-88/+138
\| \| \| \| \| \| \| \| \| \| \|	TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023
*	AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination	Konstantin Zhuravlyov	2018-09-11	1	-138/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982
*	AMDGPU: Remove remnants of old address space mapping	Matt Arsenault	2018-08-31	1	-23/+0
\| \| \| \|	llvm-svn: 341165
*	[AMDGPU] New buffer intrinsics	Tim Renouf	2018-08-21	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269
*	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero	Ryan Taylor	2018-08-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
*	AMDGPU: Separate R600 and GCN TableGen files	Tom Stellard	2018-06-28	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
*	[AMDGPU] Update assembler for HSA Code Object v3	Scott Linder	2018-06-21	1	-0/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281
*	AMDGPU: Select MIMG instructions manually in SITargetLowering	Nicolai Haehnle	2018-06-21	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228
*	AMDGPU: Refactor MIMG instruction TableGen using generic tables	Nicolai Haehnle	2018-06-21	1	-71/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227
*	AMDGPU: Use generic tables instead of SearchableTable	Nicolai Haehnle	2018-06-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226
*	AMDHSA: Code object v3 updates	Konstantin Zhuravlyov	2018-06-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519
*	AMDGPU : Recalculate SGPRs when trap handler is supported	Konstantin Zhuravlyov	2018-05-16	1	-5/+9
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D29911 llvm-svn: 332523
*	Remove \brief commands from doxygen comments.	Adrian Prantl	2018-05-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
*	AMDGPU: Add Vega12 and Vega20	Matt Arsenault	2018-04-30	1	-0/+4
\| \| \| \| \| \| \| \|	Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215