bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Rename ExpandISelPseudo->FinalizeISel, delay register reservation	Matt Arsenault	2019-06-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757
*	[AMDGPU] Propagate function attributes thru bitcasts	Stanislav Mekhanoshin	2019-06-17	1	-3/+4
\| \| \| \| \| \| \| \| \|	AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614
*	[AMDGPU] gfx1010 wavefrontsize intrinsic folding	Stanislav Mekhanoshin	2019-06-17	1	-1/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588
*	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions	Stanislav Mekhanoshin	2019-06-17	1	-4/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586
*	Revert CMake: Make most target symbols hidden by default	Tom Stellard	2019-06-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
*	CMake: Make most target symbols hidden by default	Tom Stellard	2019-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
*	[AMDGPU] Create a TargetInfo header. NFC	Richard Trieu	2019-05-14	1	-0/+1
\| \| \| \| \| \| \| \|	Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360713
*	[AMDGPU] gfx1010 GCNRegBankReassign pass	Stanislav Mekhanoshin	2019-05-01	1	-0/+2
\| \| \| \| \| \| \| \|	Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704
*	[AMDGPU] gfx1010 GCNNSAReassign pass	Stanislav Mekhanoshin	2019-05-01	1	-0/+15
\| \| \| \| \| \| \| \|	Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 llvm-svn: 359700
*	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define ↵	Amara Emerson	2019-04-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368
*	Reapply [ValueTracking] Support min/max selects in computeConstantRange()	Nikita Popov	2019-04-07	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378. Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes. Differential Revision: https://reviews.llvm.org/D59506 llvm-svn: 357870
*	[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs	Stanislav Mekhanoshin	2019-04-05	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Detect dead lanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a mean to preserve LiveIntervals and SlotIndexes in case it works past this. Differential Revision: https://reviews.llvm.org/D59626 llvm-svn: 357805
*	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.	Neil Henning	2019-04-01	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400
*	AMDGPU: Add additional MIR tests for exec mask optimizations	Matt Arsenault	2019-03-27	1	-3/+11
\| \| \| \| \| \| \| \| \| \|	Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091
*	CodeGen: Refactor regallocator command line and target selection	Matt Arsenault	2019-03-19	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506
*	[AMDGPU] Add an experimental buffer fat pointer address space.	Neil Henning	2019-03-18	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373
*	AMDGPU: Partially fix default device for HSA	Matt Arsenault	2019-03-17	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347
*	MIR: Allow targets to serialize MachineFunctionInfo	Matt Arsenault	2019-03-14	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215
*	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)	Aakanksha Patil	2019-03-07	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 llvm-svn: 355574
*	AMDGPU: Fix typo	Matt Arsenault	2019-02-28	1	-1/+1
\| \| \| \|	llvm-svn: 355056
*	AMDGPU: Enable function calls by default	Matt Arsenault	2019-02-28	1	-4/+9
\| \| \| \| \| \| \|	Fixes some crashes on illegal call situations which are unfortunately still valid IR. llvm-svn: 355051
*	AMDGPU: Remove debugger related subtarget features	Matt Arsenault	2019-02-21	1	-2/+0
\| \| \| \| \| \|	As far as I know these aren't needed anymore. llvm-svn: 354634
*	[AMDGPU] Enable DPP combiner pass by default.	Valery Pykhtin	2019-02-11	1	-1/+1
\| \| \| \| \| \|	Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314 llvm-svn: 353691
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try	David Stuttard	2019-01-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
*	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute	Aakanksha Patil	2018-12-13	1	-3/+1
\| \| \| \| \| \|	This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084
*	[AMDGPU] Support for "uniform-work-group-size" attribute	Aakanksha Patil	2018-12-12	1	-1/+3
\| \| \| \| \| \| \| \|	Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971
*	[AMDGPU] Add new Mode Register pass	Tim Corringham	2018-12-10	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754
*	[Targets] Add errors for tiny and kernel codemodel on targets that don't ↵	David Green	2018-12-07	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \|	support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585
*	[AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default	Valery Pykhtin	2018-12-06	1	-1/+1
\| \| \| \| \| \| \| \|	Turn the combiner back off as there're failures until the issue is fixed. Differential revision: https://reviews.llvm.org/D55314 llvm-svn: 348487
*	[AMDGPU]: Turn on the DPP combiner by default	Valery Pykhtin	2018-12-05	1	-1/+1
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D55314 llvm-svn: 348371
*	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)	Valery Pykhtin	2018-11-30	1	-0/+8
\| \| \| \| \| \| \| \|	Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993
*	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"	David Stuttard	2018-11-29	1	-1/+0
\| \| \| \| \| \| \| \| \|	Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
*	Add support for TFE/LWE in image intrinsics	David Stuttard	2018-11-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
*	AMDGPU: Don't optimize exec masks at -O0	Matt Arsenault	2018-11-26	1	-1/+2
\| \| \| \|	llvm-svn: 347573
*	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST	Ron Lieberman	2018-11-16	1	-0/+2
\| \| \| \| \| \| \| \| \|	Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008
*	Allow subclassing ExternalAA	Matt Arsenault	2018-11-07	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353
*	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU	Nicolai Haehnle	2018-10-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719
*	[AMDGPU] Add a pass to promote bitcast calls	Scott Linder	2018-10-26	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 llvm-svn: 345382
*	[AMDGPU] Add an AMDGPU specific atomic optimizer.	Neil Henning	2018-10-08	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973
*	AMDGPU: Always run AMDGPUAlwaysInline	Matt Arsenault	2018-10-03	1	-12/+9
\| \| \| \| \| \| \|	Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. llvm-svn: 343657
*	AMDGPU: Expand atomicrmw nand in IR	Matt Arsenault	2018-10-02	1	-0/+1
\| \| \| \|	llvm-svn: 343559
*	[AMDGPU] restore r342722 which was reverted with r342743	Sameer Sahasrabuddhe	2018-09-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956
*	revert changes from r342722	Sameer Sahasrabuddhe	2018-09-21	1	-2/+0
\| \| \| \| \| \| \| \| \|	"[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743
*	[AMDGPU] lower-switch in preISel as a workaround for legacy DA	Sameer Sahasrabuddhe	2018-09-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722
*	AMDGPU: Stop forcing internalize at -O0	Matt Arsenault	2018-08-31	1	-11/+0
\| \| \| \| \| \| \|	This doesn't really matter if clang is always emitting the visibility as hidden by default. llvm-svn: 341168
*	AMDGPU: Remove remnants of old address space mapping	Matt Arsenault	2018-08-31	1	-1/+0
\| \| \| \|	llvm-svn: 341165
*	run post-RA hazard recognizer pass late	Mark Searles	2018-07-16	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
*	AMDGPU: Refactor Subtarget classes	Tom Stellard	2018-07-11	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
*	Reapply "AMDGPU: Force inlining if LDS global address is used"	Matt Arsenault	2018-07-10	1	-5/+7
\| \| \| \| \| \|	This reverts commit r336623 llvm-svn: 336675