summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Don't run passes that aren't usefulMatt Arsenault2016-05-181-0/+5
| | | | llvm-svn: 269943
* AMDGPU: Fix assert on ttmp registersMatt Arsenault2016-05-181-2/+2
| | | | | | | | | | | Use register class that does not include them when looking for unallocated registers. This is hit by the udiv v8i64 test in the opencl integer conformance test, and takes a few seconds to compile in a debug build so no test included. llvm-svn: 269938
* AMDGPU/R600: Use correct number of vector elements when lowering private loadsJan Vesely2016-05-161-5/+3
| | | | | | | | | | Reviewer: tstellardAMD, arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D20032 llvm-svn: 269725
* AMDGPU: Fix promote alloca pass creating huge arraysMatt Arsenault2016-05-163-20/+154
| | | | | | | | | | | | | | | This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708
* AMDGPU: Unify LowerGlobalAddressJan Vesely2016-05-133-18/+5
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19794 llvm-svn: 269481
* AMDGPU/R600: Fold global address operandJan Vesely2016-05-131-0/+7
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19793 llvm-svn: 269480
* AMDGPU/R600: Implement memory loads from constant ASJan Vesely2016-05-134-56/+33
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19792 llvm-svn: 269479
* AMDGPU/R600: Add support for emitting MCExprJan Vesely2016-05-132-1/+19
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19791 llvm-svn: 269478
* AMDGPU: Add support for MCExpr to instruction printerJan Vesely2016-05-132-3/+10
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19790 llvm-svn: 269477
* AMDGPU/R600: Use machine operands instead of ints to track literalsJan Vesely2016-05-131-19/+36
| | | | | | | | | | | | This will be used for global addresses Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19789 llvm-svn: 269476
* AMDGPU/R600: There are other uses for ALU_LITERAL besides ImmJan Vesely2016-05-131-3/+6
| | | | | | | | | | | | This will be used for GV Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19788 llvm-svn: 269475
* AMDGPU: Make CONST_DATA_PTR available to R600Jan Vesely2016-05-133-6/+6
| | | | | | | | | | | | Rename to AMDGPUconstdata_ptr Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19786 llvm-svn: 269474
* AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2)Jan Vesely2016-05-133-1/+53
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19785 llvm-svn: 269473
* [AMDGPU] Update nop insertion for debugger usageKonstantin Zhuravlyov2016-05-132-36/+15
| | | | | | | | | - Insert one nop for each high level statement instead of two - Do not insert nop before prologue Differential Revision: http://reviews.llvm.org/D20215 llvm-svn: 269452
* AMDGPU: Remove verifier check for scc live insMatt Arsenault2016-05-131-10/+0
| | | | | | | | | | We only really need this to be true for SIFixSGPRCopies. I'm not sure there's any way this could happen before that point. Fixes a case where MachineCSE could introduce a cross block scc use. llvm-svn: 269391
* SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISelJustin Bogner2016-05-121-49/+67
| | | | | | | | | | | - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. - Where we were calling SelectNodeTo, just return afterwards. Part of llvm.org/pr26808. llvm-svn: 269349
* AMDGPU: Fix getIntegerAttribute type and error messageMatt Arsenault2016-05-122-4/+6
| | | | llvm-svn: 269268
* AMDGPU: Fix breaking IR on instructions with multiple pointer operandsMatt Arsenault2016-05-121-8/+91
| | | | | | | | | | | | | The promote alloca pass would attempt to promote an alloca with a select, icmp, or phi user, even though the other operand was from a non-promotable source, producing a select on two different pointer types. Only do this if we know that both operands derive from the same alloca. In the future we should be able to relax this to an alloca which will also be promoted. llvm-svn: 269265
* AMDGPU: Make some instructions convergentMatt Arsenault2016-05-111-1/+7
| | | | llvm-svn: 269147
* AMDGPU: Change private_element_size to 4Matt Arsenault2016-05-111-1/+1
| | | | llvm-svn: 269145
* Cloning: Clean up the interface to the CloneFunction function.Peter Collingbourne2016-05-101-2/+1
| | | | | | | | | | | | | | | | | | | | | Remove the ModuleLevelChanges argument, and the ability to create new subprograms for cloned functions. The latter was added without review in r203662, but it has no in-tree clients (all non-test callers pass false for ModuleLevelChanges [1], so it isn't reachable outside of tests). It also isn't clear that adding a duplicate subprogram to the compile unit is always the right thing to do when cloning a function within a module. If this functionality comes back it should be accompanied with a more concrete use case. Furthermore, all in-tree clients add the returned function to the module. Since that's pretty much the only sensible thing you can do with the function, just do that in CloneFunction. [1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction Differential Revision: http://reviews.llvm.org/D18628 llvm-svn: 269110
* [AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNopsKonstantin Zhuravlyov2016-05-104-19/+21
| | | | | | Differential Revision: http://reviews.llvm.org/D20117 llvm-svn: 269098
* CodeGen: Move TargetPassConfig from Passes.h to an own header; NFCMatthias Braun2016-05-101-1/+2
| | | | | | | | Many files include Passes.h but only a fraction needs to know about the TargetPassConfig class. Move it into an own header. Also rename Passes.cpp to TargetPassConfig.cpp while we are at it. llvm-svn: 269011
* Fixed unused but set variable warningSimon Pilgrim2016-05-091-3/+0
| | | | llvm-svn: 268931
* AMDGPU: Fold shift into cvt_f32_ubyteNMatt Arsenault2016-05-091-1/+15
| | | | llvm-svn: 268930
* [AMDGPU][llvm-mc] Some refactoring of .td filesArtem Tamazov2016-05-062-27/+27
| | | | | | | | | Some custom Operands and AsmOperandClasses moved to proper place. No functional changes. Differential Revision: http://reviews.llvm.org/D20012 llvm-svn: 268780
* [AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.Artem Tamazov2016-05-065-27/+379
| | | | | | | | | | | | | | | | | | | Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax in s_sendmsg and s_sendmsghalt instructions. The syntax matches the SP3 assembler/disassembler rules. That is why implicit inputs (like M0 and EXEC) are not printed to disassembly output anymore. sendmsg(...) allows only known message types and attributes, even if literals are used instead of symbolic names. However, raw literal (without "sendmsg") still can be used, and that allows for any 16-bit value. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19596 llvm-svn: 268762
* Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2."Nikolay Haustov2016-05-062-9/+5
| | | | | | | | This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380. It broke calling OpenCL kernel from another kernel. llvm-svn: 268739
* [TableGen] AsmMatcher: support for default values for optional operandsSam Kolton2016-05-064-106/+118
| | | | | | | | | | | | | | Summary: This change allows to specify "DefaultMethod" for optional operand (IsOptional = 1) in AsmOperandClass that return default value for operand. This is used in convertToMCInst to set default values in MCInst. Previously if you wanted to set default value for operand you had to create custom converter method. With this change it is possible to use standard converters even when optional operands presented. Reviewers: tstellarAMD, ab, craig.topper Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits Differential Revision: http://reviews.llvm.org/D18242 llvm-svn: 268726
* AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.Nikolay Haustov2016-05-062-5/+9
| | | | | | | | | | | | | | | | | Summary: Check calling convention in AMDGPUMachineFunction::isKernel This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF. Also, in the future unused non-kernels may be optimized. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19917 llvm-svn: 268719
* SDAG: Rename Select->SelectImpl and repurpose Select as returning voidJustin Bogner2016-05-051-2/+2
| | | | | | | | | | | | | | This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693
* AMDGPU: Simplify control flow / conditionsMatt Arsenault2016-05-051-19/+19
| | | | llvm-svn: 268676
* AMDGPU: Uniform branch conditions can originate with intrinsicsNicolai Haehnle2016-05-051-2/+1
| | | | | | | | | | | | | | | | | Summary: Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS GL43-CTS.shader_storage_buffer_object.advanced-matrix. In this particular case, the buffer load intrinsic fed into a uniform conditional branch, and led the brcond lowering down the wrong path. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19931 llvm-svn: 268650
* AMDGPU/SI: Add support for AMD code object version 2.Tom Stellard2016-05-058-133/+6
| | | | | | | | | | | | | | Summary: Version 2 is now the default. If you want to emit version 1, use the amdgcn--amdhsa-amdcov1 triple. Reviewers: arsenm, kzhuravl Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19283 llvm-svn: 268647
* AMDGPU/R600: Minor cleanup in InstrInfoJan Vesely2016-05-041-17/+16
| | | | | | | | | | | | | | Use std::make_pair instead of constructor Use C++11 loop Reuse helper var Reviewers: tstellardAMD Subsribers: arsenm Differential Revision: http://reviews.llvm.org/D19787 llvm-svn: 268503
* AMDGPU/SI: Use range loops to simplify some code in the SI SchedulerTom Stellard2016-05-031-18/+18
| | | | | | | | | | Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19822 llvm-svn: 268396
* Silence unused variable warning; NFC.Aaron Ballman2016-05-031-2/+1
| | | | llvm-svn: 268392
* AMDGPU: Custom lower v2i32 loads and storesMatt Arsenault2016-05-021-7/+39
| | | | | | | This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratchTom Stellard2016-05-021-2/+1
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-022-55/+12
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* Fix instance of -Winconsistent-missing-override in AMDGPU codeReid Kleckner2016-05-021-1/+1
| | | | llvm-svn: 268289
* AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratchTom Stellard2016-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287
* AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cppTom Stellard2016-05-023-39/+51
| | | | | | | | | | Reviewers: arsenm Subscribers: jvesely, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19736 llvm-svn: 268267
* AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260Tom Stellard2016-05-021-1/+2
| | | | | | | We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-023-4/+72
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* AMDGPU: llvm.SI.fs.constant is a source of divergenceNicolai Haehnle2016-05-021-0/+1
| | | | | | | | | | | | | | | | Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-023-55/+27
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* Silence unused variable warnings; NFC.Aaron Ballman2016-05-021-9/+4
| | | | llvm-svn: 268234
* Add missing override.Rafael Espindola2016-04-301-1/+2
| | | | llvm-svn: 268163
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-301-6/+0
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
OpenPOWER on IntegriCloud