summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Register some more passes so -print-before worksMatt Arsenault2015-10-121-0/+2
| | | | llvm-svn: 250071
* CodeGen: print and verify after TargetPassConfig::insertPass by defaultJustin Bogner2015-10-081-1/+3
| | | | | | | | | | | | | | In r224059, we started verifying after addPass, but missed doing so on insertPass. There isn't a good reason for the discrepancy, and skipping the verifier in these cases causes bugs. This also exposes a verifier error that was introduced in r249087, but the verifier doesn't run until after the register coalescer, when the issue happens to have been resolved. I've skipped the verifier after SIFixSGPRLiveRangesID to avoid the failures for now and will follow up with Matt for a proper fix. llvm-svn: 249643
* AMDGPU: Fix missing implicit m0 uses on movrel instructionsMatt Arsenault2015-10-071-0/+7
| | | | llvm-svn: 249577
* AMDGPU: Add comment for VOP2b operand classMatt Arsenault2015-10-071-0/+5
| | | | | | | | Because of the constant bus requirement, it is never legal to use a literal constant for these instructions despite the encoding allowing it. This was already doing the right thing, but note why. llvm-svn: 249500
* AMDGPU: Properly register passesMatt Arsenault2015-10-071-2/+2
| | | | llvm-svn: 249495
* AMDGPU: Use explicit register size indirect pseudosMatt Arsenault2015-10-073-17/+28
| | | | | | | | | | | | | | | | | This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494
* AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefsMatt Arsenault2015-10-071-70/+0
| | | | | | | | | I'm not sure why this would be necessary, and no tests fail with them removed. Looking at the uses is suspect as well because the use reg classes will likely change when the users are moved as a result of moving this instruction. llvm-svn: 249493
* AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()Tom Stellard2015-10-061-1/+1
| | | | | | | | | | | | | | Summary: We currently ignore the calling convention, so there is no real reason to assert on the calling convention of functions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13367 llvm-svn: 249468
* AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcpTom Stellard2015-10-065-25/+56
| | | | | | | | | | | | | | Summary: The assembly printing of these is still missing the encoding size suffix, but this will be fixed in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13436 llvm-svn: 249424
* AMDGPU/SI: Add a helper for creating aliases for the _e32 instructionsTom Stellard2015-10-051-11/+49
| | | | | | | | | | | | | | | | | | Summary: We are currently only using these aliases for VOPC instructions, but this helper will make it easier to use them everywhere. These aliases allow for the automatic matching of instructions with forced 32-bit encoding. Eventually, we should be able to remove the custom C++ logic we have for this in the assembler. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13396 llvm-svn: 249330
* AMDGPU/SI: Remove unused tablegen multiclassTom Stellard2015-10-031-16/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13395 llvm-svn: 249221
* AMDGPU/SI: Add verifier check for exec readsMatt Arsenault2015-10-022-2/+14
| | | | | | | Make sure we aren't accidentally not setting these in the instruction definitions. llvm-svn: 249170
* AMDGPU: Fix unused variable warning in release buildMatt Arsenault2015-10-011-2/+2
| | | | llvm-svn: 249091
* AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc passMatt Arsenault2015-10-012-33/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace LiveInterval usage with LiveVariables. LiveIntervals computes far more information than is needed for this pass which just needs to find if an SGPR is live out of the defining block. LiveIntervals are not usually available that early, requiring computing them twice which is very expensive. The extra run of LiveIntervals/LiveVariables/SlotIndexes was costing in total about 5% of compile time. Continuing to use LiveIntervals is problematic. It seems there is an option (early-live-intervals) to run the analysis about where it should go to avoid recomputing LiveVariables, but it seems to be completely broken with subreg liveness enabled. There are also problems from trying to recompute LiveIntervals since this seems to undo LiveVariables and clearing kill flags, causing TwoAddressInstructions to make bad decisions. Insert the pass right after live variables and preserve it. The tricky case to worry about might be phis since LiveVariables doesn't count a register as live out if in the successor block it is only used in a phi, but I don't think this is a concern right now because SIFixSGPRCopies replaces SGPR phis. llvm-svn: 249087
* AMDGPU: Merge if and switchMatt Arsenault2015-10-011-14/+17
| | | | llvm-svn: 249082
* AMDGPU: Remove dead codeMatt Arsenault2015-10-011-7/+4
| | | | | | | There's no point in checking VReg_1 because all uses of it should already have been removed by SILowerI1Copies. llvm-svn: 249081
* AMDGPU: Make SIInsertWaits about a factor of 4 fasterMatt Arsenault2015-10-012-23/+24
| | | | | | | | | | | | | | | | | | This was the slowest target custom pass and was spending 80% of the time in getMinimalPhysRegClass which was called for every register operand. Try to use the statically known register class when possible from the instruction's MCOperandInfo. There are a few pseudo instructions which are not well behaved with unknown register classes which still require the expensive physical register class search. There are a few other possibilities for making this even faster, such as not inspecting implicit operands. For now those are checked because it is technically possible to have a scalar load into exec or vcc which can be implicitly used. llvm-svn: 249079
* AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering passTom Stellard2015-10-011-2/+6
| | | | | | | | | | | | | | | Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073
* AMDGPU: Add MEM_RAT STORE_TYPED.Tom Stellard2015-10-013-0/+23
| | | | | | | | | | | | v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042
* AMDGPU: Factor out EOP query.Tom Stellard2015-10-011-4/+6
| | | | | | | | v2: Fix brace placement and capitalization (Matt). Patch by: Zoltan Gilian llvm-svn: 249041
* AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init orderTom Stellard2015-10-011-9/+12
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12451 llvm-svn: 248978
* AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is setMarek Olsak2015-09-294-8/+18
| | | | | | | | | | to prevent setting a huge stride, because DATA_FORMAT has a different meaning if ADD_TID_ENABLE is set. This is a candidate for stable llvm 3.7. Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 248858
* AMDGPU: Factor switch into separate functionMatt Arsenault2015-09-282-21/+30
| | | | llvm-svn: 248742
* AMDGPU: Fix splitting x16 SMRD loadsMatt Arsenault2015-09-281-2/+2
| | | | | | | | When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741
* AMDGPU: Fix moving SMRD loads with literal offsets on CIMatt Arsenault2015-09-281-3/+9
| | | | llvm-svn: 248740
* AMDGPU: Fix splitting SMRD with large offsetMatt Arsenault2015-09-281-1/+1
| | | | | | | | | | | | | The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting an an instruction missing an operand which would assert later. Test will be included in a following commit which fixes a related issue. llvm-svn: 248739
* Improved the interface of methods commuting operands, improved X86-FMA3 ↵Andrew Kaylor2015-09-283-21/+55
| | | | | | | | | | mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735
* AMDGPU: Remove hasPostISelHook from most instructionsMatt Arsenault2015-09-262-12/+19
| | | | | | | Since this is only needed for VOP3 and a few other special case instructions, stop setting it on everything. llvm-svn: 248657
* AMDGPU: Switch over reg class size instead of checking all super classesMatt Arsenault2015-09-262-22/+36
| | | | | | This gets isSGPRClass out of my profile of SIFixSGPRCopies. llvm-svn: 248656
* AMDGPU: Don't handle invalid reg classes in helper functionsMatt Arsenault2015-09-261-6/+0
| | | | | | | No tests hit these and it would be better to have checks like this explicit where they are used. llvm-svn: 248655
* AMDGPU: address -Winconsistent-missing-overrideSaleem Abdulrasool2015-09-261-1/+1
| | | | | | Add missing override. NFC. llvm-svn: 248652
* AMDGPU: Set CopyCost of register classesMatt Arsenault2015-09-261-8/+32
| | | | | | | | These require multiple mov instructions to copy, but the default value is that 1 instruction is needed. I'm not sure if this actually changes anything. llvm-svn: 248651
* AMDGPU: VOP3b definition cleanupsMatt Arsenault2015-09-262-18/+25
| | | | llvm-svn: 248647
* AMDGPU: Fix sched model for VOP2b instructionsMatt Arsenault2015-09-261-6/+7
| | | | | | | | Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646
* AMDGPU: Construct new buffer instruction when moving SMRDMatt Arsenault2015-09-252-31/+39
| | | | | | | | | It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place. llvm-svn: 248627
* AMDGPU/SI: Use .hsatext section instead of .text for HSATom Stellard2015-09-2517-9/+197
| | | | | | | | | | Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619
* AMDGPU: Make getNamedOperandIdx declaration readonlyMatt Arsenault2015-09-252-0/+3
| | | | | | This matches how it is defined in the generated implementation. llvm-svn: 248598
* AMDGPU: Disable some passes that are not meaningfulMatt Arsenault2015-09-251-3/+15
| | | | | | | | | | | Don't run passes related to stack maps, garbage collection, exceptions since these aren't useful for GPUs. There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or I'm not sure how to disable (SafeStack and StackProtector) llvm-svn: 248591
* AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAGMatt Arsenault2015-09-251-52/+61
| | | | | | | | | | | | | This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589
* AMDGPU: Fix recomputing dominator tree unnecessarilyMatt Arsenault2015-09-256-1/+19
| | | | | | | SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587
* AMDGPU: Re-justify workaround and fix worked around problemMatt Arsenault2015-09-252-56/+65
| | | | | | | | | | | | | | | When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything actually is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585
* AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sourcesMatt Arsenault2015-09-251-6/+15
| | | | | | This avoids needting to re-legalize the new REG_SEQUENCE. llvm-svn: 248584
* AMDGPU: Fix not adding exec to defs of cmpx instruction pseudosMatt Arsenault2015-09-251-0/+2
| | | | | | | | | This was only set on the final _si/_vi version, but not on the pseudos most of codegen sees. No test since these instructions aren't used yet. llvm-svn: 248583
* AMDGPU: Improve accuracy of instruction rates for VOPCMatt Arsenault2015-09-252-48/+73
| | | | | | | | | | | These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582
* AMDGPU: Remove unused includesMatt Arsenault2015-09-251-6/+0
| | | | llvm-svn: 248553
* AMDGPU: Add s_dcache_* instructionsMatt Arsenault2015-09-245-14/+69
| | | | llvm-svn: 248533
* AMDGPU: Add cache invalidation instructions.Matt Arsenault2015-09-243-4/+34
| | | | | | | | | | These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532
* Introduce target hook for optimizing register copiesMatt Arsenault2015-09-242-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478
* AMDGPU: Return after instruction is processed.Matt Arsenault2015-09-241-0/+4
| | | | llvm-svn: 248476
* AMDGPU: Remove another unnecessary check from commuteInstructionMatt Arsenault2015-09-241-5/+0
| | | | llvm-svn: 248475
OpenPOWER on IntegriCloud