path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU/SI: Refactor VOP[12C] tablegen definitions | Tom Stellard | 2015-11-06 | 2 | -97/+75
  Summary: Pass the VOPProfile object all the way through to the *_m multiclasses. This will allow us to do more simplifications in the future.
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D13437
  llvm-svn: 252339
* AMDGPU: Cleanup includes | Matt Arsenault | 2015-11-06 | 2 | -6/+4
  llvm-svn: 252328
* AMDGPU: Create emergency stack slots during frame lowering | Matt Arsenault | 2015-11-06 | 7 | -14/+89
  Test has a bogus verifier error which will be fixed by later commits.
  llvm-svn: 252327
* AMDGPU: Remove unused scratch resource operands | Matt Arsenault | 2015-11-06 | 2 | -75/+131
  The SGPR spill pseudos don't actually use them.
  llvm-svn: 252324
* AMDGPU: Add pass to detect used kernel features | Matt Arsenault | 2015-11-06 | 4 | -0/+138
  Use kernel attributes to mark kernels that use certain features whose support requires user SGPRs. We need to know this before instruction selection begins because it affects the kernel calling convention lowering. For now this only detects the workitem intrinsics.
  llvm-svn: 252323
* AMDGPU: Fix hardcoded alignment of spill. | Matt Arsenault | 2015-11-06 | 2 | -13/+12
  Instead of forcing an alignment of 4 when spilling, set the register class alignments.
  llvm-svn: 252322
* AMDGPU: Hack for VS_32 register pressure | Matt Arsenault | 2015-11-06 | 2 | -4/+17
  For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table.
  llvm-svn: 252321
* AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL | Tom Stellard | 2015-11-06 | 6 | -0/+60
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D13804
  llvm-svn: 252291
* AMDGPU: Also track whether SGPRs were spilled | Matt Arsenault | 2015-11-05 | 3 | -2/+20
  llvm-svn: 252145
* AMDGPU: Print number user SGPRs | Matt Arsenault | 2015-11-05 | 1 | -0/+6
  This doesn't quite match how SC prints it, which doesn't put it in a comment.
  llvm-svn: 252144
* AMDGPU: Disallow s[102:103] on VI in assembler | Matt Arsenault | 2015-11-05 | 1 | -2/+28
  llvm-svn: 252142
* AMDGPU: Fix assert when legalizing atomic operands | Matt Arsenault | 2015-11-05 | 3 | -15/+59
  The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement.
  llvm-svn: 252140
* AMDGPU: Make addr64 atomic operand order consistent | Matt Arsenault | 2015-11-05 | 1 | -2/+2
  vaddr comes before srsrc in every other MUBUF instruction, and that is the order in which it is printed.
  llvm-svn: 252139
* AMDGPU: Fix typo | Matt Arsenault | 2015-11-05 | 1 | -2/+2
  llvm-svn: 252116
* AMDGPU: Make flat_scratch name consistent | Matt Arsenault | 2015-11-03 | 1 | -3/+3
  The printed name and the parsed assembler names weren't the same. I'm not sure which name SC prints these as, but I think it's this one.
  llvm-svn: 252010
* AMDGPU: Fix asserts on invalid register ranges | Matt Arsenault | 2015-11-03 | 1 | -5/+13
  If the requested SGPR was not actually aligned, it was accepted and rounded down instead of rejected. Also fix an assert if the range is an invalid size.
  llvm-svn: 252009
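The behavior described in the entry above (reject a misaligned or wrongly sized SGPR range instead of rounding it down) can be sketched in a few lines. This is only an illustration in Python, not the assembler's C++ code. The function name `parse_sgpr_range`, the set of valid widths, the alignment rule `min(width, 4)`, and the use of 104 as the register count (borrowed from the "Define correct number of SGPRs" entry) are all assumptions made for the example.

```python
import re

# Assumption: tuple widths (in dwords) accepted for SGPR ranges.
VALID_WIDTHS = {1, 2, 4, 8, 16}
# Assumption: total SGPR count, taken from a neighbouring log entry.
NUM_SGPRS = 104


def parse_sgpr_range(text):
    """Parse 's[lo:hi]' and return (lo, width); raise ValueError if invalid."""
    m = re.fullmatch(r"s\[(\d+):(\d+)\]", text)
    if m is None:
        raise ValueError(f"not an SGPR range: {text!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    if hi < lo:
        raise ValueError("range end precedes range start")
    width = hi - lo + 1
    if width not in VALID_WIDTHS:
        raise ValueError(f"invalid range size {width}")
    if hi >= NUM_SGPRS:
        raise ValueError(f"s{hi} is past the last SGPR")
    # Assumed rule: a tuple must start on a multiple of min(width, 4).
    align = min(width, 4)
    if lo % align != 0:
        # Reject outright rather than silently rounding lo down.
        raise ValueError(f"s{lo} is not aligned to {align} registers")
    return lo, width
```

With these assumptions, `parse_sgpr_range("s[2:3]")` is accepted, while `s[1:2]` (misaligned) and `s[0:2]` (size 3) are rejected with an error rather than being adjusted.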
* AMDGPU: Fix off by one error in register parsing | Matt Arsenault | 2015-11-03 | 1 | -4/+5
  If trying to use one past the end, this would assert.
  llvm-svn: 252008
* AMDGPU: s[102:103] is unavailable on VI | Matt Arsenault | 2015-11-03 | 1 | -1/+10
  llvm-svn: 252000
* AMDGPU: Define correct number of SGPRs | Matt Arsenault | 2015-11-03 | 2 | -6/+10
  There are actually 104, so 2 were missing. More assembler tests with high register number tuples will be included in later patches.
  llvm-svn: 251999
* AMDGPU: Make findUsedSGPR more readable | Matt Arsenault | 2015-11-03 | 1 | -7/+18
  Add more comments etc.
  llvm-svn: 251996
* AMDGPU: Initialize SIFixSGPRCopies so -print-after works | Matt Arsenault | 2015-11-03 | 3 | -8/+15
  llvm-svn: 251995
* AMDGPU: Alphabetize includes | Matt Arsenault | 2015-11-03 | 1 | -1/+1
  llvm-svn: 251994
* ScheduleDAGInstrs: Remove IsPostRA flag; NFC | Matthias Braun | 2015-11-03 | 1 | -2/+1
  ScheduleDAGInstrs doesn't behave differently before or after register allocation. It was only used in a method of MachineSchedulerBase which behaved differently in MachineScheduler/PostMachineScheduler. Change this to let MachineScheduler/PostMachineScheduler just pass in a parameter to that function.
  The order of the LiveIntervals* and bool RemoveKillFlags parameters has been switched to make out-of-tree code fail instead of unintentionally passing a value intended for the IsPostRA flag to the (previously following and default initialized) RemoveKillFlags.
  Differential Revision: http://reviews.llvm.org/D14245
  llvm-svn: 251883
* AMDGPU: Stop assuming vreg for build_vector | Matt Arsenault | 2015-11-02 | 2 | -20/+40
  This was causing a variety of test failures when v2i64 is added as a legal type. SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence.
  llvm-svn: 251860
* AMDGPU: Error on graphics shaders with HSA | Matt Arsenault | 2015-11-02 | 1 | -0/+8
  I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again.
  llvm-svn: 251858
* AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE | Matt Arsenault | 2015-11-02 | 1 | -23/+89
  Make the REG_SEQUENCE be a VGPR, and do the register class copy first.
  llvm-svn: 251855
* AMDGPU/SI: handle undef for llvm.SI.packf16 | Marek Olsak | 2015-10-29 | 1 | -0/+4
  llvm-svn: 251632
* AMDGPU/SI: use S_OR for fneg (fabs f32) | Marek Olsak | 2015-10-29 | 1 | -2/+1
  llvm-svn: 251631
* AMDGPU/SI: use S_AND for i1 trunc | Marek Olsak | 2015-10-29 | 1 | -2/+2
  llvm-svn: 251630
* AMDGPU: Print modifiers when dumping AMDGPUOperand | Matt Arsenault | 2015-10-24 | 1 | -1/+1
  llvm-svn: 251160
* AMDGPU: Fix parsing of 32-bit literals with sign bit set | Matt Arsenault | 2015-10-23 | 2 | -5/+8
  llvm-svn: 251132
* AMDGPU: Fix adding redundant m0 uses | Matt Arsenault | 2015-10-21 | 1 | -2/+0
  BuildMI already adds these since they are defined correctly now.
  llvm-svn: 250961
* AMDGPU: Fix verifier error in SIFoldOperands | Matt Arsenault | 2015-10-21 | 1 | -1/+4
  There may be other use operands that also need their kill flags cleared. This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer.

  PeepholeOptimizer rewrites cases that look like:

    %vreg0 = ...
    %vreg1 = COPY %vreg0
    use %vreg1<kill>
    %vreg2 = COPY %vreg0
    use %vreg2<kill>

  to use the earlier source:

    %vreg0 = ...
    use %vreg0
    use %vreg0

  Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy.
  llvm-svn: 250960
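The point of the entry above is that once a register has more than one use, a kill flag may be stale on any of those uses and should be cleared on all of them. A toy model of that step is sketched below in Python. The `Operand` and `Instr` classes and the function `clear_kill_flags` are hypothetical stand-ins invented for this example; they are not LLVM's MachineInstr data structures, and the sketch does not claim to show how the actual fix was written.

```python
from dataclasses import dataclass, field


@dataclass
class Operand:
    reg: str               # register name, e.g. "%vreg0"
    is_def: bool = False   # True if the instruction writes the register
    is_kill: bool = False  # True if this use is marked as the last use


@dataclass
class Instr:
    opcode: str
    operands: list = field(default_factory=list)


def clear_kill_flags(instrs, reg):
    """Clear the kill flag on every use operand of `reg`.

    Returns the number of flags that were cleared.
    """
    cleared = 0
    for instr in instrs:
        for op in instr.operands:
            if op.reg == reg and not op.is_def and op.is_kill:
                op.is_kill = False
                cleared += 1
    return cleared
```

Clearing the flag on every use is conservative: a missing kill flag only loses liveness precision, whereas a stale one is what the machine verifier reports as an error.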
* AMDGPU: Split DiagnosticInfoUnsupported into its own file | Matt Arsenault | 2015-10-21 | 4 | -41/+76
  llvm-svn: 250959
* AMDGPU: Simplify VOP3 operand legalization. | Matt Arsenault | 2015-10-21 | 3 | -42/+58
  This was checking for a variety of situations that should never happen. This saves a tiny bit of compile time. We should not be selecting instructions with invalid operands in the first place. Most of the time, for registers, copies are inserted to the correct operand register class.
  For VOP3, since all operand types are supported and literal constants never are, we just need to verify the constant bus requirements (all immediates should be legal inline ones).
  The only possibly tricky case to worry about is when legalizing operands in moveToVALU with s_add_i32 and similar instructions. If the original s_add_i32 had a literal constant and we need to replace it with v_add_i32_e64, we would have an unsupported literal operand. However, I don't think we should worry about that because SIFoldOperands should handle folding literal constant operands into the SALU instructions based on the uses. At SIFoldOperands time, the legality and profitability of operand types is a bit different.
  llvm-svn: 250951
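The simplified check described above (every immediate is an inline constant, and the constant bus is not oversubscribed) can be sketched as follows. This is a Python illustration under stated assumptions, not the backend's code. The operand encoding as `(kind, value)` pairs, the function names, the inline-constant set (integers from -16 to 64 plus a handful of floats) and the limit of one distinct SGPR per instruction are assumptions about SI-era hardware that the commit message itself does not spell out.

```python
import math

# Assumption: floating-point values encodable as inline constants.
INLINE_FLOATS = {0.0, 0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0}


def is_inline_constant(value):
    """Return True if `value` fits the assumed inline-constant encoding."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return -16 <= value <= 64
    if isinstance(value, float):
        # -0.0 compares equal to 0.0 but has a different bit pattern.
        if value == 0.0 and math.copysign(1.0, value) < 0:
            return False
        return value in INLINE_FLOATS
    return False


def vop3_operands_legal(operands, bus_limit=1):
    """Check a list of (kind, value) operands.

    kind is 'vgpr', 'sgpr' or 'imm'. Literal (non-inline) immediates are
    rejected; distinct SGPRs are counted against the constant bus limit.
    """
    sgprs = set()
    for kind, value in operands:
        if kind == "imm":
            if not is_inline_constant(value):
                return False
        elif kind == "sgpr":
            sgprs.add(value)  # the same SGPR used twice counts once
        elif kind != "vgpr":
            raise ValueError(f"unknown operand kind: {kind!r}")
    return len(sgprs) <= bus_limit
```

Under these assumptions, reading the same SGPR twice is accepted, while two different SGPRs or an immediate such as 65 makes the operand list illegal.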
* AMDGPU: Fix not checking implicit operands in verifyInstruction | Matt Arsenault | 2015-10-21 | 1 | -15/+29
  When verifying constant bus restrictions, this wasn't catching uses in implicit operands.
  llvm-svn: 250948
* AMDGPU: Add MachineInstr overloads for instruction format tests | Matt Arsenault | 2015-10-20 | 7 | -40/+111
  llvm-svn: 250797
* AMDGPU: Stop reserving v[254:255] | Matt Arsenault | 2015-10-20 | 1 | -4/+0
  This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers. This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times.
  llvm-svn: 250794
* Make a bunch of static arrays const. | Craig Topper | 2015-10-18 | 2 | -2/+2
  llvm-svn: 250642
* Don't pretend AMDGPU backend knows how to custom-lower UDIVREM for vector types; it can't | Artyom Skrobov | 2015-10-15 | 1 | -1/+1
  Reviewers: arsenm, jvesely, tstellarAMD
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D13734
  llvm-svn: 250384
* AMDGPU: Remove implicit ilist iterator conversions, NFC | Duncan P. N. Exon Smith | 2015-10-13 | 9 | -18/+17
  One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new one. Previously, bundle iterators and single-instruction iterators could be compared to each other (comparing on underlying pointers). I changed a comparison from using `MBB->end()` to using `MBB->instr_end()`, since both end iterators should point at the same place anyway.
  I don't think the implicit conversion between the two iterator types is a good idea since it's fairly easy to accidentally compare to the wrong thing (they aren't always end iterators). Otherwise I would have just added the conversion.
  Even with that, there should be no functionality change here.
  llvm-svn: 250218
* AMDGPU: Refactor isVGPRToSGPRCopy | Matt Arsenault | 2015-10-13 | 1 | -19/+48
  It should now correctly handle physical registers and make it easier to identify the other direction.
  llvm-svn: 250132
* DAGCombiner: Combine extract_vector_elt from build_vector | Matt Arsenault | 2015-10-12 | 2 | -0/+13
  This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case.
  The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it.
  llvm-svn: 250129
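The combine named in the entry above folds an element extract of a freshly built vector straight to the corresponding scalar. A minimal sketch of that idea on a toy expression tree is shown below in Python. The nested-tuple node representation and the function `combine_extract_vector_elt` are invented for this illustration; the sketch ignores the one-use and target-hook conditions mentioned in the commit message and is not SelectionDAG code.

```python
def combine_extract_vector_elt(node):
    """Fold extract_vector_elt(build_vector(e0, e1, ...), i) to e_i.

    Nodes are nested tuples such as
        ('extract_vector_elt', ('build_vector', 'a', 'b'), 1)
    Anything that does not match the pattern is returned unchanged.
    """
    if not isinstance(node, tuple) or len(node) != 3:
        return node
    opcode, vec, idx = node
    if opcode != "extract_vector_elt":
        return node
    if not (isinstance(vec, tuple) and vec and vec[0] == "build_vector"):
        return node
    elems = vec[1:]
    # Only fold when the index is a known, in-range constant.
    if isinstance(idx, int) and not isinstance(idx, bool) and 0 <= idx < len(elems):
        return elems[idx]
    return node
```

For example, extracting element 2 of `('build_vector', 'a', 'b', 'c', 'd')` yields `'c'`, while an extract with a variable index is left as it is.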
* AMDGPU: Register some more passes so -print-before works | Matt Arsenault | 2015-10-12 | 1 | -0/+2
  llvm-svn: 250071
* CodeGen: print and verify after TargetPassConfig::insertPass by default | Justin Bogner | 2015-10-08 | 1 | -1/+3
  In r224059, we started verifying after addPass, but missed doing so on insertPass. There isn't a good reason for the discrepancy, and skipping the verifier in these cases causes bugs.
  This also exposes a verifier error that was introduced in r249087, but the verifier doesn't run until after the register coalescer, when the issue happens to have been resolved. I've skipped the verifier after SIFixSGPRLiveRangesID to avoid the failures for now and will follow up with Matt for a proper fix.
  llvm-svn: 249643
* AMDGPU: Fix missing implicit m0 uses on movrel instructions | Matt Arsenault | 2015-10-07 | 1 | -0/+7
  llvm-svn: 249577
* AMDGPU: Add comment for VOP2b operand class | Matt Arsenault | 2015-10-07 | 1 | -0/+5
  Because of the constant bus requirement, it is never legal to use a literal constant for these instructions despite the encoding allowing it. This was already doing the right thing, but note why.
  llvm-svn: 249500
* AMDGPU: Properly register passes | Matt Arsenault | 2015-10-07 | 1 | -2/+2
  llvm-svn: 249495
* AMDGPU: Use explicit register size indirect pseudos | Matt Arsenault | 2015-10-07 | 3 | -17/+28
  This stops using an unknown reg class operand. Currently build_vector selection has a broken-looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use.
  When the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users.
  Without this, when the check is removed, v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs.
  llvm-svn: 249494
* AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefs | Matt Arsenault | 2015-10-07 | 1 | -70/+0
  I'm not sure why this would be necessary, and no tests fail with them removed. Looking at the uses is suspect as well because the use reg classes will likely change when the users are moved as a result of moving this instruction.
  llvm-svn: 249493