path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* Reduce the size of MCRelaxableFragment. | Akira Hatanaka | 2015-11-14 | 1 | -2/+4
    MCRelaxableFragment previously kept a copy of MCSubtargetInfo and MCInst to enable re-encoding the MCInst later during relaxation. A copy of MCSubtargetInfo (instead of a reference or pointer) was needed because the feature bits could be modified by the parser.
    This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment with a constant reference to MCSubtargetInfo. The copies of MCSubtargetInfo are kept in MCContext, and the target parsers are now responsible for asking MCContext to provide a copy whenever the feature bits of MCSubtargetInfo have to be toggled.
    With this patch, I saw a 4% reduction in peak memory usage when I compiled verify-uselistorder.lto.bc using llc.
    rdar://problem/21736951
    Differential Revision: http://reviews.llvm.org/D14346
    llvm-svn: 253127
* [MCTargetAsmParser] Move the member variables that reference | Akira Hatanaka | 2015-11-14 | 1 | -9/+7
    MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a member function getSTI.
    This is done in preparation for making changes to shrink the size of MCRelaxableFragment (see http://reviews.llvm.org/D14346).
    llvm-svn: 253124
* AMDGPU: Add stony support | Tom Stellard | 2015-11-13 | 1 | -0/+4
    Patch by: Alex Deucher
    llvm-svn: 253053
* Revert "Remove unnecessary call to getAllocatableRegClass" | Tom Stellard | 2015-11-12 | 3 | -6/+16
    This reverts commit r252565.
    This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU:
    Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64"
    This reverts commit r252674.
    llvm-svn: 252956
* AMDGPU: Print more fields in comments | Matt Arsenault | 2015-11-11 | 1 | -3/+14
    llvm-svn: 252677
* AMDGPU: Remove dead code | Matt Arsenault | 2015-11-11 | 1 | -33/+2
    llvm-svn: 252675
* AMDGPU: Set isAllocatable = 0 on VS_32/VS_64 | Matt Arsenault | 2015-11-11 | 3 | -16/+6
    llvm-svn: 252674
* AMDGPU/SI: Refactor VOP[12C] tablegen definitions | Tom Stellard | 2015-11-06 | 2 | -97/+75
    Summary: Pass the VOPProfile object all the way through to *_m multiclasses. This will allow us to do more simplifications in the future.
    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D13437
    llvm-svn: 252339
* AMDGPU: Cleanup includes | Matt Arsenault | 2015-11-06 | 2 | -6/+4
    llvm-svn: 252328
* AMDGPU: Create emergency stack slots during frame lowering | Matt Arsenault | 2015-11-06 | 7 | -14/+89
    Test has a bogus verifier error which will be fixed by later commits.
    llvm-svn: 252327
* AMDGPU: Remove unused scratch resource operands | Matt Arsenault | 2015-11-06 | 2 | -75/+131
    The SGPR spill pseudos don't actually use them.
    llvm-svn: 252324
* AMDGPU: Add pass to detect used kernel features | Matt Arsenault | 2015-11-06 | 4 | -0/+138
    Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering.
    For now this only detects the workitem intrinsics.
    llvm-svn: 252323
* AMDGPU: Fix hardcoded alignment of spill. | Matt Arsenault | 2015-11-06 | 2 | -13/+12
    Instead of forcing an alignment of 4 when spilled, set register class alignments.
    llvm-svn: 252322
* AMDGPU: Hack for VS_32 register pressure | Matt Arsenault | 2015-11-06 | 2 | -4/+17
    For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling.
    Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table.
    llvm-svn: 252321
* AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL | Tom Stellard | 2015-11-06 | 6 | -0/+60
    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D13804
    llvm-svn: 252291
* AMDGPU: Also track whether SGPRs were spilled | Matt Arsenault | 2015-11-05 | 3 | -2/+20
    llvm-svn: 252145
* AMDGPU: Print number user SGPRs | Matt Arsenault | 2015-11-05 | 1 | -0/+6
    This doesn't quite match how SC prints it, which doesn't put it in a comment.
    llvm-svn: 252144
* AMDGPU: Disallow s[102:103] on VI in assembler | Matt Arsenault | 2015-11-05 | 1 | -2/+28
    llvm-svn: 252142
* AMDGPU: Fix assert when legalizing atomic operands | Matt Arsenault | 2015-11-05 | 3 | -15/+59
    The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores.
    This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement.
    llvm-svn: 252140
* AMDGPU: Make addr64 atomic operand order consistent | Matt Arsenault | 2015-11-05 | 1 | -2/+2
    vaddr comes before srsrc in every other MUBUF instruction, and that is the order in which it is printed.
    llvm-svn: 252139
* AMDGPU: Fix typo | Matt Arsenault | 2015-11-05 | 1 | -2/+2
    llvm-svn: 252116
* AMDGPU: Make flat_scratch name consistent | Matt Arsenault | 2015-11-03 | 1 | -3/+3
    The printed name and the parsed assembler names weren't the same. I'm not sure which name SC prints these as, but I think it's this one.
    llvm-svn: 252010
* AMDGPU: Fix asserts on invalid register ranges | Matt Arsenault | 2015-11-03 | 1 | -5/+13
    If the requested SGPR was not actually aligned, it was accepted and rounded down instead of rejected.
    Also fix an assert if the range is an invalid size.
    llvm-svn: 252009
* AMDGPU: Fix off by one error in register parsing | Matt Arsenault | 2015-11-03 | 1 | -4/+5
    If trying to use one past the end, this would assert.
    llvm-svn: 252008
* AMDGPU: s[102:103] is unavailable on VI | Matt Arsenault | 2015-11-03 | 1 | -1/+10
    llvm-svn: 252000
* AMDGPU: Define correct number of SGPRs | Matt Arsenault | 2015-11-03 | 2 | -6/+10
    There are actually 104 so 2 were missing.
    More assembler tests with high register number tuples will be included in later patches.
    llvm-svn: 251999
* AMDGPU: Make findUsedSGPR more readable | Matt Arsenault | 2015-11-03 | 1 | -7/+18
    Add more comments etc.
    llvm-svn: 251996
* AMDGPU: Initialize SIFixSGPRCopies so -print-after works | Matt Arsenault | 2015-11-03 | 3 | -8/+15
    llvm-svn: 251995
* AMDGPU: Alphabetize includes | Matt Arsenault | 2015-11-03 | 1 | -1/+1
    llvm-svn: 251994
* ScheduleDAGInstrs: Remove IsPostRA flag; NFC | Matthias Braun | 2015-11-03 | 1 | -2/+1
    ScheduleDAGInstrs doesn't behave differently before or after register allocation. It was only used in a method of MachineSchedulerBase which behaved differently in MachineScheduler/PostMachineScheduler. Change this to let MachineScheduler/PostMachineScheduler just pass in a parameter to that function.
    The order of the LiveIntervals* and bool RemoveKillFlags parameters has been switched to make out-of-tree code fail instead of unintentionally passing a value intended for the IsPostRA flag to the (previously following and default initialized) RemoveKillFlags.
    Differential Revision: http://reviews.llvm.org/D14245
    llvm-svn: 251883
* AMDGPU: Stop assuming vreg for build_vector | Matt Arsenault | 2015-11-02 | 2 | -20/+40
    This was causing a variety of test failures when v2i64 is added as a legal type.
    SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence.
    llvm-svn: 251860
* AMDGPU: Error on graphics shaders with HSA | Matt Arsenault | 2015-11-02 | 1 | -0/+8
    I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again.
    llvm-svn: 251858
* AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE | Matt Arsenault | 2015-11-02 | 1 | -23/+89
    Make the REG_SEQUENCE be a VGPR, and do the register class copy first.
    llvm-svn: 251855
* AMDGPU/SI: handle undef for llvm.SI.packf16 | Marek Olsak | 2015-10-29 | 1 | -0/+4
    llvm-svn: 251632
* AMDGPU/SI: use S_OR for fneg (fabs f32) | Marek Olsak | 2015-10-29 | 1 | -2/+1
    llvm-svn: 251631
* AMDGPU/SI: use S_AND for i1 trunc | Marek Olsak | 2015-10-29 | 1 | -2/+2
    llvm-svn: 251630
* AMDGPU: Print modifiers when dumping AMDGPUOperand | Matt Arsenault | 2015-10-24 | 1 | -1/+1
    llvm-svn: 251160
* AMDGPU: Fix parsing of 32-bit literals with sign bit set | Matt Arsenault | 2015-10-23 | 2 | -5/+8
    llvm-svn: 251132
* AMDGPU: Fix adding redundant m0 uses | Matt Arsenault | 2015-10-21 | 1 | -2/+0
    BuildMI already adds these since they are defined correctly now.
    llvm-svn: 250961
* AMDGPU: Fix verifier error in SIFoldOperands | Matt Arsenault | 2015-10-21 | 1 | -1/+4
    There may be other use operands that also need their kill flags cleared. This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer.
    PeepholeOptimizer rewrites cases that look like:
        %vreg0 = ...
        %vreg1 = COPY %vreg0
        use %vreg1<kill>
        %vreg2 = COPY %vreg0
        use %vreg2<kill>
    to use the earlier source to
        %vreg0 = ...
        use %vreg0
        use %vreg0
    Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy.
    llvm-svn: 250960
* AMDGPU: Split DiagnosticInfoUnsupported into its own file | Matt Arsenault | 2015-10-21 | 4 | -41/+76
    llvm-svn: 250959
* AMDGPU: Simplify VOP3 operand legalization. | Matt Arsenault | 2015-10-21 | 3 | -42/+58
    This was checking for a variety of situations that should never happen. This saves a tiny bit of compile time.
    We should not be selecting instructions with invalid operands in the first place. Most of the time, for registers, copies are inserted to the correct operand register class.
    For VOP3, since all operand types are supported and literal constants never are, we just need to verify the constant bus requirements (all immediates should be legal inline ones).
    The only possibly tricky case to worry about is when legalizing operands in moveToVALU with s_add_i32 and similar instructions. If the original s_add_i32 had a literal constant and we need to replace it with v_add_i32_e64, we would have an unsupported literal operand. However, I don't think we should worry about that because SIFoldOperands should handle folding literal constant operands into the SALU instructions based on the uses. At SIFoldOperands time, the legality and profitability of operand types is a bit different.
    llvm-svn: 250951
* AMDGPU: Fix not checking implicit operands in verifyInstruction | Matt Arsenault | 2015-10-21 | 1 | -15/+29
    When verifying constant bus restrictions, this wasn't catching uses in implicit operands.
    llvm-svn: 250948
* AMDGPU: Add MachineInstr overloads for instruction format tests | Matt Arsenault | 2015-10-20 | 7 | -40/+111
    llvm-svn: 250797
* AMDGPU: Stop reserving v[254:255] | Matt Arsenault | 2015-10-20 | 1 | -4/+0
    This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers.
    This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times.
    llvm-svn: 250794
* Make a bunch of static arrays const. | Craig Topper | 2015-10-18 | 2 | -2/+2
    llvm-svn: 250642
* Don't pretend AMDGPU backend knows how to custom-lower UDIVREM for vector types; it can't | Artyom Skrobov | 2015-10-15 | 1 | -1/+1
    Reviewers: arsenm, jvesely, tstellarAMD
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D13734
    llvm-svn: 250384
* AMDGPU: Remove implicit ilist iterator conversions, NFC | Duncan P. N. Exon Smith | 2015-10-13 | 9 | -18/+17
    One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new one. Previously, bundle iterators and single-instruction iterators could be compared to each other (comparing on underlying pointers). I changed a comparison from using `MBB->end()` to using `MBB->instr_end()`, since both end iterators should point at the same place anyway.
    I don't think the implicit conversion between the two iterator types is a good idea since it's fairly easy to accidentally compare to the wrong thing (they aren't always end iterators). Otherwise I would have just added the conversion.
    Even with that, there should be no functionality change here.
    llvm-svn: 250218
* AMDGPU: Refactor isVGPRToSGPRCopy | Matt Arsenault | 2015-10-13 | 1 | -19/+48
    It should now correctly handle physical registers and make it easier to identify the other direction.
    llvm-svn: 250132
* DAGCombiner: Combine extract_vector_elt from build_vector | Matt Arsenault | 2015-10-12 | 2 | -0/+13
    This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later.
    InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it.
    llvm-svn: 250129