path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU: Fix bug 26659. | Matt Arsenault | 2016-03-02 | 1 | -1/+1
    Fix checking the same instruction twice instead of the second branch that uses vccz. I don't think this matters because s_branch_vccnz is always used currently.
    llvm-svn: 262457
* AMDGPU: Cleanup suggested in bug 23960 | Matt Arsenault | 2016-03-02 | 1 | -6/+3
    llvm-svn: 262456
* Bug 20810: Use report_fatal_error instead of unreachable | Matt Arsenault | 2016-03-02 | 1 | -6/+6
    llvm-svn: 262455
* TableGen: Check scheduling models for completeness | Matthias Braun | 2016-03-01 | 1 | -2/+6
    TableGen checks at compile time that, for scheduling models with "CompleteModel = 1", one of the following holds for every instruction:
    - Is marked with the hasNoSchedulingInfo flag
    - The instruction is a subclass of Sched
    - There are InstRW definitions in the scheduling model
    Typical steps necessary to complete a model:
    - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering) are marked with hasNoSchedulingInfo.
    - If a CPU does not support some instructions, mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
    - Add missing scheduling information.
    Differential Revision: http://reviews.llvm.org/D17747
    llvm-svn: 262384
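The completeness rule in the message above can be modeled as a small predicate. This is an illustrative Python sketch, not TableGen's implementation: the `Instr` record, its field names, and `missing_sched_info` are hypothetical stand-ins for the three conditions the message lists.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    # Hypothetical stand-in for a TableGen instruction record.
    name: str
    has_no_scheduling_info: bool = False  # models the hasNoSchedulingInfo flag
    is_sched_subclass: bool = False       # models "is a subclass of Sched"


def missing_sched_info(instrs, inst_rw_names):
    """Return names of instructions that satisfy none of the three conditions.

    inst_rw_names is the set of instruction names covered by InstRW
    definitions in the (modeled) scheduling model.
    """
    return [
        i.name
        for i in instrs
        if not (
            i.has_no_scheduling_info
            or i.is_sched_subclass
            or i.name in inst_rw_names
        )
    ]
```

In this model a scheduling model counts as complete when `missing_sched_info` returns an empty list; any name it returns corresponds to an instruction the check would report.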
* AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics | Changpeng Fang | 2016-03-01 | 3 | -0/+32
    Summary: This patch implements DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI.
    Reviewers: tstellarAMD, arsenm
    Subscribers: llvm-commits, arsenm
    Differential Revision: http://reviews.llvm.org/D17614
    llvm-svn: 262356
* [AMDGPU] Remove unused disassembler code. | Nikolay Haustov | 2016-03-01 | 1 | -2/+0
    llvm-svn: 262346
* [AMDGPU] Fix build warnings. | Nikolay Haustov | 2016-03-01 | 1 | -2/+2
    llvm-svn: 262338
* [AMDGPU] Disassembler code refactored + error messages. | Nikolay Haustov | 2016-03-01 | 3 | -385/+308
    The idea behind this change is to make the code shorter and as common across targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.
    64-bit instruction decoding added.
    Error/warning messages on unrecognized instruction operands added; InstPrinter is allowed to print invalid operands, helping to find invalid/unsupported code.
    The change is massive and hard to compare with the previous version, so it makes sense just to take a look at the new version.
    As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles a >300K binary source of some blas kernel.
    Previous TODOs were saved whenever possible.
    Patch by: Valery Pykhtin
    Differential Revision: http://reviews.llvm.org/D17720
    llvm-svn: 262332
* [TableGen] AsmMatcher: Skip optional operands in the middle of instruction if it is not present | Nikolay Haustov | 2016-03-01 | 2 | -24/+17
    Previously, if an actual instruction had one of the optional operands, the other optional operands listed before it also had to be present. For example, the instruction v_fract_f32 v0, v1, mul:2 has one optional operand, OMod, and does not have the optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:
      string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
    Making this work required some hacks (both OMod and Clamp match classes have the same PredicateMethod).
    Now, if MatchInstructionImpl meets a formal optional operand that is not present in the actual instruction, it skips this formal operand and tries to match the current actual operand with the next formal.
    Patch by: Sam Kolton
    Review: http://reviews.llvm.org/D17568
    [AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
    With this change you should place optional operands in the order specified by the asm string:
      clamp -> omod
      offset -> glc -> slc -> tfe
    Fixes for several tests.
    Depends on D17568
    Patch by: Sam Kolton
    Review: http://reviews.llvm.org/D17644
    llvm-svn: 262314
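The skipping behaviour described in the message above can be sketched as a left-to-right match of actual operands against formal operands. This is a simplified Python model written for illustration, not the generated MatchInstructionImpl code: operand "kinds" are plain strings, and `match_operands` is a hypothetical name.

```python
def match_operands(formals, actuals):
    """Match actual operands against formal operands in order.

    formals: list of (kind, is_optional) tuples, in AsmString order.
    actuals: list of operand kinds as written in the instruction.
    Returns the formal index matched by each actual, or None if no match.
    """
    matched = []
    fi = 0  # index of the next formal operand to consider
    for actual in actuals:
        while fi < len(formals):
            kind, optional = formals[fi]
            if kind == actual:
                matched.append(fi)
                fi += 1
                break
            if not optional:
                return None  # a required formal operand is missing
            fi += 1  # formal optional operand absent: skip it
        else:
            return None  # ran out of formals for this actual operand
    # Any formals left over must be optional.
    if any(not optional for _, optional in formals[fi:]):
        return None
    return matched
```

With formals modeled on the AsmString quoted above (`vdst`, `src0`, then optional `clamp` and `omod`), an instruction that supplies only `omod` matches by skipping `clamp`, while supplying `omod` before `clamp` does not match, which mirrors the ordering requirement stated in the message.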
* AMDGPU: Don't emit build_pair during udivrem legalization | Matt Arsenault | 2016-03-01 | 1 | -6/+11
    Technically you aren't supposed to emit these after type legalization for some reason, and we use vector extracts of bitcasted integers as the canonical way to do this.
    llvm-svn: 262298
* AMDGPU: Don't use estimated stack size when we know the real stack size | Matt Arsenault | 2016-03-01 | 1 | -1/+1
    llvm-svn: 262297
* AMDGPU: Set HasExtractBitInsn | Matt Arsenault | 2016-03-01 | 1 | -0/+11
    This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur.
    llvm-svn: 262296
* AMDGPU: More bits of frame index are known to be zero | Matt Arsenault | 2016-02-27 | 4 | -29/+26
    The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still.
    llvm-svn: 262153
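The arithmetic behind the message above can be worked through in a few lines. This is a sketch under stated assumptions: the 4G and 128M figures come from the message, the divisor is only inferred from those two figures, and treating the frame index as a 32-bit byte offset is an assumption made for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20

TOTAL_PRIVATE_BYTES = 4 * GIB       # "4G" from the commit message
MAX_PER_WORKITEM_BYTES = 128 * MIB  # "128M" from the commit message

# Assumption: the dispatch granularity is not stated in the message;
# this divisor is simply what the two figures above imply.
IMPLIED_MIN_WORKITEMS = TOTAL_PRIVATE_BYTES // MAX_PER_WORKITEM_BYTES


def known_zero_high_bits(exclusive_upper_bound, width=32):
    """Count high bits that are zero for every value in [0, exclusive_upper_bound)."""
    largest = exclusive_upper_bound - 1
    return width - largest.bit_length()
```

Under these assumptions the implied divisor is 32, and a per-workitem bound of 128M (2^27) leaves the top 5 bits of a 32-bit index known to be zero, whereas the whole-GPU bound of 4G leaves none.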
* CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC | Duncan P. N. Exon Smith | 2016-02-27 | 1 | -2/+2
    These parameters aren't expected to be null, so take them by reference.
    llvm-svn: 262151
* CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC | Duncan P. N. Exon Smith | 2016-02-27 | 1 | -35/+38
    In all but one case, change the DFAPacketizer API to take MachineInstr& instead of MachineInstr*. In DFAPacketizer::endPacket(), take MachineBasicBlock::iterator. Besides cleaning up the API, this is in search of PR26753.
    llvm-svn: 262142
* AMDGPU: Split vi-insts subtarget feature | Matt Arsenault | 2016-02-27 | 3 | -6/+24
    This will be more useful for marking builtins acceptable for which subtargets.
    llvm-svn: 262121
* AMDGPU: Add s_sleep intrinsic | Matt Arsenault | 2016-02-27 | 2 | -1/+17
    llvm-svn: 262120
* AMDGPU: Implement readcyclecounter | Matt Arsenault | 2016-02-27 | 7 | -10/+68
    This matches the behavior of the HSAIL clock instruction. s_memrealtime is used if the subtarget supports it, and falls back to s_memtime if not.
    Also introduces new intrinsics for each of s_memtime / s_memrealtime.
    llvm-svn: 262119
* CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC | Duncan P. N. Exon Smith | 2016-02-27 | 2 | -13/+13
    Take MachineInstr by reference instead of by pointer in SlotIndexes and the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are never null, so this cleans up the API a bit. It also incidentally removes a few implicit conversions from MachineInstrBundleIterator to MachineInstr* (see PR26753).
    At a couple of call sites it was convenient to convert to a range-based for loop over MachineBasicBlock::instr_begin/instr_end, so I added MachineBasicBlock::instrs.
    llvm-svn: 262115
* [AMDGPU] Assembler: Basic support for MIMG | Nikolay Haustov | 2016-02-26 | 6 | -59/+202
    Add parsing and printing of image operands. Matches legacy sp3 assembler.
    Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
    Update SITargetLowering for new order.
    Add basic MC test.
    Update CodeGen tests.
    Review: http://reviews.llvm.org/D17574
    llvm-svn: 261995
* [AMDGPU] Disassembler: Support for all VOP1 instructions. | Nikolay Haustov | 2016-02-25 | 3 | -62/+242
    Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget:
    - VGPR_32 and VReg_64 operand register classes
    - VS_32 and VS_64 operand register classes with inline and literal constants
    Tests for VOP1 instructions.
    Patch by: skolton
    Reviewers: arsenm, tstellarAMD
    Review: http://reviews.llvm.org/D17194
    llvm-svn: 261878
* [AMDGPU] Assembler: Simplify handling of optional operands | Nikolay Haustov | 2016-02-25 | 3 | -75/+77
    Resubmit with index problem fixed. Verified with valgrind.
    Prepare to support DPP encodings.
    For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However, this means that when parsing an instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector, because neither instruction would match. So add default values for optional operands to MCInst during conversion instead.
    - Mark more operands as IsOptional = 1 in .td files.
    - Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
    - Add default values for optional operands during conversion using new helper addOptionalImmOperand.
    - Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
    - Separate cvtFlat and cvtFlatAtomic.
    - Fix CNDMASK_B32 definition to have no modifiers.
    Review: http://reviews.llvm.org/D17445
    llvm-svn: 261856
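The idea of filling in defaults at conversion time, rather than while parsing, can be illustrated with a small model. This Python sketch is not the AMDGPUAsmParser code: `add_optional_imm_operand` only borrows its name from the helper mentioned in the message, and its signature, the dictionary of parsed operands, and the default values are all assumptions made for illustration.

```python
def add_optional_imm_operand(inst_operands, parsed_optional, name, default=0):
    """Append the parsed value for `name`, or `default` if it was not written.

    Hypothetical model of the helper named in the commit message.
    """
    inst_operands.append(parsed_optional.get(name, default))


def convert(required, parsed_optional, optional_order):
    """Build the final operand list for one encoding.

    required:        operands that were parsed unconditionally.
    parsed_optional: {name: value} for optional operands actually present.
    optional_order:  [(name, default)] in the order the encoding expects.
    """
    inst = list(required)
    for name, default in optional_order:
        add_optional_imm_operand(inst, parsed_optional, name, default)
    return inst
```

Because the parsed operand list carries no defaults, the same parse result can be converted for either encoding: each conversion supplies only the defaults that its own optional operands need, which is the conflict the message describes avoiding.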
* Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands" | NAKAMURA Takumi | 2016-02-25 | 3 | -79/+75
    It brought undefined behavior.
    llvm-svn: 261839
* [AMDGPU] Assembler: Simplify handling of optional operands | Nikolay Haustov | 2016-02-24 | 3 | -75/+79
    Prepare to support DPP encodings.
    For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However, this means that when parsing an instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector, because neither instruction would match. So add default values for optional operands to MCInst during conversion instead.
    - Mark more operands as IsOptional = 1 in .td files.
    - Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
    - Add default values for optional operands during conversion using new helper addOptionalImmOperand.
    - Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
    - Separate cvtFlat and cvtFlatAtomic.
    - Fix CNDMASK_B32 definition to have no modifiers.
    Review: http://reviews.llvm.org/D17445
    Reviewers: tstellarAMD
    llvm-svn: 261742
* [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields) | Nikolay Haustov | 2016-02-24 | 1 | -7/+15
    lit tests passed before and after because they don't test the binary representation of amd_kernel_code_t.
    Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com)
    Reviewers: arsenm
    llvm-svn: 261732
* AMDGPU: Check cheaper condition before SignBitIsZero | Matt Arsenault | 2016-02-24 | 1 | -7/+6
    Don't do an expensive computeKnownBits call when we can do the cheap check for legal offsets first.
    llvm-svn: 261720
* [AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64 | Nikolay Haustov | 2016-02-23 | 2 | -2/+7
    src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
    src0 and src1 of s_bfm_b64 are 32-bit.
    Update tests.
    Review: http://reviews.llvm.org/D17480
    Reviewers: arsenm
    llvm-svn: 261621
* CodeGen: TII: Take MachineInstr& in predicate API, NFC | Duncan P. N. Exon Smith | 2016-02-23 | 3 | -38/+33
    Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions.
    No functionality change intended.
    llvm-svn: 261605
* CodeGen: Bring back MachineBasicBlock::iterator::getInstrIterator()... | Duncan P. N. Exon Smith | 2016-02-22 | 2 | -2/+2
    This is a little embarrassing.
    When I reverted r261504 (getIterator() => getInstrIterator()) in r261567, I did a `git grep` to see if there were new calls to `getInstrIterator()` that I needed to migrate. There were 10-20 hits, and I blindly did a `sed ...` before calling `ninja check`.
    However, these were `MachineInstrBundleIterator::getInstrIterator()`, which predated r261567. Perhaps coincidentally, these had an identical name and return type.
    This commit undoes my careless sed and restores `MachineBasicBlock::iterator::getInstrIterator()`.
    llvm-svn: 261577
* AMDGPU/R600: Implement allowsMisalignedMemoryAccess | Matt Arsenault | 2016-02-22 | 2 | -0/+24
    This avoids some test regressions in a future commit when unaligned operations are expanded when they have custom lowering.
    llvm-svn: 261570
* Revert "CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC" | Duncan P. N. Exon Smith | 2016-02-22 | 3 | -3/+3
    This reverts commit r261504, since it's not obvious the new name is better:
    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html
    I'll recommit if we get consensus that it's the right direction.
    llvm-svn: 261567
* [AMDGPU][llvm-mc] Support for 32-bit inline literals | Tom Stellard | 2016-02-22 | 2 | -34/+59
    Patch by: Artem Tamazov
    Summary:
    Note: Support for 64-bit inline literals TBD
    Added: Support of abs/neg modifiers for literals (incomplete; parsing TBD).
    Added: Some TODO comments.
    Reworked/clarity: rename isInlineImm() to isInlinableImm()
    Reworked/robustness: disallow BitsToFloat() with undefined value in isInlinableImm()
    Reworked/reuse: isSSrc32/64(), isVSrc32/64()
    Tests added.
    Reviewers: tstellarAMD, arsenm
    Subscribers: vpykhtin, nhaustov, SamWot, arsenm
    Projects: #llvm-amdgpu-spb
    Differential Revision: http://reviews.llvm.org/D17204
    llvm-svn: 261559
* [AMDGPU] [llvm-mc] [VI] Fix encoding of LDS/GDS instructions. | Tom Stellard | 2016-02-22 | 1 | -1/+3
    Patch by: Artem Tamazov
    Summary: Tests added.
    Reviewers: tstellarAMD, arsenm
    Subscribers: vpykhtin, SamWot, #llvm-amdgpu-spb
    Projects: #llvm-amdgpu-spb
    Differential Revision: http://reviews.llvm.org/D17271
    llvm-svn: 261558
* CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC | Duncan P. N. Exon Smith | 2016-02-21 | 1 | -1/+1
    Delete MachineInstr::getIterator(), since the term "iterator" is overloaded when talking about MachineInstr.
    - Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so that ilist_node::getIterator() is still available.
    - Add it back as MachineInstr::getInstrIterator(). This matches the naming in MachineBasicBlock.
    - Add MachineInstr::getBundleIterator(). This is explicitly called "bundle" (not matching MachineBasicBlock) to distinguish it clearly from ilist_node::getIterator().
    - Update all calls. Some of these I switched to `auto` to remove boiler-plate, since the new name is clear about the type.
    There was one call I updated that looked fishy, but it wasn't clear what the right answer was. This was in X86FrameLowering::inlineStackProbe(), added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to leave the behaviour unchanged, but I'll reply to the original commit on the list in a moment.
    llvm-svn: 261504
* AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer | Tom Stellard | 2016-02-20 | 2 | -238/+22
    Summary:
    Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane.
    This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs.
    This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist.
    Reviewers: arsenm, cfang, nhaehnle
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17305
    llvm-svn: 261385
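The safety argument in the message above (a uniform value can be read from any one lane) can be shown with a toy model of a wave. This Python sketch is illustrative only: lanes are list elements, the execution mask is a list of booleans, and falling back to lane 0 when no lane is active is an assumption about the hardware behaviour rather than something the message states.

```python
def readfirstlane(lanes, exec_mask):
    """Return the value held by the first active lane.

    Assumption: when no lane is active, lane 0 is read.
    """
    for value, active in zip(lanes, exec_mask):
        if active:
            return value
    return lanes[0]


def is_uniform(lanes, exec_mask):
    """True when every active lane holds the same value."""
    active_values = {v for v, active in zip(lanes, exec_mask) if active}
    return len(active_values) <= 1


def legalize_base_pointer(lanes, exec_mask):
    """Copy a per-lane (VGPR-like) base pointer to a single scalar value.

    Only valid when the pointer is uniform, which the message says has
    already been proven for any load selected as SMRD.
    """
    assert is_uniform(lanes, exec_mask), "base pointer must be uniform"
    return readfirstlane(lanes, exec_mask)
```

When the pointer is uniform, the scalar result equals the value every active lane would have used, so the scalar load behaves the same as before the copy.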
* AMDGPU/SI: Fix s_waitcnt insertion for flat instructions | Tom Stellard | 2016-02-19 | 1 | -2/+4
    Summary:
    This was broken in r260694, which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat.
    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17366
    llvm-svn: 261330
* AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics | Nicolai Haehnle | 2016-02-18 | 3 | -30/+75
    Summary:
    These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension.
    IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory.
    Differential Revision: http://reviews.llvm.org/D17276
    llvm-svn: 261224
* Test commit access. | Nikolay Haustov | 2016-02-18 | 1 | -1/+0
    llvm-svn: 261199
* [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target | Tom Stellard | 2016-02-18 | 12 | -49/+551
    Changes:
    - Added disassembler project
    - Fixed all decoding conflicts in .td files
    - Added DecoderMethod=“NONE” option to Target.td that allows disabling decoder generation for an instruction.
    - Created decoding functions for VS_32 and VReg_32 register classes.
    - Added stubs for decoding all register classes.
    - Added several tests for disassembler
    Disassembler only supports:
    - VI subtarget
    - VOP1 instruction encoding
    - 32-bit register operands and inline constants
    [Valery]
    One point that requires attention is how decoder conflicts were resolved:
    - Groups of target instructions were separated by using different DecoderNamespace (SICI, VI, CI), using an approach similar to AssemblerPredicate.
    - There were conflicts in IMAGE_<> instructions caused by two different reasons:
      1. dmask wasn’t specified for the output (fixed)
      2. There are image instructions that differ only by the number of address components but have the same encoding by the HW spec. The actual number of address components is determined by the HW at runtime using the image resource descriptor starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from the conflicting group to be the rule for the decoder. I didn’t find a way to disable decoder generation for an arbitrary instruction and therefore made a one-line fix to the tablegen generator that suppresses decoder generation when DecoderMethod is set to “NONE”. This is a change that should be reviewed and submitted first. Otherwise I would need to specify a different DecoderNamespace for every instruction in the conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not used in other targets.
      3. IMAGE_GATHER decoder generation is for now disabled and to be done later.
    [/Valery]
    Patch By: Sam Kolton
    Differential Revision: http://reviews.llvm.org/D16723
    llvm-svn: 261185
* [AMDGPU] Rename $dst operand to $vdst for VOP instructions. | Tom Stellard | 2016-02-16 | 6 | -77/+120
    Summary: This change renames the output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for the disassembler.
    Reviewers: vpykhtin, tstellarAMD, arsenm
    Subscribers: arsenm, llvm-commits, nhaustov
    Projects: #llvm-amdgpu-spb
    Differential Revision: http://reviews.llvm.org/D16920
    llvm-svn: 260986
* AMDGPU: Prepare for reducing private element size. | Matt Arsenault | 2016-02-13 | 1 | -14/+48
    Tests for the new scalarize all private access options will be included with a future commit.
    The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests.
    llvm-svn: 260804
* AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic | Tom Stellard | 2016-02-13 | 1 | -0/+11
    This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32.
    llvm-svn: 260792
* AMDGPU: Cleanup includes and random macros | Matt Arsenault | 2016-02-13 | 1 | -11/+4
    llvm-svn: 260784
* AMDGPU: Add intrinsics for sin/cos | Matt Arsenault | 2016-02-13 | 2 | -1/+18
    These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires.
    llvm-svn: 260782
* AMDGPU: Rename intrinsic to better match instruction name | Matt Arsenault | 2016-02-13 | 7 | -9/+9
    Also fixes missing f32 test.
    llvm-svn: 260780
* AMDGPU/SI: Add instruction defs for VOP1 DPP instructions | Tom Stellard | 2016-02-13 | 2 | -0/+107
    Reviewers: nhaustov, cfang, arsenm
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17159
    llvm-svn: 260774
* AMDGPU: Fix broken condition causing warning | Matt Arsenault | 2016-02-13 | 1 | -1/+1
    llvm-svn: 260773
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions | Tom Stellard | 2016-02-12 | 15 | -41/+266
    Reviewers: arsenm
    Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D16603
    llvm-svn: 260765
* [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler | Tom Stellard | 2016-02-12 | 2 | -3/+3
    Historically, the AMD internal sp3 assembler has used the flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests.
    Differential Revision: http://reviews.llvm.org/D16927
    Patch by: Nikolay Haustov
    llvm-svn: 260694
* AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass. | Changpeng Fang | 2016-02-12 | 1 | -4/+10
    Summary: It is possible that the loop condition can be a boolean constant (an infinite loop, for example), so we should handle a constant condition when annotating a loop. This patch adds support for annotating a constant condition.
    Reviewers: tstellarAMD, arsenm
    Subscribers: llvm-commits, arsenm
    Differential Revision: http://reviews.llvm.org/D15093
    llvm-svn: 260692