path: root/llvm/lib/Target/AMDGPU
Commit message / Author / Age / Files / Lines
* Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and ↵Daniel Sanders2015-09-156-21/+21
| | | | | | | | related. NFC. Eric has replied and has demanded the patch be reverted. llvm-svn: 247702
* Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* ↵Daniel Sanders2015-09-156-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | and related. NFC. Summary: This is the first patch in the series to migrate Triples (which are ambiguous) to TargetTuples (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This patch changes the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ APIs have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692
* Revert r247684 - Replace Triple with a new TargetTuple ...Daniel Sanders2015-09-156-21/+21
| | | | | | LLDB needs to be updated in the same commit. llvm-svn: 247686
* Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.Daniel Sanders2015-09-156-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the first patch in the series to migrate Triples (which are ambiguous) to TargetTuples (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This patch changes the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ APIs have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683
* Fix typos.Bruce Mitchener2015-09-121-1/+1
| | | | | | | | | | Summary: This fixes a variety of typos in docs, code and headers. Subscribers: jholewinski, sanjoy, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12626 llvm-svn: 247495
* Pass BranchProbability/BlockMass by value instead of const& as they are ↵Cong Hou2015-09-102-6/+6
| | | | | | small. NFC. llvm-svn: 247357
* AMDGPU: Simplify debug printingMatt Arsenault2015-09-103-13/+8
| | | | llvm-svn: 247345
* AMDGPU: Use StringRef valueMatt Arsenault2015-09-101-1/+1
| | | | llvm-svn: 247344
* AMDGPU/SI: Fix more cases of losing exec operandsMatt Arsenault2015-09-103-16/+12
| | | | llvm-svn: 247230
* AMDGPU/SI: Fix creating v_mov_b32s without exec usesMatt Arsenault2015-09-101-2/+14
| | | | | | | This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229
* AMDGPU: Extract full 64-bit subregister and use subregsMatt Arsenault2015-09-091-35/+29
| | | | | | | | | | | | Instead of extracting both 32-bit components from the 128-bit register. This produces fewer copies and is easier for the copy peephole optimizer to understand and see the actual uses as extracts from a reg_sequence. This avoids needing to handle subregister composing in the PeepholeOptimizer's ValueTracker for this case. llvm-svn: 247162
* AMDGPU: Remove unused multiclass argumentMatt Arsenault2015-09-091-5/+4
| | | | llvm-svn: 247161
* AMDGPU/SI: Fold operands through REG_SEQUENCE instructionsTom Stellard2015-09-091-0/+21
| | | | | | | | | | | | | | Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157
* AMDGPU: Fix not encoding src2 of VOP3b instructionsMatt Arsenault2015-09-091-4/+4
| | | | | | | Broken by r247074. Should include an assembler test, but the assembler is currently broken for VOP3b apparently. llvm-svn: 247123
* SelectionDAG: Support Expand of f16 extloadsMatt Arsenault2015-09-091-29/+3
| | | | | | | | | | Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113
* AMDGPU/SI: Fix input vcc operand for VOP2b instructionsMatt Arsenault2015-09-084-41/+57
| | | | | | | | | Adds vcc to output string input for e32. Allows option of using e64 encoding with assembler. Also fixes these instructions not implicitly reading exec. llvm-svn: 247074
* AMDGPU: Mark s_barrier as a high latency instructionMatt Arsenault2015-09-082-0/+3
| | | | | | | | | | | These were marked as WriteSALU, which is low latency. I'm guessing at the value to use, but it should probably be considered the highest latency instruction. I'm not sure this has any actual effect since hasSideEffects probably is preventing any moving of these. llvm-svn: 247060
* AMDGPU: Fix s_barrier flagsMatt Arsenault2015-09-081-2/+1
| | | | | | | | This should be convergent. This is not a barrier in the isBarrier sense, nor hasCtrlDep. llvm-svn: 247059
* AMDGPU: Handle sub of constant for DS offset foldingMatt Arsenault2015-09-081-11/+62
| | | | | | | sub C, x -> add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054
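The rewrite named in this commit message relies on a simple identity: C - x equals (0 - x) + C, so the constant can be peeled off as an offset. The following is a minimal sketch of that identity only, assuming 32-bit wrapping arithmetic. The function names are hypothetical and not taken from the LLVM sources, and the legality checks a real combine would need (for example, whether the constant fits the instruction's immediate offset field) are omitted.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses are modelled as 32-bit wrapping integers


def sub32(a, b):
    """32-bit wrapping subtraction."""
    return (a - b) & MASK32


def add32(a, b):
    """32-bit wrapping addition."""
    return (a + b) & MASK32


def fold_sub_of_constant(c, x):
    """Rewrite (sub C, x) as (add (sub 0, x), C).

    Returns a (base, offset) pair: the base is the negated variable part,
    and the constant becomes a separate offset that could be folded into
    an instruction's immediate field.
    """
    base = sub32(0, x)
    offset = c
    return base, offset


def effective_address(base, offset):
    """Address the hardware would compute from a base plus immediate offset."""
    return add32(base, offset)
```

For any `c` and `x`, `effective_address(*fold_sub_of_constant(c, x))` gives the same value as `sub32(c, x)`, which is why the transformation preserves the address.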
* check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Sanjay Patel2015-09-031-1/+4
| | | | | | | | | | | | | | | | Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771
* AMDGPU: Fix adding redundant implicit operandsMatt Arsenault2015-09-011-11/+7
| | | | | | | These are already added during the MachineInstr construction, so this was adding the implicit registers twice. llvm-svn: 246525
* AMDGPU: Add sdst operand to VOP2b instructionsMatt Arsenault2015-08-292-20/+30
| | | | | | | | | | The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358
* AMDGPU: Set mem operands for spill instructionsMatt Arsenault2015-08-293-25/+55
| | | | llvm-svn: 246357
* AMDGPU: Fix dropping mem operands when moving to VALUMatt Arsenault2015-08-291-11/+12
| | | | | | | | | | | | | Without a memory operand, mayLoad or mayStore instructions are treated as hasUnorderedMemRef, which results in much worse scheduling. We really should have a verifier check that any non-side effecting mayLoad or mayStore has a memory operand. There are a few instructions (interp and images) which I'm not sure what / where to add these. llvm-svn: 246356
* AMDGPU/SI: Fix some invalid assumptions when folding 64-bit immediatesTom Stellard2015-08-291-1/+5
| | | | | | | | | | | | | Summary: We were assuming that if the use operand had a sub-register, then the immediate was 64 bits, but this was breaking the case of folding a 64-bit immediate into another 64-bit instruction. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12255 llvm-svn: 246354
* AMDGPU/SI: Factor operand folding code into its own functionTom Stellard2015-08-281-67/+79
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12254 llvm-svn: 246353
* AMDGPU: Delete dead codeMatt Arsenault2015-08-263-68/+4
| | | | | | | | | | | | | | | | | There is no context where s_mov_b64 is emitted and could potentially be moved to the VALU. It is currently only emitted for materializing immediates, which can't be dependent on vector sources. The immediate splitting is already done when selecting constants. I'm not sure what contexts if any the register splitting would have been used before. Also clean up using s_mov_b64 in place of v_mov_b64_pseudo, although this isn't required and just skips the extra step of eliminating the copy from the SReg_64. llvm-svn: 246080
* AMDGPU: Don't reprocess instructions when splitting i64 bcntMatt Arsenault2015-08-261-4/+5
| | | | llvm-svn: 246079
* AMDGPU: Fix not moving users of s_bfe_i64 to VALUMatt Arsenault2015-08-261-0/+2
| | | | | | | This wouldn't propagate to users of the original BFE and would hit a verifier error. llvm-svn: 246078
* AMDGPU: Don't create intermediate SALU instructionsMatt Arsenault2015-08-262-27/+44
| | | | | | | | | | | | When splitting 64-bit operations, create the correct VALU instructions immediately. This was splitting things like s_or_b64 into the two s_or_b32s and then pushing the new instructions onto the worklist. There's no reason we need to do this intermediate step. llvm-svn: 246077
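Splitting a 64-bit bitwise operation into two 32-bit halves is value-preserving because each result bit depends only on the matching input bits. The sketch below illustrates just that property for an OR. It is an assumption-laden model with hypothetical helper names, and it does not model the worklist or the instruction selection the commit actually changes.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split64(value):
    """Split a 64-bit value into its (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(lo, hi):
    """Recombine two 32-bit halves into one 64-bit value."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def or_b64_split(a, b):
    """Compute a 64-bit OR as two independent 32-bit ORs.

    Each half is computed directly, mirroring the idea of emitting the
    final 32-bit operations immediately rather than creating an
    intermediate pair that is processed again later.
    """
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    return join64(a_lo | b_lo, a_hi | b_hi)
```

The same decomposition works for AND and XOR. It does not work unchanged for operations such as addition, where a carry crosses the 32-bit boundary.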
* AMDGPU/SI: Report SIFixSGPRLiveRanges changed functionMatt Arsenault2015-08-261-1/+4
| | | | llvm-svn: 246056
* AMDGPU: Make sure to reserve super registersMatt Arsenault2015-08-262-16/+18
| | | | | | | | I think this could potentially have broken if one of the super registers were allocated that contain v254/v255. llvm-svn: 246051
* AMDGPU: Produce error on dynamic_stackallocMatt Arsenault2015-08-263-0/+19
| | | | llvm-svn: 246048
* AMDGPU: Allow specifying different opcode on VI for SMRD/SMEMMatt Arsenault2015-08-222-15/+21
| | | | | | | | Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775
* AMDGPU: Improve accuracy of instruction rates for some FP instructionsMatt Arsenault2015-08-222-7/+27
| | | | llvm-svn: 245774
* AMDGPU: Use DFS to avoid second loop over functionMatt Arsenault2015-08-221-15/+13
| | | | llvm-svn: 245772
* AMDGPU: Make sure to run verifier after SIFixSGPRLiveRangesMatt Arsenault2015-08-221-1/+1
| | | | llvm-svn: 245769
* AMDGPU: Improve debug printing in SIFixSGPRLiveRangesMatt Arsenault2015-08-221-6/+15
| | | | llvm-svn: 245768
* AMDGPU: Move CI instructions into CIInstructions.tdMatt Arsenault2015-08-222-70/+69
| | | | | | There are still a couple of CI patterns left in SIInstructions. llvm-svn: 245767
* AMDGPU: Minor cleanups to help with f16 supportMatt Arsenault2015-08-211-9/+11
| | | | | | | | The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64. llvm-svn: 245764
* AMDGPU/SI: Better handle s_wait insertionTom Stellard2015-08-211-2/+5
| We can wait on either VM, EXP or LGKM. The waits are independent.
|
| Without this patch, a wait inserted because of one of them would also
| wait for all the previous others. This patch makes s_wait only wait
| for the ones we need for the next instruction.
|
| Here's an example of the subtle perf reduction this patch solves.
|
| This is without the patch:
|
|   buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
|   buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
|   s_load_dwordx4 s[44:47], s[8:9], 0xc
|   s_waitcnt lgkmcnt(0)
|   buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
|   s_load_dwordx4 s[48:51], s[8:9], 0x10
|   s_waitcnt vmcnt(1)
|   buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
|
| The s_waitcnt vmcnt(1) is useless. It is added because the last
| buffer_load_format_xyzw needs s[44:47], which was issued by the first
| s_load_dwordx4. It waits for all VM operations issued before that
| s_load_dwordx4 to have finished.
|
| Internally, 3 counters (for VM, EXP and LGKM) are updated after every
| instruction. For example, buffer_load_format_xyzw will increase the VM
| counter, and s_load_dwordx4 the LGKM one.
|
| Without the patch, for every defined register, the current 3 counters
| are stored, and are used to know how long to wait when an instruction
| needs the register. Because of that, the counters stored for s[44:47]
| also record the VM count, so using the register means waiting for the
| previous buffer_load_format_xyzw.
|
| Instead, this patch stores only the counters that matter for the
| register, and puts zero for the other ones, since we don't need any
| wait for them.
|
| Patch by: Axel Davy
|
| Differential Revision: http://reviews.llvm.org/D11883
| llvm-svn: 245755
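The bookkeeping described in this commit message can be modelled with a small toy tracker. This is a sketch under stated assumptions, not the code of the actual pass: the class and function names are hypothetical, registers are treated as opaque strings, registers with no recorded definition are assumed ready, and every counter is assumed to retire operations in issue order. The `per_kind` flag switches between the old behaviour (store all three counters per defined register) and the patched behaviour (store only the counter of the defining instruction and zero for the others).

```python
KINDS = ("vm", "exp", "lgkm")


class WaitTracker:
    """Toy model of per-register wait counters (hypothetical, not LLVM code)."""

    def __init__(self, per_kind):
        self.per_kind = per_kind                 # True models the patched behaviour
        self.issued = {k: 0 for k in KINDS}      # operations issued per counter
        self.done = {k: 0 for k in KINDS}        # operations already waited for
        self.defs = {}                           # register -> stored counters

    def waits_for(self, srcs):
        """Return {kind: allowed outstanding count} needed before using srcs."""
        waits = {}
        for reg in srcs:
            for kind, count in self.defs.get(reg, {}).items():
                if count > self.done[kind]:
                    allowed = self.issued[kind] - count
                    waits[kind] = min(waits.get(kind, allowed), allowed)
        for kind, allowed in waits.items():
            self.done[kind] = max(self.done[kind], self.issued[kind] - allowed)
        return waits

    def issue(self, kind, dst, srcs):
        """Issue one instruction; return the waits required just before it."""
        waits = self.waits_for(srcs)
        self.issued[kind] += 1
        if self.per_kind:
            # Patched: only the defining counter matters, zero for the rest.
            self.defs[dst] = {k: self.issued[k] if k == kind else 0 for k in KINDS}
        else:
            # Unpatched: snapshot all three counters.
            self.defs[dst] = dict(self.issued)
        return waits


# The instruction sequence from the commit message, without the s_waitcnt lines.
PROGRAM = [
    ("vm", "v[8:11]", ["v0", "s[44:47]"]),
    ("vm", "v[12:15]", ["v0", "s[48:51]"]),
    ("lgkm", "s[44:47]", ["s[8:9]"]),
    ("vm", "v[16:19]", ["v0", "s[52:55]"]),
    ("lgkm", "s[48:51]", ["s[8:9]"]),
    ("vm", "v[20:23]", ["v0", "s[44:47]"]),
]


def waits_before_last(per_kind):
    """Replay PROGRAM and return the waits needed before its last instruction."""
    tracker = WaitTracker(per_kind)
    waits = {}
    for kind, dst, srcs in PROGRAM:
        waits = tracker.issue(kind, dst, srcs)
    return waits
```

In this model, `waits_before_last(False)` includes a `vm` entry of 1, matching the useless `s_waitcnt vmcnt(1)` in the listing, while `waits_before_last(True)` contains only an `lgkm` entry. The `lgkm` value differs from the listing because the toy model assumes in-order retirement and does not replay the earlier `s_waitcnt lgkmcnt(0)`.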
* [TLI] Refactor "is integer division cheap" queries.Michael Kuperstein2015-08-191-4/+0
| | | | | | | | | | | This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree targets set it to false), but the interface allows querying based on the type and optimization level. NFC. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245430
* MachineRegisterInfo: Introduce isPhysRegUsed()Matthias Braun2015-08-181-6/+3
| | | | | | | | | | | | | | This method checks whether a physical register or any of its aliases are used in the function. Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html http://reviews.llvm.org/rL242173#inline-533 The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself. llvm-svn: 245329
* Add missing include guard.Yaron Keren2015-08-161-0/+4
| | | | llvm-svn: 245173
* AMDGPU/SI: Only look at live out SGPR defsMatt Arsenault2015-08-151-3/+7
| | | | | | | | | | | | | | | | | When trying to fix SGPR live ranges, skip defs that are killed in the same block as the def. I don't think we need to worry about these cases as long as the live ranges of the SGPRs in dominating blocks are correct. This reduces the number of elements the second loop over the function needs to look at, and makes it generally easier to understand. The second loop also only considers if the live range is live in to a block, which logically means it must have been live out from another. llvm-svn: 245150
* Remove redundant TargetFrameLowering::getFrameIndexOffset virtualJames Y Knight2015-08-154-5/+16
| | | | | | | | | | | function. This was the same as getFrameIndexReference, but without the FrameReg output. Differential Revision: http://reviews.llvm.org/D12042 llvm-svn: 245148
* AMDGPU/SI: Fix printing useless info with amdhsaMatt Arsenault2015-08-151-1/+1
| | | | | | | The comments at the bottom would all report 0 if amdhsa was used. llvm-svn: 245135
* AMDGPU/SI: Update LiveVariablesMatt Arsenault2015-08-151-2/+15
| | | | | | | This is simple but won't work if/when this pass is moved to be post-SSA. llvm-svn: 245134
* AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRangesMatt Arsenault2015-08-151-4/+13
| | | | | | | | | Does not mark SlotIndexes as reserved, although I think that might be OK. LiveVariables still need to be handled. llvm-svn: 245133
* AMDGPU: Remove unnecessary assertMatt Arsenault2015-08-151-1/+1
| | | | | | | These shouldn't ever be null. The number of successors was already asserted to be 2. llvm-svn: 245132