summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Add readonly to InstrMapping functionsMatt Arsenault2015-09-241-1/+15
| | | | llvm-svn: 248474
* AMDGPU: Fix printing trailing whitespace for mubuf atomicsMatt Arsenault2015-09-241-1/+1
| | | | llvm-svn: 248472
* AMDGPU: Reduce number of copies emittedMatt Arsenault2015-09-241-5/+9
| | | | | | | | | | | | | Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467
* Untabify.NAKAMURA Takumi2015-09-225-8/+8
| | | | llvm-svn: 248264
* Reformat blank lines.NAKAMURA Takumi2015-09-224-9/+2
| | | | llvm-svn: 248263
* Reformat comment lines.NAKAMURA Takumi2015-09-222-4/+5
| | | | llvm-svn: 248262
* AMDGPU: Remove unnecessary checkMatt Arsenault2015-09-221-4/+0
| | | | | | | If the instruction doesn't have enough operands, it either shouldn't be marked as isCommutable or is malformed. llvm-svn: 248242
* AMDGPU: Move copy handling under switch like other instructionsMatt Arsenault2015-09-211-5/+10
| | | | llvm-svn: 248172
* Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type ↵Craig Topper2015-09-211-1/+1
| | | | | | extra times. NFC llvm-svn: 248140
* Don't pass StringRefs around by const reference. Pass by value instead per ↵Craig Topper2015-09-211-1/+1
| | | | | | coding standards. NFC llvm-svn: 248136
* AMDGPU: Remove dead codeMatt Arsenault2015-09-195-18/+2
| | | | | | | getCFGStructurizerRegClass is not used for SI, so move it into R600 specific stuff. llvm-svn: 248087
* constify the Function parameter to the TTI creation callback andEric Christopher2015-09-161-1/+1
| | | | | | propagate to all callers/users/etc. llvm-svn: 247864
* propagate fast-math-flags on DAG nodesSanjay Patel2015-09-163-3/+23
| | | | | | | | | | | | | | | | | | | After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815
* Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and ↵Daniel Sanders2015-09-156-21/+21
| | | | | | | | related. NFC. Eric has replied and has demanded the patch be reverted. llvm-svn: 247702
* Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* ↵Daniel Sanders2015-09-156-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692
* Revert r247684 - Replace Triple with a new TargetTuple ...Daniel Sanders2015-09-156-21/+21
| | | | | | LLDB needs to be updated in the same commit. llvm-svn: 247686
* Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.Daniel Sanders2015-09-156-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683
* Fix typos.Bruce Mitchener2015-09-121-1/+1
| | | | | | | | | | Summary: This fixes a variety of typos in docs, code and headers. Subscribers: jholewinski, sanjoy, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12626 llvm-svn: 247495
* Pass BranchProbability/BlockMass by value instead of const& as they are ↵Cong Hou2015-09-102-6/+6
| | | | | | small. NFC. llvm-svn: 247357
* AMDGPU: Simplify debug printingMatt Arsenault2015-09-103-13/+8
| | | | llvm-svn: 247345
* AMDGPU: Use StringRef valueMatt Arsenault2015-09-101-1/+1
| | | | llvm-svn: 247344
* AMDGPU/SI: Fix more cases of losing exec operandsMatt Arsenault2015-09-103-16/+12
| | | | llvm-svn: 247230
* AMDGPU/SI: Fix creating v_mov_b32s without exec usesMatt Arsenault2015-09-101-2/+14
| | | | | | | This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229
* AMDGPU: Extract full 64-bit subregister and use subregsMatt Arsenault2015-09-091-35/+29
| | | | | | | | | | | | Instead of extracting both 32-bit components from the 128-bit register. This produces fewer copies and is easier for the copy peephole optimizer to understand and see the actual uses as extracts from a reg_sequence. This avoids needing to handle subregister composing in the PeepholeOptimizer's ValueTracker for this case. llvm-svn: 247162
* AMDGPU: Remove unused multiclass argumentMatt Arsenault2015-09-091-5/+4
| | | | llvm-svn: 247161
* AMDGPU/SI: Fold operands through REG_SEQUENCE instructionsTom Stellard2015-09-091-0/+21
| | | | | | | | | | | | | | Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157
* AMDGPU: Fix not encoding src2 of VOP3b instructionsMatt Arsenault2015-09-091-4/+4
| | | | | | | Broken by r247074. Should include an assembler test, but the assembler is currently broken for VOP3b apparently. llvm-svn: 247123
* SelectionDAG: Support Expand of f16 extloadsMatt Arsenault2015-09-091-29/+3
| | | | | | | | | | Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113
* AMDGPU/SI: Fix input vcc operand for VOP2b instructionsMatt Arsenault2015-09-084-41/+57
| | | | | | | | | Adds vcc to output string input for e32. Allows option of using e64 encoding with assembler. Also fixes these instructions not implicitly reading exec. llvm-svn: 247074
* AMDGPU: Mark s_barrier as a high latency instructionMatt Arsenault2015-09-082-0/+3
| | | | | | | | | | | These were marked as WriteSALU, which is low latency. I'm guessing at the value to use, but it should probably be considered the highest latency instruction. I'm not sure this has any actual effect since hasSideEffects probably is preventing any moving of these. llvm-svn: 247060
* AMDGPU: Fix s_barrier flagsMatt Arsenault2015-09-081-2/+1
| | | | | | | | This should be convergent. This is not a barrier in the isBarrier sense, nor hasCtrlDep. llvm-svn: 247059
* AMDGPU: Handle sub of constant for DS offset foldingMatt Arsenault2015-09-081-11/+62
| | | | | | | | | sub C, x - > add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054
* check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Sanjay Patel2015-09-031-1/+4
| | | | | | | | | | | | | | | | Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771
* AMDGPU: Fix adding redundant implicit operandsMatt Arsenault2015-09-011-11/+7
| | | | | | | These are already added during the MachineInstr construction, so this was adding the implicit registers twice. llvm-svn: 246525
* AMDGPU: Add sdst operand to VOP2b instructionsMatt Arsenault2015-08-292-20/+30
| | | | | | | | | | The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358
* AMDGPU: Set mem operands for spill instructionsMatt Arsenault2015-08-293-25/+55
| | | | llvm-svn: 246357
* AMDGPU: Fix dropping mem operands when moving to VALUMatt Arsenault2015-08-291-11/+12
| | | | | | | | | | | | | Without a memory operand, mayLoad or mayStore instructions are treated as hasUnorderedMemRef, which results in much worse scheduling. We really should have a verifier check that any non-side effecting mayLoad or mayStore has a memory operand. There are a few instructions (interp and images) which I'm not sure what / where to add these. llvm-svn: 246356
* AMDGPU/SI: Fix some invaild assumptions when folding 64-bit immediatesTom Stellard2015-08-291-1/+5
| | | | | | | | | | | | | | | Summary: We were assuming tha if the use operand had a sub-register that the immediate was 64-bits, but this was breaking the case of folding a 64-bit immediate into another 64-bit instruction. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12255 llvm-svn: 246354
* AMDGPU/SI: Factor operand folding code into its own functionTom Stellard2015-08-281-67/+79
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12254 llvm-svn: 246353
* AMDGPU: Delete dead codeMatt Arsenault2015-08-263-68/+4
| | | | | | | | | | | | | | | | | There is no context where s_mov_b64 is emitted and could potentially be moved to the VALU. It is currently only emitted for materializing immediates, which can't be dependent on vector sources. The immediate splitting is already done when selecting constants. I'm not sure what contexts if any the register splitting would have been used before. Also clean up using s_mov_b64 in place of v_mov_b64_pseudo, although this isn't required and just skips the extra step of eliminating the copy from the SReg_64. llvm-svn: 246080
* AMDGPU: Don't reprocess instructions when splitting i64 bcntMatt Arsenault2015-08-261-4/+5
| | | | llvm-svn: 246079
* AMDGPU: Fix not moving users of s_bfe_i64 to VALUMatt Arsenault2015-08-261-0/+2
| | | | | | | This wouldn't propagate to users of the original BFE and would hit a verifier error. llvm-svn: 246078
* AMDGPU: Don't create intermediate SALU instructionsMatt Arsenault2015-08-262-27/+44
| | | | | | | | | | | | When splitting 64-bit operations, create the correct VALU instructions immediately. This was splitting things like s_or_b64 into the two s_or_b32s and then pushing the new instructions onto the worklist. There's no reason we need to do this intermediate step. llvm-svn: 246077
* AMDGPU/SI: Report SIFixSGPRLiveRanges changed functionMatt Arsenault2015-08-261-1/+4
| | | | llvm-svn: 246056
* AMDGPU: Make sure to reserve super registersMatt Arsenault2015-08-262-16/+18
| | | | | | | | I think this could potentially have broken if one of the super registers were allocated that contain v254/v255. llvm-svn: 246051
* AMDGPU: Produce error on dynamic_stackallocMatt Arsenault2015-08-263-0/+19
| | | | llvm-svn: 246048
* AMDGPU: Allow specifying different opcode on VI for SMRD/SMEMMatt Arsenault2015-08-222-15/+21
| | | | | | | | Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775
* AMDGPU: Improve accuracy of instruction rates for some FP instructionsMatt Arsenault2015-08-222-7/+27
| | | | llvm-svn: 245774
* AMDGPU: Use DFS to avoid second loop over functionMatt Arsenault2015-08-221-15/+13
| | | | llvm-svn: 245772
* AMDGPU: Make sure to run verifier after SIFixSGPRLiveRangesMatt Arsenault2015-08-221-1/+1
| | | | llvm-svn: 245769
OpenPOWER on IntegriCloud