summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU/GlobalISel: Fix RegBankSelect for GEP.Matt Arsenault2019-02-141-0/+90
| | | | | | | | | | This is basically a pointer typed add, so shouldn't be any different. This was assuming everything was an SGPR, which is not true. Also cleanup legality for GEP. I don't seem to be seeing the problem the hack marking s64 as a legal pointer type the comment mentions. llvm-svn: 354067
* [AMDGPU] Ressociate 'add (add x, y), z' to use SALUStanislav Mekhanoshin2019-02-142-2/+121
| | | | | | | | | | | Reassociate adds to collect scalar operands in a single instruction when possible. That will result in a scalar add followed by vector instead of two vector adds, thus better utilizing SALU. Differential Revision: https://reviews.llvm.org/D58220 llvm-svn: 354066
* AMDGPU/GlobalISel: Handle split for 64-bit VALU selectMatt Arsenault2019-02-141-0/+400
| | | | llvm-svn: 354065
* [DebugInfo] Stop changing labels for register-described parameter DBG_VALUEsDavid Stenberg2019-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a follow-up to D57510. This patch stops DebugHandlerBase from changing the starting label for the first non-overlapping, register-described parameter DBG_VALUEs to the beginning of the function. That code did not consider what defined the registers, which could result in the ranges for the debug values starting before their defining instructions. We currently do not emit debug values for constant values directly at the start of the function, so this code is still useful for such values, but my intention is to remove the code from DebugHandlerBase completely when we get there. One reason for removing it is that the code violates the history map's ranges, which I think can make it quite confusing when troubleshooting. In D57510, PrologEpilogInserter was amended so that parameter DBG_VALUEs now are kept at the start of the entry block, even after emission of prologue code. That was done to reduce the degradation of debug completeness from this patch. PR40638 is another example, where the lexical-scope trimming that LDV does, in combination with scheduling, results in instructions after the prologue being left without locations. There might be other cases where the DBG_VALUEs are pushed further down, for which the DebugHandlerBase code may be helpful, but as it now quite often result in incorrect locations, even after the prologue, it seems better to remove that code, and try to work our way up with accurate locations. In the long run we should maybe not aim to provide accurate locations inside the prologue. Some single location descriptions, at least those referring to stack values, generate inaccurate values inside the epilogue, so we maybe should not aim to achieve accuracy for location lists. However, it seems that we now emit line number programs that can result in GDB and LLDB stopping inside the prologue when doing line number stepping into functions. See PR40188 for more information. A summary of some of the changed test cases is available in PR40188#c2. Reviewers: aprantl, dblaikie, rnk, jmorse Reviewed By: aprantl Subscribers: jdoerfert, jholewinski, jvesely, javed.absar, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D57511 llvm-svn: 353928
* AMDGPU/GlobalISel: Add more insert/extract testcasesMatt Arsenault2019-02-122-115/+918
| | | | llvm-svn: 353848
* AMDGPU/GlobalISel: Only make f16 constants legal on f16 targetsMatt Arsenault2019-02-121-12/+23
| | | | | | We could deal with it, but there's no real point. llvm-svn: 353845
* GlobalISel: Verify G_EXTRACTMatt Arsenault2019-02-111-2/+2
| | | | llvm-svn: 353759
* GlobalISel: Implement moreElementsVector for implicit_defMatt Arsenault2019-02-1118-146/+296
| | | | llvm-svn: 353754
* GlobalISel: Add G_FCANONICALIZE instructionMatt Arsenault2019-02-111-0/+286
| | | | llvm-svn: 353719
* [AMDGPU] fix atomic_optimizations_buffer.ll test after DPP combiner was ↵Valery Pykhtin2019-02-111-2/+2
| | | | | | | | enabled by default. Related commits: rL353691, rL353703. llvm-svn: 353717
* [AMDGPU] Fix DPP sequence in atomic optimizer.Neil Henning2019-02-116-31/+48
| | | | | | | | | | This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug by using a ballot instead. Differential Revision: https://reviews.llvm.org/D57737 llvm-svn: 353703
* [AMDGPU] Split idot4/8 signed and unsigned tests. NFC.Stanislav Mekhanoshin2019-02-094-3107/+3117
| | | | llvm-svn: 353593
* AMDGPU/GlobalISel: Fix broken testsMatt Arsenault2019-02-082-0/+31
| | | | llvm-svn: 353559
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-1/+1
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* [AMDGPU] Fix CS scratch setup on pre-GCN3 ASICsCarl Ritson2019-02-081-2/+5
| | | | | | | | | | | | | | | | Summary: Prior to GCN3 s_load_dword offsets are in dwords rather than bytes. Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56496 llvm-svn: 353530
* AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2Matt Arsenault2019-02-081-0/+85
| | | | | | | | clampScalar doesn't do anything for non-power-of-2 in range. There should probably be a combination rule to reduce the number of matching rules. llvm-svn: 353526
* AMDGPU/GlobalISel: Fix non-power-of-2 implicit_defMatt Arsenault2019-02-081-0/+28
| | | | llvm-svn: 353522
* [MIPS GlobalISel] Select any extending load and truncating storePetar Avramovic2019-02-082-68/+153
| | | | | | | | | | | | | | | | | | Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and G_SEXTLOAD. That is perform widenScalarDst to size given by the target and avoid additional checks in common code. Targets can reorder or add additional rules in LegalizeRuleSet for the opcode to achieve desired behavior. Select extending load that does not have specified type of extension into zero extending load. Select truncating store that stores number of bytes indicated by size in MachineMemoperand. Differential Revision: https://reviews.llvm.org/D57454 llvm-svn: 353520
* AMDGPU/GlobalISel: Don't use a copy in addrspacecast loweringMatt Arsenault2019-02-081-36/+36
| | | | llvm-svn: 353516
* [AMDGPU] Fix DPP combinerValery Pykhtin2019-02-083-328/+472
| | | | | | | | | | | | | Differential revision: https://reviews.llvm.org/D55444 dpp move with uses and old reg initializer should be in the same BB. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity. Added add, subrev, and, or instructions to the old folding function. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user. The pass is still disabled by default. llvm-svn: 353513
* AMDGPU/GlobalISel: Legalize addrspacecastMatt Arsenault2019-02-081-0/+393
| | | | | | | Use a placeholder constant for now on targets that need the load from the queue ptr. llvm-svn: 353497
* [CodeGen] Handle vector UADDO, SADDO, USUBO, SSUBONikita Popov2019-02-074-4/+85
| | | | | | | | | | | | | | | This is part of https://bugs.llvm.org/show_bug.cgi?id=40442. Vector legalization is implemented for the add/sub overflow opcodes. UMULO/SMULO are also handled as far as legalization is concerned, but they don't support vector expansion yet (so no tests for them). The vector result widening implementation is suboptimal, because it could result in a legalization loop. Differential Revision: https://reviews.llvm.org/D57639 llvm-svn: 353464
* GlobalISel: Implement narrowScalar for shift main typeMatt Arsenault2019-02-075-231/+2873
| | | | | | | | | | | | | | | This is pretty much directly ported from SelectionDAG. Doesn't include the shift by non-constant but known bits version, since there isn't a globalisel version of computeKnownBits yet. This shows a disadvantage of targets not specifically which type should be used for the shift amount. If type 0 is legalized before type 1, the operations on the shift amount type use the wider type (which are also less likely to legalize). This can be avoided by targets specifying legalization actions on type 1 earlier than for type 0. llvm-svn: 353455
* AMDGPU/GlobalISel: Restrict g_implicit_def legalityMatt Arsenault2019-02-073-43/+426
| | | | llvm-svn: 353452
* GlobalISel: Fix artifact combiner constant legality checks for vectorsMatt Arsenault2019-02-074-17/+260
| | | | | | | Since G_CONSTANT is illegal for vectors, this needs to check what buildConstant will produce for a splat vector. llvm-svn: 353449
* AMDGPU/GlobalISel: Don't use g_implicit_def in a few testsMatt Arsenault2019-02-073-54/+65
| | | | llvm-svn: 353443
* AMDGPU/GlobalISel: Legalize fsqrtMatt Arsenault2019-02-072-0/+323
| | | | llvm-svn: 353438
* AMDGPU/GlobalISel: Legalize some f16 operationsMatt Arsenault2019-02-076-613/+213
| | | | llvm-svn: 353436
* GlobalISel: Implement fewerElementsVector for shiftsMatt Arsenault2019-02-073-108/+1263
| | | | | | | | | Introduce a new function which handles instructions with multiple type indices, but have the same number of vector elements. Also legalize v2s16 shifts when applicable. llvm-svn: 353432
* [AMDGPU] Consider XOR in waterfall loop as a terminatorScott Linder2019-02-051-0/+115
| | | | | | | | Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 llvm-svn: 353207
* AMDGPU: Fix assert on trunc from bitcast of build_vectorMatt Arsenault2019-02-051-2/+18
| | | | | | | | | The v2i64 argument is lowered to a bitcast of v4i32 build_vector. This would then attempt to use the i32-element as the source of the vector truncate. This really would need to collect 2 elements from the build_vector to produce the intended truncate. llvm-svn: 353202
* GlobalISel: Consolidate load/store legalizationMatt Arsenault2019-02-052-51/+88
| | | | | | | | | | The fewerElementsVectors implementation for load/stores handles the scalar reduction case just as well, so drop the redundant code in narrowScalar. This also introduces support for narrowing irregular size breakdowns for scalars. llvm-svn: 353125
* GlobalISel: Implement narrowScalar for selectMatt Arsenault2019-02-051-2/+143
| | | | | | | | | | Don't handle vector conditions. I think this can be merged in the future with fewerElementsVectorSelect, although this becomes slightly tricky with a vector condition. llvm-svn: 353122
* GlobalISel: Combine g_extract with g_merge_valuesMatt Arsenault2019-02-041-0/+470
| | | | | | | | | | | | | | Try to use the underlying source registers. This enables legalization in more cases where some irregular operations are widened and others narrowed. This seems to make the test_combines_2 AArch64 test worse, since the MERGE_VALUES has multiple uses. Since this should be required for legalization, a hasOneUse check is probably inappropriate (or maybe should only be used if the merge is legal?). llvm-svn: 353121
* GlobalISel: Enforce operand types for constantsMatt Arsenault2019-02-0413-46/+47
| | | | | | | | A number of of tests were using imm operands, not cimm. Since CSE relies on the exact ConstantInt* pointer used, and implicit conversions are generally evil, also enforce the bitsize of the types. llvm-svn: 353113
* GlobalISel: Fix not calling observer when legalizing bitcount opsMatt Arsenault2019-02-045-20/+41
| | | | | | This was hiding bugs from never legalizing the source type. llvm-svn: 353102
* AMDGPU: Don't rematerialize mov with implicit operandsMatt Arsenault2019-02-041-0/+112
| | | | | | | This was pulling the mov used for register indexing on gfx9 out of the loop. llvm-svn: 353101
* [AMDGPU] Support emitting GOT relocations for function callsScott Linder2019-02-041-0/+51
| | | | | | Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083
* AMDGPU/GlobalISel: Legalize select for v4s16Matt Arsenault2019-02-041-1/+206
| | | | | | | Also add some more select tests to help show future legalization changes. llvm-svn: 353045
* GlobalISel: Implement widenScalar for G_UNMERGE_VALUESMatt Arsenault2019-02-031-1/+232
| | | | | | | | | For the scalar case only. Also move the similar G_MERGE_VALUES handling to a separate function and cleanup to make them look more similar. llvm-svn: 352979
* GlobalISel: Implement widenScalar for G_EXTRACT vector sourcesMatt Arsenault2019-02-021-0/+132
| | | | | | Handle the basic element extract case. llvm-svn: 352978
* AMDGPU/GlobalISel: Legalize icmp for pointer typesMatt Arsenault2019-02-021-0/+175
| | | | llvm-svn: 352976
* AMDGPU/GlobalISel: Legalize constant for pointer typesMatt Arsenault2019-02-021-0/+84
| | | | llvm-svn: 352975
* AMDGPU/GlobalISel: Legalize select for pointer typesMatt Arsenault2019-02-021-76/+514
| | | | llvm-svn: 352974
* GlobalISel: Legalization for inttoptr/ptrtointMatt Arsenault2019-02-022-28/+323
| | | | llvm-svn: 352973
* [AMDGPU] Mark test functions with hidden visibilityScott Linder2019-02-018-70/+70
| | | | | | | | | Prepare for future patch which affects codegen for calls to preemptible functions. Differential Revision: https://reviews.llvm.org/D57605 llvm-svn: 352920
* [opaque pointer types] Pass value type to LoadInst creation.James Y Knight2019-02-011-16/+12
| | | | | | | | | This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911
* [AMDGPU] Fix for vector element insertionTim Corringham2019-02-016-34/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word and so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately. before the mask & or into the intended Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Tags: #llvm Differential Revision: https://reviews.llvm.org/D57588 llvm-svn: 352885
* GlobalISel: Handle odd splits in fewerElementsVector for load/storeMatt Arsenault2019-01-312-123/+248
| | | | llvm-svn: 352720
* GlobalISel: Implement narrowScalar for bswapMatt Arsenault2019-01-311-0/+125
| | | | llvm-svn: 352719
OpenPOWER on IntegriCloud