summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* Test commit accessKonstantin Zhuravlyov2016-03-291-1/+1
| | | | llvm-svn: 264736
* AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructionsTom Stellard2016-03-281-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589
* AMDGPU: Fix a use-after free and a missing breakJustin Bogner2016-03-251-1/+2
| | | | | | | | | | | | | | | | We're erasing MI here, but then immediately using it again inside the `if`. This moves the erase after we're done using it. Doing that reveals a second problem though - this case is missing a break, so we fall through to the default and dereference MI again. This is obviously a bug, though I don't know how to write a test that triggers it - all we do in the error case is print some extra debug output. Both of these issue crash on lots of tests under ASAN with the recycling allocator changes from PR26808 applied. llvm-svn: 264442
* AMDGPU: Cost model for basic integer operationsMatt Arsenault2016-03-251-0/+31
| | | | | | | This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376
* AMDGPU: Partially implement getArithmeticInstrCost for FP opsMatt Arsenault2016-03-252-1/+94
| | | | llvm-svn: 264374
* AMDGPU: TTI: Make insertelement free.Matt Arsenault2016-03-251-0/+5
| | | | | | We don't want to have a cost to scalarizing operations. llvm-svn: 264364
* AMDGPU/SI: Add Polaris supportTom Stellard2016-03-241-0/+8
| | | | | | Patch By: Sonny Jiang llvm-svn: 264295
* Fix sequence point warning. NFC.Vasileios Kalintiris2016-03-241-1/+1
| | | | llvm-svn: 264255
* AMDGPU: Remove atomic inc/dec patternsMatt Arsenault2016-03-231-23/+0
| | | | | | | There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215
* AMDGPU: Promote alloca should skip volatilesMatt Arsenault2016-03-231-0/+13
| | | | llvm-svn: 264214
* AMDGPU: Insert moves of frame index to value operandsMatt Arsenault2016-03-231-0/+56
| | | | | | | | | | | | | | | | | | | | | | | Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. llvm-svn: 264200
* [AMDGPU] Fix missing assembler predicates.Valery Pykhtin2016-03-231-0/+4
| | | | | | Differential Revision: http://reviews.llvm.org/D18351 llvm-svn: 264137
* AMDGPU: Cache information about register pressure setsTom Stellard2016-03-232-24/+37
| | | | | | | | | | We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126
* AMDGPU: Fix dangling references introduced by r263982Nicolai Haehnle2016-03-211-3/+5
| | | | | | | Fixes Valgrind errors on the test cases that were reported as failing by buildbots. llvm-svn: 264000
* AMDGPU: Coding style fixesNicolai Haehnle2016-03-211-4/+2
| | | | | | | I meant to add these before committing r263982 as per the review, but I forgot to squash. llvm-svn: 263983
* AMDGPU: Add SIWholeQuadMode passNicolai Haehnle2016-03-219-15/+515
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982
* AMDGPU/SI: Fix threshold calculation for branching when exec is zeroTom Stellard2016-03-211-3/+5
| | | | | | | | | | | | | | | | | | | Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969
* AMDGPU: Remove SignBitIsZero for mubuf scratch offsetsMatt Arsenault2016-03-211-1/+1
| | | | | | | These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964
* AMDGPU: Add frexp_mant intrinsicMatt Arsenault2016-03-211-2/+2
| | | | llvm-svn: 263948
* AMDGPU: add missing braces around multi-line if blockNicolai Haehnle2016-03-181-1/+2
| | | | | | This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823
* AMDGPU: Overload return type of llvm.amdgcn.buffer.load.formatNicolai Haehnle2016-03-181-36/+43
| | | | | | | | | | | | | | | | Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792
* AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsicsNicolai Haehnle2016-03-183-2/+187
| | | | | | | | | | | | | | | | | | | | Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791
* AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.formatNicolai Haehnle2016-03-183-13/+110
| | | | | | | | | | | | | | | | | | | | | Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790
* [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3Sam Kolton2016-03-182-50/+95
| | | | | Review: http://reviews.llvm.org/D18267 llvm-svn: 263789
* AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermuteChangpeng Fang2016-03-171-1/+1
| | | | | | | | | | | | | | | | Symmary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers arsenm, tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720
* AMDGPU: mark atomic instructions as sources of divergenceNicolai Haehnle2016-03-171-0/+7
| | | | | | | | | | | | | | Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719
* AMDGPU: Prevent uniform loops from becoming infiniteNicolai Haehnle2016-03-161-0/+6
| | | | | | | | | | | | | | Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658
* AMDGPU: Verify instructions in non-debug builds as wellMichel Danzer2016-03-161-3/+3
| | | | | | | | | | | | And emit an error if it fails. This prevents illegal instructions from getting sent to the GPU, which would potentially result in a hang. This is a candidate for the stable branch(es). Reviewed-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 263627
* AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormatMichel Danzer2016-03-161-3/+3
| | | | | Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626
* AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDSChangpeng Fang2016-03-152-2/+19
| | | | | | | | | | | | | | | | | | | Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563
* [DAG] use isUndef() ; NFCISanjay Patel2016-03-141-4/+4
| | | | llvm-svn: 263448
* AMDGPU/SI: Handle wait states required for DPP instructionsTom Stellard2016-03-142-0/+63
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447
* AMDGPU/SI: Incomplete shader binaries need to finish execution at the endMarek Olsak2016-03-142-8/+24
| | | | | | | | | | Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441
* AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergenceNicolai Haehnle2016-03-141-0/+13
| | | | | | | | | | | | | | Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440
* [AMDGPU] Assembler: SOP* instruction fixesNikolay Haustov2016-03-142-27/+40
| | | | | | | | | | | | | | s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit. s_rfe_b64 has just one destination operand and no source. Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that. Add s_memrealtime test and change comments in smem.s to follow common style. Change test for s_memtime to use non-zero register to make it really test encoding. Add tests for s_buffer_load*. Add tests for SOPC instructions (same for SI and VI) Differential Revision: http://reviews.llvm.org/D18040 llvm-svn: 263420
* [AMDGPU] AsmParser: Factor out parseRegister. NFC.Valery Pykhtin2016-03-141-24/+40
| | | | llvm-svn: 263411
* [AMDGPU] AsmParser: refactor post push_back vector access. NFC.Valery Pykhtin2016-03-141-6/+5
| | | | llvm-svn: 263409
* [AMDGPU] AsmParser: remove redundant isReg checks. NFC.Valery Pykhtin2016-03-141-7/+7
| | | | llvm-svn: 263407
* [AMDGPU] Fix VOPC instruction operand namingsValery Pykhtin2016-03-111-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D17966 llvm-svn: 263242
* [AMDGPU] Assembler: change v_madmk operands to have same order as mad.Nikolay Haustov2016-03-114-28/+19
| | | | | | | | | | The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212
* AMDGPU: Don't use InstVisitor for AMDGPUPromoteAllocaMatt Arsenault2016-03-111-6/+12
| | | | | | | | Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function. llvm-svn: 263206
* AMDGPU: R600 code splitting cleanupMatt Arsenault2016-03-1132-105/+93
| | | | | | | Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204
* AMDGPU: Materialize sign bits with bfrevMatt Arsenault2016-03-111-0/+24
| | | | | | | If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201
* AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsicsNicolai Haehnle2016-03-102-15/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140
* AMDGPU/SI: Define S_GETREG IntrinsicChangpeng Fang2016-03-101-0/+12
| | | | | | | | | | | | | | Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124
* [AMDGPU] Fix SMEM instructions encoding/operand namingsValery Pykhtin2016-03-105-37/+91
| | | | | | Differential Revision: http://reviews.llvm.org/D17651 llvm-svn: 263108
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-093-5/+6
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* Fix build error due to unsigned compare >= 0 in r263008 (NFC)Teresa Johnson2016-03-091-1/+1
| | | | | | | | | | | | Fixes error from building with clang: /usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12: error: comparison of unsigned expression >= 0 is always true [-Werror,-Wtautological-compare] if ((Imm >= 0x000) && (Imm <= 0x0ff)) { ~~~ ^ ~~~~~ llvm-svn: 263014
* [AMDGPU] Assembler: Support DPP instructions.Sam Kolton2016-03-098-46/+350
| | | | | | | | | | | | | | | | | | | | Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008
* [AMDGPU] Assembler: Support abs() syntax.Nikolay Haustov2016-03-091-2/+19
| | | | | | | | | Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|. Add tests. Differential Revision: http://reviews.llvm.org/D17887 llvm-svn: 263006
OpenPOWER on IntegriCloud