summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4Marek Olsak2017-11-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads. The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only: SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %) There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size. These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs: SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38949 llvm-svn: 317751
* AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32Matt Arsenault2017-11-061-0/+4
| | | | llvm-svn: 317492
* AMDGPU: Add max-mix-insts subtarget featureMatt Arsenault2017-10-251-1/+2
| | | | llvm-svn: 316553
* AMDGPU: Fix default range in non-kernel functionsMatt Arsenault2017-10-231-0/+3
| | | | | | | | | The range should be assumed to be the hardware maximum if a workitem intrinsic is used in a callable function which does not know the restricted limit of the calling kernel. llvm-svn: 316346
* AMDGPU: Do not emit deprecated notes for code object v3Konstantin Zhuravlyov2017-10-141-0/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810
* [Triple] Add AMDPAL operating system typeTim Renouf2017-09-291-0/+4
| | | | | | | | | | | | | | | | | | Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500
* [AMDGPU] Prevent post-RA scheduler from breaking memory clausesStanislav Mekhanoshin2017-09-191-0/+4
| | | | | | | | | The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
* AMDGPU: Start selecting v_mad_mix_f32Matt Arsenault2017-09-071-0/+4
| | | | llvm-svn: 312732
* AMDGPU: Add most d16 load/store instruction definitionsMatt Arsenault2017-09-011-0/+4
| | | | | | | Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodesDmitry Preobrazhensky2017-08-161-0/+5
| | | | | | | | | | See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36674 llvm-svn: 311006
* Reapply "[GlobalISel] Remove the GISelAccessor API."Quentin Colombet2017-08-151-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r310425, thus reapplying r310335 with a fix for link issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON. Original commit message: [GlobalISel] Remove the GISelAccessor API. Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. ---- The fix for the link issue consists in adding the GlobalISel library in the list of dependencies for the AArch64 unittests. This dependency comes from the use of AArch64Subtarget that needs to know how to destruct the GISel related APIs when being detroyed. Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and understand the problem. llvm-svn: 310969
* Revert "[GlobalISel] Remove the GISelAccessor API."Quentin Colombet2017-08-081-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r310115. It causes a linker failure for the one of the unittests of AArch64 on one of the linux bot: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429 : && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -O2 -L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined -Wl,-O3 -Wl,--gc-sections unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o unittests/Target/AArch64/AArch64Tests lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread -Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && : unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0): undefined reference to `vtable for llvm::LegalizerInfo' unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8): undefined reference to `vtable for llvm::RegisterBankInfo' The particularity of this bot is that it is built with BUILD_SHARED_LIBS=ON However, I was not able to reproduce the problem so far. Reverting to unblock the bot. llvm-svn: 310425
* AMDGPU: Cleanup subtarget featuresMatt Arsenault2017-08-071-1/+0
| | | | | | | | | | | | Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258
* [GlobalISel] Remove the GISelAccessor API.Quentin Colombet2017-08-041-14/+14
| | | | | | | | | | Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. llvm-svn: 310115
* AMDGPU: Annotate implicitarg.ptr usageMatt Arsenault2017-07-281-1/+2
| | | | | | | | | | | We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398
* AMDGPU: Add encoding for carryless add/sub instructionsMatt Arsenault2017-07-201-0/+5
| | | | llvm-svn: 308639
* [AMDGPU] fcaninicalize optimization for GFX9+Stanislav Mekhanoshin2017-07-131-0/+4
| | | | | | | | | | | | | | Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that is possible to further optimize fcanonicalize and remove it if applied to min/max given their operands are known not to be an sNaN or that sNaNs are not supported. Additionally we can remove fcanonicalize if denorms are supported for the VT and we know that its argument is never a NaN. Differential Revision: https://reviews.llvm.org/D35335 llvm-svn: 307976
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructionsSam Kolton2017-06-271-3/+3
| | | | | | | | | | | | | | Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-221-0/+25
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* AMDGPU : Fix ISA Version Definitions.Wei Ding2017-06-101-1/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [llvm] Remove double semicolonsMandeep Singh Grang2017-06-061-1/+1
| | | | | | | | | | | | Reviewers: craig.topper, arsenm, mehdi_amini Reviewed By: mehdi_amini Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33924 llvm-svn: 304767
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-021-6/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* [AMDGPU] Fix kernel arg segment size for amdgizclYaxun Liu2017-06-011-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482
* [AMDGPU] Require waitcnt before barrier for all targets; adjust tests.Mark Searles2017-05-301-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex ↵Marek Olsak2017-05-241-4/+0
| | | | | | | | | | | patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754
* [AMDGPU] Combine and (srl) into shl (bfe)Stanislav Mekhanoshin2017-05-231-4/+4
| | | | | | | | | | | | | | | | | | | Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681
* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patternsMarek Olsak2017-05-231-0/+4
| | | | | | | | | | | | This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658
* AMDGPU: Fix min3/max3 combines for f16/i16Matt Arsenault2017-05-171-0/+4
| | | | | | Fix missing instruction definitions for min3/max3. llvm-svn: 303284
* AMDGPU: Add new subtarget features for gfx9 flat instructionsMatt Arsenault2017-05-101-0/+15
| | | | | | | Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729
* AMDGPU: Change stack alignmentMatt Arsenault2017-04-171-2/+4
| | | | | | | | | While the incoming stack for a kernel is 256-byte aligned, this refers to the base address of the entire wave. This isn't useful information for most of codegen. Fixes unnecessarily aligning stack objects in callees. llvm-svn: 300481
* [AMDGPU] Generate range metadata for workitem idStanislav Mekhanoshin2017-04-121-0/+3
| | | | | | | | | If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-0/+5
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* AMDGPU: Always use VGPR indexing on GFX9Marek Olsak2017-03-211-0/+4
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396
* [AMDGPU] Iterative scheduling infrastructure + minimal registry schedulerValery Pykhtin2017-03-211-0/+6
| | | | | | Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
* AMDGPU: Use v_med3_{f16|i16|u16}Matt Arsenault2017-02-271-0/+4
| | | | llvm-svn: 296401
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-271-0/+5
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-6/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-221-10/+6
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-6/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-4/+9
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Fix assembler subtarget predicate for gfx9Matt Arsenault2017-02-181-0/+1
| | | | | | This was accepting GFX9 instructions on VI. llvm-svn: 295557
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-181-1/+23
| | | | llvm-svn: 295554
* AMDGPU : Add trap handler support.Wei Ding2017-02-101-0/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
* [AMDGPU] Add target information that is required by tools to metadataKonstantin Zhuravlyov2017-02-081-50/+43
| | | | | | Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449
* [AMDGPU] Distinguish between S/VGPR allocation and encoding granularitiesKonstantin Zhuravlyov2017-02-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D29633 llvm-svn: 294441
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-1/+79
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-2/+3
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* AMDGPU: Cleanup fmin/fmax legacy functionMatt Arsenault2017-02-011-0/+4
| | | | | | Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
* AMDGPU: Implement hook for InferAddressSpacesMatt Arsenault2017-01-311-4/+4
| | | | | | | | | | For now just port some of the existing NVPTX tests and from an old HSAIL optimization pass which approximately did the same thing. Don't enable the pass yet until more testing is done. llvm-svn: 293580
OpenPOWER on IntegriCloud