summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructionsSam Kolton2017-06-271-3/+3
| | | | | | | | | | | | | | Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-221-0/+25
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* AMDGPU : Fix ISA Version Definitions.Wei Ding2017-06-101-1/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [llvm] Remove double semicolonsMandeep Singh Grang2017-06-061-1/+1
| | | | | | | | | | | | Reviewers: craig.topper, arsenm, mehdi_amini Reviewed By: mehdi_amini Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33924 llvm-svn: 304767
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-021-6/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* [AMDGPU] Fix kernel arg segment size for amdgizclYaxun Liu2017-06-011-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482
* [AMDGPU] Require waitcnt before barrier for all targets; adjust tests.Mark Searles2017-05-301-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex ↵Marek Olsak2017-05-241-4/+0
| | | | | | | | | | | patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754
* [AMDGPU] Combine and (srl) into shl (bfe)Stanislav Mekhanoshin2017-05-231-4/+4
| | | | | | | | | | | | | | | | | | | Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681
* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patternsMarek Olsak2017-05-231-0/+4
| | | | | | | | | | | | This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658
* AMDGPU: Fix min3/max3 combines for f16/i16Matt Arsenault2017-05-171-0/+4
| | | | | | Fix missing instruction definitions for min3/max3. llvm-svn: 303284
* AMDGPU: Add new subtarget features for gfx9 flat instructionsMatt Arsenault2017-05-101-0/+15
| | | | | | | Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729
* AMDGPU: Change stack alignmentMatt Arsenault2017-04-171-2/+4
| | | | | | | | | While the incoming stack for a kernel is 256-byte aligned, this refers to the base address of the entire wave. This isn't useful information for most of codegen. Fixes unnecessarily aligning stack objects in callees. llvm-svn: 300481
* [AMDGPU] Generate range metadata for workitem idStanislav Mekhanoshin2017-04-121-0/+3
| | | | | | | | | If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-0/+5
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* AMDGPU: Always use VGPR indexing on GFX9Marek Olsak2017-03-211-0/+4
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396
* [AMDGPU] Iterative scheduling infrastructure + minimal registry schedulerValery Pykhtin2017-03-211-0/+6
| | | | | | Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
* AMDGPU: Use v_med3_{f16|i16|u16}Matt Arsenault2017-02-271-0/+4
| | | | llvm-svn: 296401
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-271-0/+5
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-6/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-221-10/+6
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-6/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-4/+9
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Fix assembler subtarget predicate for gfx9Matt Arsenault2017-02-181-0/+1
| | | | | | This was accepting GFX9 instructions on VI. llvm-svn: 295557
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-181-1/+23
| | | | llvm-svn: 295554
* AMDGPU : Add trap handler support.Wei Ding2017-02-101-0/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
* [AMDGPU] Add target information that is required by tools to metadataKonstantin Zhuravlyov2017-02-081-50/+43
| | | | | | Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449
* [AMDGPU] Distinguish between S/VGPR allocation and encoding granularitiesKonstantin Zhuravlyov2017-02-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D29633 llvm-svn: 294441
* [AMDGPU] Move register related queries to subtarget classKonstantin Zhuravlyov2017-02-081-1/+79
| | | | | | Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-011-2/+3
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* AMDGPU: Cleanup fmin/fmax legacy functionMatt Arsenault2017-02-011-0/+4
| | | | | | Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
* AMDGPU: Implement hook for InferAddressSpacesMatt Arsenault2017-01-311-4/+4
| | | | | | | | | | For now just port some of the existing NVPTX tests and from an old HSAIL optimization pass which approximately did the same thing. Don't enable the pass yet until more testing is done. llvm-svn: 293580
* Re-commit AMDGPU/GlobalISel: Add support for simple shadersTom Stellard2017-01-301-0/+15
| | | | | | | | | | | | | | Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551
* Revert "AMDGPU/GlobalISel: Add support for simple shaders"Tom Stellard2017-01-301-15/+0
| | | | | | | | This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509
* AMDGPU/GlobalISel: Add support for simple shadersTom Stellard2017-01-301-0/+15
| | | | | | | | | | | | Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503
* AMDGPU: Enable FeatureFlatForGlobal on Volcanic IslandsMatt Arsenault2017-01-271-1/+0
| | | | | | | | | | | Accomplishes what r292982 was supposed to, which ended up only really making the necessary test changes. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 293310
* AMDGPU: Implement early ifcvt target hooks.Matt Arsenault2017-01-251-0/+5
| | | | | | | | | | | | Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016
* AMDGPU add support for spilling to a user sgpr pointed buffersTom Stellard2017-01-251-7/+16
| | | | | | | | | | | | | | | | | Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
* Enable FeatureFlatForGlobal on Volcanic IslandsMatt Arsenault2017-01-241-0/+1
| | | | | | | | | | | This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
* AMDGPU: Combine fp16/fp64 subtarget featuresMatt Arsenault2017-01-231-4/+3
| | | | | | | The same control register controls both, and are set to the same defaults. Keep the old names around as aliases. llvm-svn: 292837
* [AMDGPU] Add subtarget features for SDWA/DPPSam Kolton2017-01-201-0/+10
| | | | | | | | | | Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28900 llvm-svn: 292596
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-091-5/+12
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289282
* [AMDGPU] Scalarization of global uniform loads.Alexander Timofeev2016-12-081-0/+4
| | | | | | | | | | | | | | | | | | Summary: LC can currently select scalar load for uniform memory access basing on readonly memory address space only. This restriction originated from the fact that in HW prior to VI vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along the all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-131-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32Konstantin Zhuravlyov2016-11-011-4/+4
| | | | | | | | | | | This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
* AMDGPU: Use 1/2pi inline imm on VIMatt Arsenault2016-10-291-0/+5
| | | | | | I'm guessing at how it is supposed to be printed llvm-svn: 285490
* AMDGPU: Add definitions for scalar store instructionsMatt Arsenault2016-10-281-0/+5
| | | | | | | | | | Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463
* AMDGPU: Diagnose using too many SGPRsMatt Arsenault2016-10-281-0/+2
| | | | | | This is possible when using inline asm. llvm-svn: 285447
* AMDGPU/SI: Handle hazard with > 8 byte VMEM storesTom Stellard2016-10-271-0/+4
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25577 llvm-svn: 285359
OpenPOWER on IntegriCloud