summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Bring elf flags in sync with the specKonstantin Zhuravlyov2018-02-165-29/+85
| | | | | | | | | | | - Add MACH flags - Add XNACK flag - Add reserved flags - Minor cleanups in docs Differential Revision: https://reviews.llvm.org/D43356 llvm-svn: 325399
* AMDGPU: Bring processors and features in sync with the specKonstantin Zhuravlyov2018-02-164-18/+6
| | | | | | | | | | - Remove gfx800 - Make iceland gfx802 - Add xnack to gfx902 Differential Revision: https://reviews.llvm.org/D43355 llvm-svn: 325393
* AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elementsChangpeng Fang2018-02-161-1/+12
| | | | | | | | | | | | | | Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372
* AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested ↵Changpeng Fang2018-02-161-38/+23
| | | | | | | | | | | | | | | | | | instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355
* [AMDGPU] Combine adjacent waitcounts in a single strongest waitStanislav Mekhanoshin2018-02-151-7/+44
| | | | | | Differential Revision: https://reviews.llvm.org/D43350 llvm-svn: 325299
* [AMDGPU] Remove non-temporal flag from argument loadsStanislav Mekhanoshin2018-02-141-1/+0
| | | | | | | | | Kernel arguments likely read by all workitems and should not bypass cache. Fixes performance hit in sub-dword argument loads. Differential Revision: https://reviews.llvm.org/D43249 llvm-svn: 325146
* [AMDGPU] Change constant addr space to 4Yaxun Liu2018-02-136-9/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030
* [AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI)Daniel Neilson2018-02-091-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the AMDGPUPromoteAlloca pass to cease using: 1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. 2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment() and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324774
* AMDGPU: Remove tied operand from si_elseMatt Arsenault2018-02-091-1/+0
| | | | llvm-svn: 324751
* Reapply "AMDGPU: Add 32-bit constant address space"Matt Arsenault2018-02-0913-19/+86
| | | | | | This reverts r324494 and reapplies r324487. llvm-svn: 324747
* AMDGPU: Fix layering issueMatt Arsenault2018-02-096-21/+22
| | | | | | | Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746
* [AMDGPU] More descriptive names in the memory legalizerStanislav Mekhanoshin2018-02-091-19/+20
| | | | | | | | NFC. Differential Revision: https://reviews.llvm.org/D43054 llvm-svn: 324712
* AMDGPU: Process SDWA block at a timeMatt Arsenault2018-02-081-32/+31
| | | | | | | | | | Right now this loops over the entire function every time there is a change, which is not very efficient. There's no practical reason to track this so globally, since the code motion optimization passes should be sinking instructions with single uses and the pass currently will not fold with multiple uses. llvm-svn: 324667
* AMDGPU: Minor cleanupsMatt Arsenault2018-02-081-3/+4
| | | | | | Column limit, typo, unnecessary reference llvm-svn: 324666
* AMDGPU: Fix incorrect reordering when inline asm defines LDS addressMatt Arsenault2018-02-081-3/+4
| | | | | | | Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554
* AMDGPU: Don't crash when trying to fold implicit operandsMatt Arsenault2018-02-081-0/+1
| | | | llvm-svn: 324550
* [AMDGPU] Fixed wait count reuseStanislav Mekhanoshin2018-02-081-50/+40
| | | | | | | | | | | | | The code reusing existing wait counts is incorrect since it keeps adding new operands to an old instruction instead of replacing the immediate. It was also effectively switched off by the condition that wait count is not an AMDGPU::S_WAITCNT. Also switched to BuildMI instead of creating instructions directly. Differential Revision: https://reviews.llvm.org/D42997 llvm-svn: 324547
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-0712-86/+19
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address spaceMarek Olsak2018-02-0712-19/+86
| | | | | | | | | | | | | | | | | | | | | | | Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
* AMDGPU: Remove the s_buffer workaround for GFX9 chipsMarek Olsak2018-02-072-11/+2
| | | | | | | | | | | | | | | | | | Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486
* AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legalTom Stellard2018-02-071-0/+3
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446
* [AMDGPU] Suppress redundant waitcnt instrs.Mark Searles2018-02-072-19/+39
| | | | | | | | | | | | 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440
* AMDGPU: Select BFI patterns with 64-bit intsMatt Arsenault2018-02-073-6/+46
| | | | llvm-svn: 324431
* [AMDGPU] removed dead code handling rmw in memory legalizerStanislav Mekhanoshin2018-02-061-66/+3
| | | | | | | | | It was always using cmpxchg path and in rmw and cmpxchg instructions are not distinguishable in the BE. Differential Revision: https://reviews.llvm.org/D42976 llvm-svn: 324383
* AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALUMarek Olsak2018-02-061-2/+8
| | | | | | | | Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353
* [AMDGPU] do not generate .AMDGPU.config for amdpal os typeTim Renouf2018-02-061-16/+12
| | | | | | | | | | | | | | | Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346
* AMDGPU/MemoryModel: Fix monotonic atomic loadsKonstantin Zhuravlyov2018-02-061-1/+2
| | | | | | Those should have glc bit set for system and agent synchronization scopes llvm-svn: 324314
* [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifierDmitry Preobrazhensky2018-02-054-15/+44
| | | | | | | | | See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154 Differential Revision: https://reviews.llvm.org/D42847 Reviewers: cfang, artem.tamazov, arsenm llvm-svn: 324237
* [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodesDmitry Preobrazhensky2018-02-056-3/+67
| | | | | | | | | | | See bugs 36094, 36095: https://bugs.llvm.org/show_bug.cgi?id=36094 https://bugs.llvm.org/show_bug.cgi?id=36095 Differential Revision: https://reviews.llvm.org/D42692 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 324231
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-022-20/+3
| | | | | | | | This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
* AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the ↵Changpeng Fang2018-02-012-4/+10
| | | | | | | | | | | | target has UnpackedD16VMem feature. Reviewers: Matt and Brian Differential Revision: https://reviews.llvm.org/D42548 llvm-svn: 323988
* AMDGPU: Fix missing SCC def from s_xor_b64_termMatt Arsenault2018-01-311-0/+1
| | | | llvm-svn: 323927
* AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9Marek Olsak2018-01-311-22/+31
| | | | | | | | | | | | | | | | Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}Marek Olsak2018-01-315-8/+66
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908
* [AMDGPU] isRenamable fixes to support copy forwardingGeoff Berry2018-01-304-4/+12
| | | | | | | | | | | Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will be marked as not renamable, to avoid copy forwarding violating the constraint that only one operand may use the constant bus. These changes fix a few mis-compiles when copy forwarding is enabled in MachineCopyPropagation by D41835 (and were reviewed as part of that change). llvm-svn: 323794
* [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr ↵Mark Searles2018-01-301-63/+8
| | | | | | | | | | | | | count in debug output." Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\ teps/build_Lld/logs/stdio : /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable] static int32_t InstCnt = 0; " This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc. llvm-svn: 323791
* [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug ↵Mark Searles2018-01-301-8/+63
| | | | | | | | | | | | | | | output. -amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch). Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 323788
* AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace.Changpeng Fang2018-01-301-0/+5
| | | | | | | | | | Reviewer: Dmitry (dp). Differential Revision: https://reviews.llvm.org/D42596 llvm-svn: 323785
* AMDGPU: Move ADDRIndirect complex pattern into R600Instructions.tdTom Stellard2018-01-292-1/+1
| | | | | | | | | | | | | | Summary: This is only used by R600. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37114 llvm-svn: 323709
* AMDGPU: Allow a SGPR for the conditional KILL operandMarek Olsak2018-01-291-23/+31
| | | | | | | | | | | | | | | | | | Patch by: Bas Nieuwenhuizen Just use the _e64 variant if needed. This should be possible as per def : Pat < (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))), (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond)) > ; I don't think we can get an immediate for the other operand for which we need the second 32-bit word. https://reviews.llvm.org/D42302 llvm-svn: 323706
* [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regsGeoff Berry2018-01-291-1/+3
| | | | | | | | | Summary: Fix a few places that were modifying code after register allocation to set the renamable bit correctly to avoid failing the validation added in D42449. llvm-svn: 323675
* [globalisel] Make LegalizerInfo::LegalizeAction available outside of ↵Daniel Sanders2018-01-291-0/+1
| | | | | | | | | | | | LegalizerInfo. NFC Summary: The improvements to the LegalizerInfo discussed in D42244 require that LegalizerInfo::LegalizeAction be available for use in other classes. As such, it needs to be moved out of LegalizerInfo. This has been done separately to the next patch to minimize the noise in that patch. llvm-svn: 323669
* [AMDGPU][MC] Corrected parsing of image opcode modifiers r128 and d16Dmitry Preobrazhensky2018-01-292-2/+14
| | | | | | | | | | | See bugs 36092, 36093: https://bugs.llvm.org/show_bug.cgi?id=36092 https://bugs.llvm.org/show_bug.cgi?id=36093 Differential Revision: https://reviews.llvm.org/D42583 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323651
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-295-5/+5
| | | | | | "to to" -> "to" llvm-svn: 323628
* [AMDGPU][MC] Added validation of image dst/data size (must match dmask and tfe)Dmitry Preobrazhensky2018-01-261-0/+61
| | | | | | | | | See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000 Differential Revision: https://reviews.llvm.org/D42483 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323538
* [AMDGPU][MC] Added support of 64-bit image atomicsDmitry Preobrazhensky2018-01-265-17/+115
| | | | | | | | | See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998 Differential Revision: https://reviews.llvm.org/D42469 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323534
* [AMDGPU][MC] Enabled disassembler for image atomic operationsDmitry Preobrazhensky2018-01-261-12/+16
| | | | | | | | | See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988 Differential Revision: https://reviews.llvm.org/D42186 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323527
* [AMDGPU] fix LDS f32 intrinsicsDaniil Fukalov2018-01-262-16/+19
| | | | | | | | | | | | - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516
* [AMDGPU] Make sure all super regs of reserved regs are marked reserved.Geoff Berry2018-01-247-29/+30
| | | | | | | | | | | | | | | | | | | | | Summary: Move reserveRegisterTuples into AMDGPURegisterInfo and use it in R600RegisterInfo::getReservedRegs and R600InstrInfo::reserveIndirectRegisters to ensure that all super registers of reserved registers are also marked as reserved. Before this change, under certain circumstances, the registers %t1_x and %t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not be, leading to the register allocator sometimes assigning a register to %t1_xy, which is invalid since %t1_x is reserved. Reviewers: arsenm, tstellar, MatzeB, qcolombet Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D42448 llvm-svn: 323356
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-01-241-1/+1
| | | | | | "the the" -> "the" llvm-svn: 323302
OpenPOWER on IntegriCloud