| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
- Add MACH flags
- Add XNACK flag
- Add reserved flags
- Minor cleanups in docs
Differential Revision: https://reviews.llvm.org/D43356
llvm-svn: 325399
|
|
|
|
|
|
|
|
|
|
| |
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902
Differential Revision: https://reviews.llvm.org/D43355
llvm-svn: 325393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
Reviewers:
arsenm
Differential Revision:
https://reviews.llvm.org/D33559
llvm-svn: 325372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction.
Summary:
In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.
Reviewers:
rampitec
Differential Revision:
https://reviews.llvm.org/D43297
llvm-svn: 325355
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D43350
llvm-svn: 325299
|
|
|
|
|
|
|
|
|
| |
Kernel arguments likely read by all workitems and should not bypass
cache. Fixes performance hit in sub-dword argument loads.
Differential Revision: https://reviews.llvm.org/D43249
llvm-svn: 325146
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D43170
llvm-svn: 325030
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AMDGPUPromoteAlloca pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324774
|
|
|
|
| |
llvm-svn: 324751
|
|
|
|
|
|
| |
This reverts r324494 and reapplies r324487.
llvm-svn: 324747
|
|
|
|
|
|
|
| |
Move utility function that depends on codegen.
Fixes build with r324487 reapplied.
llvm-svn: 324746
|
|
|
|
|
|
|
|
| |
NFC.
Differential Revision: https://reviews.llvm.org/D43054
llvm-svn: 324712
|
|
|
|
|
|
|
|
|
|
| |
Right now this loops over the entire function every time there
is a change, which is not very efficient. There's no practical
reason to track this so globally, since the code motion optimization
passes should be sinking instructions with single uses and
the pass currently will not fold with multiple uses.
llvm-svn: 324667
|
|
|
|
|
|
| |
Column limit, typo, unnecessary reference
llvm-svn: 324666
|
|
|
|
|
|
|
| |
Defs of operands outside of the instruction's explicit defs need
to be checked.
llvm-svn: 324554
|
|
|
|
| |
llvm-svn: 324550
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code reusing existing wait counts is incorrect since it keeps
adding new operands to an old instruction instead of replacing
the immediate. It was also effectively switched off by the condition
that wait count is not an AMDGPU::S_WAITCNT.
Also switched to BuildMI instead of creating instructions directly.
Differential Revision: https://reviews.llvm.org/D42997
llvm-svn: 324547
|
|
|
|
|
|
|
|
| |
This reverts commit r324487.
It broke clang tests.
llvm-svn: 324494
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note: This is a candidate for LLVM 6.0, because it was planned to be
in that release but was delayed due to a long review period.
Merge conflict in release_60 - resolution:
Add "-p6:32:32" into the second (non-amdgiz) string.
Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.
Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all uses cases we need for Mesa.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D41651
llvm-svn: 324487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42756
llvm-svn: 324486
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42152
llvm-svn: 324446
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.
2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing.
3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.
Differential Revision: https://reviews.llvm.org/D42854
llvm-svn: 324440
|
|
|
|
| |
llvm-svn: 324431
|
|
|
|
|
|
|
|
|
| |
It was always using cmpxchg path and in rmw and cmpxchg instructions
are not distinguishable in the BE.
Differential Revision: https://reviews.llvm.org/D42976
llvm-svn: 324383
|
|
|
|
|
|
|
|
| |
Author: Bas Nieuwenhuizen
https://reviews.llvm.org/D42881
llvm-svn: 324353
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Now we generate PAL metadata for the amdpal os type, there is no need to
generate the .AMDGPU.config section.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37760
Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
|
|
|
|
|
|
| |
Those should have glc bit set for system and agent synchronization scopes
llvm-svn: 324314
|
|
|
|
|
|
|
|
|
| |
See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154
Differential Revision: https://reviews.llvm.org/D42847
Reviewers: cfang, artem.tamazov, arsenm
llvm-svn: 324237
|
|
|
|
|
|
|
|
|
|
|
| |
See bugs 36094, 36095:
https://bugs.llvm.org/show_bug.cgi?id=36094
https://bugs.llvm.org/show_bug.cgi?id=36095
Differential Revision: https://reviews.llvm.org/D42692
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
|
|
|
|
|
|
|
|
| |
This requires corresponding clang change.
Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101
|
|
|
|
|
|
|
|
|
|
|
|
| |
target has UnpackedD16VMem feature.
Reviewers:
Matt and Brian
Differential Revision:
https://reviews.llvm.org/D42548
llvm-svn: 323988
|
|
|
|
| |
llvm-svn: 323927
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.
6500 shaders are affected:
Code Size in affected shaders: -15.14 %
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D42078
llvm-svn: 323909
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D41663
llvm-svn: 323908
|
|
|
|
|
|
|
|
|
|
|
| |
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.
These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).
llvm-svn: 323794
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\
teps/build_Lld/logs/stdio :
/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
static int32_t InstCnt = 0;
"
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.
llvm-svn: 323791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
output.
-amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs
This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).
Differential Revision: https://reviews.llvm.org/D40091
llvm-svn: 323788
|
|
|
|
|
|
|
|
|
|
| |
Reviewer:
Dmitry (dp).
Differential Revision:
https://reviews.llvm.org/D42596
llvm-svn: 323785
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is only used by R600.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37114
llvm-svn: 323709
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch by: Bas Nieuwenhuizen
Just use the _e64 variant if needed. This should be possible as per
def : Pat <
(int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
(SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;
I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.
https://reviews.llvm.org/D42302
llvm-svn: 323706
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.
llvm-svn: 323675
|
|
|
|
|
|
|
|
|
|
|
|
| |
LegalizerInfo. NFC
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.
llvm-svn: 323669
|
|
|
|
|
|
|
|
|
|
|
| |
See bugs 36092, 36093:
https://bugs.llvm.org/show_bug.cgi?id=36092
https://bugs.llvm.org/show_bug.cgi?id=36093
Differential Revision: https://reviews.llvm.org/D42583
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323651
|
|
|
|
|
|
| |
"to to" -> "to"
llvm-svn: 323628
|
|
|
|
|
|
|
|
|
| |
See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000
Differential Revision: https://reviews.llvm.org/D42483
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323538
|
|
|
|
|
|
|
|
|
| |
See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998
Differential Revision: https://reviews.llvm.org/D42469
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323534
|
|
|
|
|
|
|
|
|
| |
See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988
Differential Revision: https://reviews.llvm.org/D42186
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323527
|
|
|
|
|
|
|
|
|
|
|
|
| |
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.
Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.
Reviewers: arsenm, tstellar, MatzeB, qcolombet
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D42448
llvm-svn: 323356
|
|
|
|
|
|
| |
"the the" -> "the"
llvm-svn: 323302
|