| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
same address to avoid WAR conflict.
Reviewers: rampitec, vpykhtin, nhaehnle
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D71934
|
|
|
|
|
|
|
|
|
|
| |
Summary: - Other counters are accidentally cleared.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71866
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The waitcnt pass can overflow the counters when the number of outstanding events
for a type exceed the capacity of the counter. This can lead to inefficient
insertion of waitcnts, or to waitcnt instructions with max values for each type.
The last situation can cause an instruction which when disassembled appears to
be an illegal waitcnt without an operand.
In these cases we should add a wait for the 'counter maximum' - 1, and update the
waitcnt brackets accordingly.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70418
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
VCCZBugHandledSet was used to make sure we don't apply the same
workaround more than once to a single cbranch instruction, but it's not
necessary because the workaround involves inserting an s_waitcnt
instruction, which is enough for subsequent iterations to detect that no
further workaround is necessary.
Also beef up the test case to check that the workaround was only applied
once. I have also manually verified that the test still passes even if I
hack the big do-while loop in runOnMachineFunction to run a minimum of
five iterations.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69621
|
|
|
|
|
| |
This one should have been done in r363902 when hasReadVCCZBug was
introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An outstanding load with same destination sgpr as call could cause PC to be
updated with junk value on return.
Reviewers: arsenm, rampitec
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69474
|
|
|
|
|
|
|
|
| |
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64584
llvm-svn: 365824
|
|
|
|
|
|
| |
These aren't produced now, but will be in a future patch.
llvm-svn: 364983
|
|
|
|
| |
llvm-svn: 363983
|
|
|
|
|
|
|
|
| |
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.
llvm-svn: 363902
|
|
|
|
|
|
|
|
| |
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.
llvm-svn: 363870
|
|
|
|
|
|
|
|
|
| |
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/
llvm-svn: 363797
|
|
|
|
|
|
|
| |
There may or may not be additional work to handle this correctly on
SI/CI.
llvm-svn: 363678
|
|
|
|
|
|
|
|
|
| |
This is cpp source part of wave32 support, excluding overriden
getRegClass().
Differential Revision: https://reviews.llvm.org/D63351
llvm-svn: 363513
|
|
|
|
|
|
|
|
|
|
|
| |
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.
Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.
llvm-svn: 363465
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61534
llvm-svn: 359938
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61094
llvm-svn: 359224
|
|
|
|
| |
llvm-svn: 356206
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52944
llvm-svn: 351351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55602
llvm-svn: 349611
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.
Using unsigned makes robust wrapping checks easier.
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam
Differential Revision: https://reviews.llvm.org/D54230
llvm-svn: 347852
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54229
llvm-svn: 347851
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.
This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).
This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54228
llvm-svn: 347850
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54227
llvm-svn: 347849
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reduce the statefulness of the algorithm in two ways:
1. More clearly split generateWaitcntInstBefore into two phases: the
first one which determines the required wait, if any, without changing
the ScoreBrackets, and the second one which actually inserts the wait
and updates the brackets.
2. Communicate pre-existing s_waitcnt instructions using an argument to
generateWaitcntInstBefore instead of through the ScoreBrackets.
To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
There are some functional changes:
1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
SMEM instructions as required instead of waiting on all counters.
2. We now properly track pre-existing waitcnt's in all cases, which leads
to less conservative waitcnts being emitted in some cases.
s_load_dword ...
s_waitcnt lgkmcnt(0) <-- pre-existing wait count
ds_read_b32 v0, ...
ds_read_b32 v1, ...
s_waitcnt lgkmcnt(0) <-- this is too conservative
use(v0)
more code
use(v1)
This increases code size a bit, but the reduced latency should still be a
win in basically all cases. The worst code size regressions in my shader-db
are:
WORST REGRESSIONS - Code Size
Before After Delta Percentage
1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0]
2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0]
4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0]
2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0]
3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0]
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54226
llvm-svn: 347848
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54225
llvm-svn: 347192
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Remove redundant logic and simplify control flow.
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54086
llvm-svn: 346363
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.
Besides, this was always checking for the wrong kill opcode anyway...
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54085
llvm-svn: 346362
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
|
|
|
|
|
|
|
|
|
|
|
| |
TargetParser."
This reverts commit r341982.
The change introduced a layering violation. Reverting to unbreak
our integrate.
llvm-svn: 342023
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
|
|
|
|
| |
llvm-svn: 341165
|
|
|
|
|
|
|
|
|
|
| |
build error"
Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error""
( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via
2b2ee080f0164485562593b1b87291a48cea4a9a .
llvm-svn: 337156
|
|
|
|
|
|
|
|
| |
build error"
This reverts commit fe0a456510131f268e388c4a18a92f575c0db183.
llvm-svn: 337153
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com>
Fixes the following building error:
external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.
Differential Revision: https://reviews.llvm.org/D49089
llvm-svn: 336588
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46365
llvm-svn: 335942
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48607
llvm-svn: 335669
|
|
|
|
|
|
|
|
|
|
| |
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.
Differential Revision: https://reviews.llvm.org/D48391
llvm-svn: 335268
|
|
|
|
|
|
|
|
| |
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
Differential Revision: https://reviews.llvm.org/D46616
llvm-svn: 333926
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://reviews.llvm.org/rL333556 caused a buildbot failure.
See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio
/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),
The unused variable was for debugging purposes; removing that piece of code
to fix the build.
llvm-svn: 333559
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.
Differential Revision: https://reviews.llvm.org/D47488
llvm-svn: 333556
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
|
|
|
|
|
|
|
|
|
| |
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and getting crufty
Differential Revision: https://reviews.llvm.org/D46448
llvm-svn: 331641
|