| Commit message (Collapse) | Author | Age | Files | Lines |
|
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.
llvm-svn: 279909
|
llvm-svn: 279902
|
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.
llvm-svn: 279901
|
llvm-svn: 279900
|
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.
llvm-svn: 279899
|
Summary:
This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.
Reviewed by:
arsenm and tstellarAMD
Differential Revision:
http://reviews.llvm.org/D22489
llvm-svn: 279660
|
Differential Revision: http://reviews.llvm.org/D23069
llvm-svn: 279629
|
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.
One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.
This has a positive effect on SGPR usage in shader-db.
llvm-svn: 279464
|
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
|
Differential Revision: http://reviews.llvm.org/D23689
llvm-svn: 279126
|
Differential revision: https://reviews.llvm.org/D23666
llvm-svn: 279106
|
Differential Revision: http://reviews.llvm.org/D23336
llvm-svn: 278403
|
Differential Revision: http://reviews.llvm.org/D23133
llvm-svn: 278354
|
Summary:
This patch defines and implements the amdgcn image intrinsics with sampler.
1. Define the vdata type as llvm_anyfloat_ty, the address type as llvm_anyfloat_ty,
and the rsrc type as llvm_anyint_ty. As a result, the intrinsic names are expected
to carry three suffixes, one to overload each of these three types.
2. D128 and two other flags are implied by the three types; for example,
if you use v8i32 as the resource type, then r128 is 0.
3. The TFE flag is not exposed; the other flags are exposed in instruction order:
unrm, glc, slc, lwe and da.
Differential Revision: http://reviews.llvm.org/D22838
Reviewed by:
arsenm and tstellarAMD
llvm-svn: 278291
|
llvm-svn: 278278
|
llvm-svn: 278276
|
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.
The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.
llvm-svn: 278273
|
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
llvm-svn: 277504
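The distinction between the two kinds of stores can be illustrated with a small lane-mask model. This is only a sketch under stated assumptions: `wqm` and `store_exec_mask` are hypothetical helper names rather than LLVM functions, a quad is taken to be four consecutive lanes, and whole quad mode is modeled as enabling every lane of any quad that has at least one live lane.

```python
QUAD = 4  # assumed quad size: four consecutive lanes


def wqm(mask, lanes=64):
    """Expand a lane mask to whole quads (hypothetical helper)."""
    out = 0
    for base in range(0, lanes, QUAD):
        quad_bits = 0xF << base
        if mask & quad_bits:
            # At least one lane of this quad is live, so enable all four.
            out |= quad_bits
    return out


def store_exec_mask(live_mask, in_wqm, is_api_store):
    """Return the lanes that should execute a store (hypothetical helper).

    API-level stores must not be affected by helper pixels, so they run
    with only the live lanes. Spill and array-lowering stores run with
    whatever mask is currently active, helper lanes included.
    """
    if is_api_store:
        return live_mask
    return wqm(live_mask) if in_wqm else live_mask


live = 0b00010110
api_lanes = store_exec_mask(live, in_wqm=True, is_api_store=True)
spill_lanes = store_exec_mask(live, in_wqm=True, is_api_store=False)
print(f"{api_lanes:08b} {spill_lanes:08b}")
```

In this model the API-level store runs on the live lanes only, while the spill store also runs on the helper lanes of both quads.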
|
Differential revision: https://reviews.llvm.org/D22522
llvm-svn: 277344
|
llvm-svn: 277259
|
Differential Revision: http://reviews.llvm.org/D22482
llvm-svn: 276998
|
Summary:
SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:
s_or_saveexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.
Finally: It also occurred to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.
In any case, this current patch could help in the short term with the whole
ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
llvm-svn: 276972
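The need for the extra `s_and_b64 dst, dst, exec` can be checked with a small bit-mask simulation of the sequence described above. This is a sketch, not the pass itself: `lower_si_else` and its parameters are hypothetical names, masks are plain Python integers, and the example lane assignments are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def lower_si_else(exec_mask, src, live_mask=None, fix_dst=True):
    """Simulate the lowered SI_ELSE sequence on integer lane masks.

    exec_mask: exec at the end of the IF block (the "then" lanes)
    src:       lanes that should run the ELSE path
    live_mask: if given, models the s_and_b64 added by SIWholeQuadMode
    fix_dst:   whether the extra s_and_b64 dst, dst, exec is emitted
    """
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = (src | exec_mask) & MASK64
    if live_mask is not None:
        # s_and_b64 exec, exec, s[...]   (switch from WQM to Exact)
        exec_mask &= live_mask
        if fix_dst:
            # s_and_b64 dst, dst, exec
            dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    exec_mask ^= dst
    return exec_mask, dst


then_lanes, else_lanes, live = 0b0011, 0b1100, 0b0110

# No mode switch: exec ends up as exactly the ELSE lanes.
plain_exec, _ = lower_si_else(then_lanes, else_lanes)

# Mode switch with the extra AND: only live ELSE lanes remain.
fixed_exec, _ = lower_si_else(then_lanes, else_lanes, live)

# Mode switch without the extra AND: lane 0 (a non-live IF lane) is re-enabled.
broken_exec, _ = lower_si_else(then_lanes, else_lanes, live, fix_dst=False)

print(f"{plain_exec:04b} {fixed_exec:04b} {broken_exec:04b}")
```

Without the extra AND, the final XOR flips bits of `dst` that were already masked out of exec, which turns those lanes back on.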
|
llvm-svn: 276819
|
This could use some additional optimization work
to use mad/mac legacy.
llvm-svn: 276764
|
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.
llvm-svn: 276436
|
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.
llvm-svn: 276435
|
All that matters is whether the value is negative or positive,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write to exec directly,
but for now stick with the kill pseudo-terminator.
llvm-svn: 275988
|
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
|
llvm-svn: 275871
|
llvm-svn: 275870
|
I meant to squash this into it.
llvm-svn: 275220
|
Differential Revision: http://reviews.llvm.org/D22239
llvm-svn: 275197
|
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it misled
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22217
llvm-svn: 275160
|
llvm-svn: 275133
|
These are all expanded to instructions that include an scc def.
llvm-svn: 275132
|
llvm-svn: 274978
|
llvm-svn: 274954
|
llvm-svn: 274939
|
Differential Revision: http://reviews.llvm.org/D22049
llvm-svn: 274852
|
llvm-svn: 274818
|
Differential revision: http://reviews.llvm.org/D22041
llvm-svn: 274756
|
2 source registers. NFC.
llvm-svn: 274556
|
Summary:
The isGlobalLoad() query was returning true for constant address space loads
with memory types less than 32-bits, which is wrong. This logic has been
replaced with PatFrag in the TableGen files, to provide the same functionality.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21696
llvm-svn: 274521
|
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.
llvm-svn: 273652
|
llvm-svn: 273525
|
llvm-svn: 273514
|
Reviewers: tstellarAMD, arsenm
Differential Revision: http://reviews.llvm.org/D21533
llvm-svn: 273496
|
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.
llvm-svn: 273467
|
This should select to s_trap, but that requires
additional work to set up and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.
Emit a warning that this will only terminate the wave and
not really trap.
llvm-svn: 273062
|
Mesa doesn't emit this for llvm >= 3.8 anymore.
llvm-svn: 273050
|