| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D28980
llvm-svn: 293171
|
| |
|
|
| |
llvm-svn: 293127
|
| |
|
|
|
|
|
| |
According to the documentation this is supposed to be -1
if indirect calls are not supported.
llvm-svn: 293081
|
| |
|
|
| |
llvm-svn: 293028
|
| |
|
|
|
|
|
|
| |
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.
llvm-svn: 293024
|
| |
|
|
| |
llvm-svn: 293019
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Leave early ifcvt disabled for now since there are some
shader-db regressions.
This causes some immediate improvements, but could be better.
The pass bases its cost check on critical path length for
out-of-order CPUs, which is not the model we want, so it skips
many cases we would want converted.
llvm-svn: 293016
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A sequence like this:
v_cmpx_le_f32_e32 vcc, 0, v0
s_branch BB0_30
s_cbranch_execnz BB0_30
; BB#29:
exp null off, off, off, off done vm
s_endpgm
BB0_30:
; %endif110
is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.
The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.
Instead, we should remove the unconditional out branches and let
s_cbranch_execnz skip the two instructions (exp done + endpgm) if the
exec mask is non-zero.
This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 292985
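
The control-flow problem described above can be reproduced with a toy interpreter. This is only an illustrative sketch: the tuple encoding, the `run` helper, and the opcode names `exp_null_done` and `rest_of_shader` are assumptions made for the example and are not part of the patch. The `exec_mask` argument stands for the exec mask left behind by v_cmpx.

```python
def run(program, exec_mask):
    """Execute a toy instruction list and return the opcodes that ran."""
    # Map each label name to its position in the program.
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    pc, trace = 0, []
    while pc < len(program):
        op, arg = program[pc]
        if op != "label":
            trace.append(op)
        if op == "s_endpgm":
            break  # the wave terminates here
        taken = op == "s_branch" or (op == "s_cbranch_execnz" and exec_mask != 0)
        pc = labels[arg] if taken else pc + 1
    return trace


# Hypothetical encodings of the skip block and the following basic block.
SKIP_BLOCK = [("exp_null_done", None), ("s_endpgm", None)]
TAIL = [("label", "BB0_30"), ("rest_of_shader", None)]

# Sequence from the commit message: s_branch precedes s_cbranch_execnz.
buggy = [("v_cmpx", None), ("s_branch", "BB0_30"),
         ("s_cbranch_execnz", "BB0_30")] + SKIP_BLOCK + TAIL

# Sequence with the unconditional branch removed.
fixed = [("v_cmpx", None), ("s_cbranch_execnz", "BB0_30")] + SKIP_BLOCK + TAIL

print(run(buggy, exec_mask=0))  # skip block never runs
print(run(fixed, exec_mask=0))  # skip block runs and ends the program
```

With every lane killed (`exec_mask=0`), the first sequence jumps straight to BB0_30, while the second falls through into the skip block and terminates.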
|
| |
|
|
|
|
|
|
|
|
|
| |
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
|
| |
|
|
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D28970
Reviewer: Matt
llvm-svn: 292966
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regalloc creates COPY instructions which do not formally use VALU.
As a result, v_mov instructions can end up displaced after an exec mask
modification. One pass which does this is SIOptimizeExecMasking, but
potentially other passes can do it too.
This patch adds a pass immediately after regalloc to add implicit exec
use operand to all VGPR copy instructions.
Differential Revision: https://reviews.llvm.org/D28874
llvm-svn: 292956
|
| |
|
|
| |
llvm-svn: 292893
|
| |
|
|
|
|
| |
This avoids stack usage.
llvm-svn: 292846
|
| |
|
|
|
|
|
| |
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.
llvm-svn: 292842
|
| |
|
|
|
|
|
| |
The same control register controls both, and both are set to
the same defaults. Keep the old names around as aliases.
llvm-svn: 292837
|
| |
|
|
| |
llvm-svn: 292814
|
| |
|
|
|
|
| |
The colon is important.
llvm-svn: 292761
|
| |
|
|
|
|
|
|
|
|
|
| |
Add DUMMY_CHAIN SDNode to denote stores of interest
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
Differential Revision: https://reviews.llvm.org/D27964
llvm-svn: 292651
|
| |
|
|
| |
llvm-svn: 292567
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip PHIs and labels, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.
Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.
Differential Revision: https://reviews.llvm.org/D27997
llvm-svn: 292554
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.
fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
llvm-svn: 292473
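
The signed-zero case above can be checked directly with IEEE-754 doubles. This is a minimal sketch; the helper names are made up for the illustration and do not correspond to anything in the combine itself.

```python
import math


def neg_of_sum(x, y):
    # Original form: add first, then negate.
    return -(x + y)


def sum_of_negs(x, y):
    # Distributed form: negate each operand, then add.
    return (-x) + (-y)


def is_negative_zero(v):
    # copysign exposes the sign bit that an == comparison hides.
    return v == 0.0 and math.copysign(1.0, v) < 0.0


x, y = 1.0, -1.0  # x == -y, the case singled out in the message
a = neg_of_sum(x, y)
b = sum_of_negs(x, y)
print(is_negative_zero(a), is_negative_zero(b))
```

The two results compare equal with `==`, so the difference only shows up in operations that observe the sign bit, such as `copysign` or a division by the result.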
|
| |
|
|
|
|
|
|
| |
They seem to produce nonsense results when used.
This should be applied to the release branch.
llvm-svn: 292472
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.
With more super-regs we would need to allocate adjacent registers
and constrain regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register
allocation even more, resulting in excessive spilling.
Differential Revision: https://reviews.llvm.org/D28782
llvm-svn: 292413
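
To make the overlap concrete, the following sketch enumerates register tuples the way the names above suggest. The helper functions and the `num_regs`/`width` parameters are hypothetical and only model the naming scheme, not the actual register file definition.

```python
def register_tuples(num_regs, width):
    """List every tuple of `width` adjacent VGPRs among `num_regs` registers."""
    return [
        "_".join(f"VGPR{r}" for r in range(start, start + width))
        for start in range(num_regs - width + 1)
    ]


def tuples_containing(reg, num_regs, width):
    """List the tuples that include the single register VGPR<reg>."""
    name = f"VGPR{reg}"
    return [t for t in register_tuples(num_regs, width) if name in t.split("_")]


print(register_tuples(5, 3))
print(tuples_containing(2, 5, 3))
```

In this model, an interior register such as VGPR2 belongs to as many tuples as the tuple is wide, so assigning one tuple rules out several neighbouring ones.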
|
| |
|
|
| |
llvm-svn: 292328
|
| |
|
|
| |
llvm-svn: 292205
|
| |
|
|
|
|
|
|
| |
_RTN versions will be a lot more complicated
Differential Revision: https://reviews.llvm.org/D28067
llvm-svn: 292162
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28496
llvm-svn: 291954
|
| |
|
|
| |
llvm-svn: 291792
|
| |
|
|
| |
llvm-svn: 291790
|
| |
|
|
| |
llvm-svn: 291784
|
| |
|
|
| |
llvm-svn: 291779
|
| |
|
|
| |
llvm-svn: 291778
|
| |
|
|
| |
llvm-svn: 291777
|
| |
|
|
|
|
| |
Patch mostly by Fiona Glaser
llvm-svn: 291733
|
| |
|
|
|
|
| |
Patch mostly by Fiona Glaser
llvm-svn: 291732
|
| |
|
|
|
|
| |
Patch mostly by Fiona Glaser
llvm-svn: 291731
|
| |
|
|
|
|
| |
Allows better source modifier usage.
llvm-svn: 291729
|
| |
|
|
|
|
| |
To shrink to VOP2 the input carry must also be VCC.
llvm-svn: 291720
|
| |
|
|
|
|
|
|
| |
This produces worse code when i16 is legal, mostly
due to combines getting confused by conversions inserted
for uniform 16-bit operations.
llvm-svn: 291717
|
| |
|
|
|
|
|
| |
This was shrinking the instruction even though the carry output
register was a virtual register, not known VCC.
llvm-svn: 291716
|
| |
|
|
|
|
|
| |
The legality check needs to be made against the instruction
it will be replaced with.
llvm-svn: 291711
|
| |
|
|
|
|
|
|
|
| |
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.
This needs a simple probability check because there are some cases where it is
not profitable.
llvm-svn: 291695
|
| |
|
|
|
|
|
|
| |
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.
llvm-svn: 291642
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28164
llvm-svn: 291622
|
| |
|
|
|
|
| |
In future commits these patterns will appear after moveToVALU changes.
llvm-svn: 291615
|
| |
|
|
|
|
|
|
|
|
|
| |
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability in the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential revision: https://reviews.llvm.org/D27742
llvm-svn: 291609
|
| |
|
|
|
|
|
|
|
| |
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.
llvm-svn: 291604
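
One way to keep a dynamic index inside the stack slot is to clamp it before computing the address. The sketch below is an assumption for illustration: the commit message does not say which clamp the legalization emits, and the function names are hypothetical. It treats the index as unsigned (non-negative).

```python
def clamp_vector_index(index, num_elts):
    """Map a non-negative index into the range [0, num_elts - 1]."""
    if num_elts <= 0:
        raise ValueError("vector must have at least one element")
    if num_elts & (num_elts - 1) == 0:
        # Power-of-two length: a bit mask is enough.
        return index & (num_elts - 1)
    # Otherwise fall back to an unsigned minimum.
    return min(index, num_elts - 1)


def extract_element(stack_slot, index):
    # The loaded value is arbitrary for an out-of-bounds index,
    # but the access itself always stays inside the slot.
    return stack_slot[clamp_vector_index(index, len(stack_slot))]


print(extract_element([10, 20, 30, 40], 6))
```

The element returned for an out-of-bounds index is meaningless, which matches the "undefined result, not undefined behavior" requirement: only the memory access has to remain in bounds.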
|
| |
|
|
|
|
| |
This was enabled without many specific tests or the comment.
llvm-svn: 291586
|
| |
|
|
|
|
|
|
|
|
|
| |
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.
Fixes code quality regressions vs. SI in min.ll test.
llvm-svn: 291461
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Originally
i64 = umax t8, Constant:i64<4>
was expanded into
i32,i32 = umax Constant:i32<0>, Constant:i32<0>
i32,i32 = umax t7, Constant:i32<4>
Now the two resulting umax nodes each return i32 instead of i32, i32.
Thanks to Jan Vesely for help with the test case.
Patch by mikael.holmen at ericsson.com
Reviewers: bogner, jvesely, tstellarAMD, arsenm
Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D28135
llvm-svn: 291441
|