| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.
llvm-svn: 367506
|
|
|
|
|
|
|
| |
These need to use an fadd, not an add. Also make the noret part clear
in the name.
llvm-svn: 367505
|
|
|
|
| |
llvm-svn: 367504
|
|
|
|
|
|
|
| |
AMDGPU change and test is a placeholder until a future patch with
complete handling.
llvm-svn: 367503
|
|
|
|
|
|
| |
This reverts commit r359363, reapplying r357634
llvm-svn: 367500
|
|
|
|
| |
llvm-svn: 367498
|
|
|
|
|
|
|
|
|
|
|
| |
We had couple places which still return 10 as a maximum
occupancy. Fixed.
Also print comment about occupancy as compiler see it.
Differential Revision: https://reviews.llvm.org/D65423
llvm-svn: 367381
|
|
|
|
|
|
|
|
|
|
| |
AMDGPU uses some custom code predicates for testing alignments.
I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.
llvm-svn: 367373
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65476
llvm-svn: 367355
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65471
llvm-svn: 367347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64966
llvm-svn: 367344
|
|
|
|
|
|
|
|
|
|
| |
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.
This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.
llvm-svn: 367326
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr:mayAlias() returns true when an instruction has multiple
MachineMemOperands.
This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65036
llvm-svn: 367237
|
|
|
|
| |
llvm-svn: 367235
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.
This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
This might allow D63953 or other similar workarounds to be removed.
Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue
Reviewed By: nhaehnle
Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65141
llvm-svn: 367218
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)
Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65325
Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
|
|
|
|
| |
llvm-svn: 367131
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65328
llvm-svn: 367105
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.
Reviewers: nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64935
llvm-svn: 367097
|
|
|
|
|
|
|
|
|
| |
handleAssignments gives up pretty easily on structs, and i8 values for
some reason. The other case that doesn't work is when an implicit sret
needs to be inserted if the return size exceeds the number of return
registers.
llvm-svn: 367082
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
which is consumed by pseudo structurized CFG instructions. When that
PHIs are eliminated in O0, COPY may be placed wrongly as the these
pseudo structurized CFG instructions are considering prologue of MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
PHIs.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64353
llvm-svn: 367023
|
|
|
|
| |
llvm-svn: 367018
|
|
|
|
|
|
|
|
|
|
| |
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.
Differential Revision: https://reviews.llvm.org/D65236
llvm-svn: 366938
|
|
|
|
|
|
|
|
|
|
| |
by the code
Reviewers: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D65216
llvm-svn: 366921
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65158
llvm-svn: 366917
|
|
|
|
|
|
|
| |
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.
llvm-svn: 366915
|
|
|
|
| |
llvm-svn: 366808
|
|
|
|
|
|
|
|
|
|
|
| |
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.
This allows global isel to import the simple cases once the complex
pattern is implemented.
llvm-svn: 366743
|
|
|
|
|
|
|
| |
The minnum/maxnum case are dead, and the cvt is handled by the
default.
llvm-svn: 366685
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.
Reviewers: arsenm, dstuttard, tpr
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64981
llvm-svn: 366667
|
|
|
|
| |
llvm-svn: 366621
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65007
llvm-svn: 366619
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65010
llvm-svn: 366616
|
|
|
|
| |
llvm-svn: 366613
|
|
|
|
|
|
| |
Avoid using custom code predicates.
llvm-svn: 366609
|
|
|
|
|
|
|
| |
This only works if the high bits of m0 are also 0, so m0 would have to
be set to 0xffff.
llvm-svn: 366608
|
|
|
|
|
|
|
| |
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.
llvm-svn: 366607
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.
Differential Revision: https://reviews.llvm.org/D64967
llvm-svn: 366598
|
|
|
|
|
|
| |
The DAG lowering sets dereferencable and invariant, not nontemporal.
llvm-svn: 366597
|
|
|
|
|
|
|
| |
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.
llvm-svn: 366585
|
|
|
|
|
|
| |
Handles structs used directly in argument lists.
llvm-svn: 366584
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should now handle everything except structs passed as multiple
registers.
I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.
This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.
llvm-svn: 366582
|
|
|
|
|
|
|
|
|
|
| |
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.
This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.
llvm-svn: 366578
|
|
|
|
|
|
|
|
|
|
| |
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D64629
llvm-svn: 366571
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:
1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
combine into an ADD3 instruction.
Because of #2 and #3 the end result is improved from this:
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v1, v4, v5, v1
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
To this:
v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
I.e. two fewer computational instructions, one extra nop where we could
schedule something else.
Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64411
llvm-svn: 366543
|
|
|
|
|
|
|
|
| |
This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb.
Differential Revision: https://reviews.llvm.org/D64952
llvm-svn: 366505
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64892
llvm-svn: 366385
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64885
llvm-svn: 366376
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
|
|
|
|
|
|
| |
Avoids creating an extra intermediate mov.
llvm-svn: 366340
|