| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.
Differential Revision: https://reviews.llvm.org/D65236
llvm-svn: 366938
|
| |
|
|
|
|
|
|
|
|
| |
by the code
Reviewers: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D65216
llvm-svn: 366921
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65158
llvm-svn: 366917
|
| |
|
|
|
|
|
| |
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.
llvm-svn: 366915
|
| |
|
|
| |
llvm-svn: 366808
|
| |
|
|
|
|
|
|
|
|
|
| |
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.
This allows global isel to import the simple cases once the complex
pattern is implemented.
llvm-svn: 366743
|
| |
|
|
|
|
|
| |
The minnum/maxnum case are dead, and the cvt is handled by the
default.
llvm-svn: 366685
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.
Reviewers: arsenm, dstuttard, tpr
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64981
llvm-svn: 366667
|
| |
|
|
| |
llvm-svn: 366621
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65007
llvm-svn: 366619
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65010
llvm-svn: 366616
|
| |
|
|
| |
llvm-svn: 366613
|
| |
|
|
|
|
| |
Avoid using custom code predicates.
llvm-svn: 366609
|
| |
|
|
|
|
|
| |
This only works if the high bits of m0 are also 0, so m0 would have to
be set to 0xffff.
llvm-svn: 366608
|
| |
|
|
|
|
|
| |
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.
llvm-svn: 366607
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.
Differential Revision: https://reviews.llvm.org/D64967
llvm-svn: 366598
|
| |
|
|
|
|
| |
The DAG lowering sets dereferencable and invariant, not nontemporal.
llvm-svn: 366597
|
| |
|
|
|
|
|
| |
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.
llvm-svn: 366585
|
| |
|
|
|
|
| |
Handles structs used directly in argument lists.
llvm-svn: 366584
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should now handle everything except structs passed as multiple
registers.
I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.
This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.
llvm-svn: 366582
|
| |
|
|
|
|
|
|
|
|
| |
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.
This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.
llvm-svn: 366578
|
| |
|
|
|
|
|
|
|
|
| |
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D64629
llvm-svn: 366571
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:
1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
combine into an ADD3 instruction.
Because of #2 and #3 the end result is improved from this:
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v1, v4, v5, v1
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
To this:
v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
I.e. two fewer computational instructions, one extra nop where we could
schedule something else.
Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64411
llvm-svn: 366543
|
| |
|
|
|
|
|
|
| |
This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb.
Differential Revision: https://reviews.llvm.org/D64952
llvm-svn: 366505
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64892
llvm-svn: 366385
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64885
llvm-svn: 366376
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
|
| |
|
|
|
|
| |
Avoids creating an extra intermediate mov.
llvm-svn: 366340
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64809
llvm-svn: 366323
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630
Reviewers: rampitec, mareko
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64807
Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: GDS cannot alias anything else.
Original patch by: Marek Olšák
Reviewers: arsenm, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64114
Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0
llvm-svn: 366313
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D64839
llvm-svn: 366283
|
| |
|
|
| |
llvm-svn: 366257
|
| |
|
|
| |
llvm-svn: 366256
|
| |
|
|
|
|
|
|
|
|
| |
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.
The 16-bit versions don't work yet.
llvm-svn: 366254
|
| |
|
|
|
|
|
|
|
|
| |
When it is AReg_1024 this results in unnecessary copying into
AGPRs of a 32 element vectors even though they are not intended
for an mfma instruction.
Differential Revision: https://reviews.llvm.org/D64815
llvm-svn: 366252
|
| |
|
|
| |
llvm-svn: 366249
|
| |
|
|
| |
llvm-svn: 366248
|
| |
|
|
| |
llvm-svn: 366246
|
| |
|
|
|
|
|
| |
For some reason GlobalISelEmitter needs register classes to import
these, although it works for the load patterns.
llvm-svn: 366242
|
| |
|
|
|
|
| |
Convert the easy cases to formats understood for GlobalISel.
llvm-svn: 366240
|
| |
|
|
|
|
|
|
| |
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.
llvm-svn: 366237
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64328
llvm-svn: 366235
|
| |
|
|
|
|
|
| |
Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.
llvm-svn: 366234
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64145
llvm-svn: 366223
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.
Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.
llvm-svn: 366210
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:
$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
-DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
-DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
-config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}
llvm-svn: 366177
|
| |
|
|
|
|
|
|
|
|
| |
Use the MemoryVT field. This will be necessary for tablegen to
automatically handle patterns for GlobalISel.
Doesn't handle the d16 lo/hi patterns. Those are a special case since
it involvess the custom node type.
llvm-svn: 366168
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Enable hoisting and merging m0 defs that are initialized with the same
immediate value. Fixes bug where removed instructions are not considered
to interfere with other inits, and make sure to not hoist inits before block
prologues.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64766
llvm-svn: 366135
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already do this for the flat and DS instructions, although it is
certainly uglier and more verbose.
This will allow using separate pattern definitions for extload and
zextload. Currently we get away with using a single PatFrag with
custom predicate code to check if the extension type is a zextload or
anyextload. The generic mechanism the global isel emitter understands
treats these as mutually exclusive. I was considering making the
pattern emitter accept zextload or sextload extensions for anyextload
patterns, but in global isel, the different extending loads have
distinct opcodes, and there is currently no mechanism for an opcode
matcher to try multiple (and there probably is very little need for
one beyond this case).
llvm-svn: 366132
|