| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.
Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276
Reviewers: mareko, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63716
llvm-svn: 364815
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Original patch by Marek Olšák
Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab
Reviewers: mareko, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63452
llvm-svn: 364814
|
|
|
|
| |
llvm-svn: 364811
|
|
|
|
| |
llvm-svn: 364808
|
|
|
|
|
|
|
|
| |
Also works around tablegen defect in selecting add with unused carry,
but if we have to manually select GEP, might as well handle add
manually.
llvm-svn: 364806
|
|
|
|
| |
llvm-svn: 364805
|
|
|
|
|
|
|
|
|
|
|
| |
There are several things broken, but at least emit the right thing for
gfx9.
The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.
llvm-svn: 364804
|
|
|
|
| |
llvm-svn: 364801
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58804
llvm-svn: 364797
|
|
|
|
| |
llvm-svn: 364795
|
|
|
|
| |
llvm-svn: 364789
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The stride should depend on the wave size, not the hardware generation.
Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.
Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63808
llvm-svn: 364788
|
|
|
|
|
|
|
| |
This is easy to handle and avoids legalization artifacts which are
likely to obscure combines.
llvm-svn: 364787
|
|
|
|
| |
llvm-svn: 364786
|
|
|
|
|
|
|
| |
isVCC has the same bug, but isn't used in a context where it can cause
a problem.
llvm-svn: 364784
|
|
|
|
| |
llvm-svn: 364783
|
|
|
|
| |
llvm-svn: 364782
|
|
|
|
| |
llvm-svn: 364768
|
|
|
|
| |
llvm-svn: 364767
|
|
|
|
| |
llvm-svn: 364766
|
|
|
|
|
|
| |
Select s64 eq/ne scalar icmp.
llvm-svn: 364765
|
|
|
|
| |
llvm-svn: 364763
|
|
|
|
| |
llvm-svn: 364762
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was checking the size of the register with the value of the size,
which happens to be exec. Also fix assuming VCC is 64-bit to fix
wave32.
Also remove some untested handling for physical registers which is
skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical
copy source. I'm not sure if this should be trying to handle this
special case instead of dealing with this in copyPhysReg.
llvm-svn: 364761
|
|
|
|
|
|
|
| |
Zext from s1 is the only case where this should do anything with the
current legal extensions.
llvm-svn: 364760
|
|
|
|
| |
llvm-svn: 364759
|
|
|
|
| |
llvm-svn: 364758
|
|
|
|
| |
llvm-svn: 364703
|
|
|
|
| |
llvm-svn: 364701
|
|
|
|
| |
llvm-svn: 364699
|
|
|
|
| |
llvm-svn: 364698
|
|
|
|
| |
llvm-svn: 364696
|
|
|
|
| |
llvm-svn: 364695
|
|
|
|
| |
llvm-svn: 364694
|
|
|
|
| |
llvm-svn: 364691
|
|
|
|
|
|
|
|
|
|
| |
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D62735
llvm-svn: 364645
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D63851
llvm-svn: 364619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The new test case led to incorrect code.
Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
Reviewers: arsenm, david-salinas
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63871
llvm-svn: 364566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code.
It tries to print an entry label at the start of every function, but
that didn't work for the first function in the module because
DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart
which is too late.
Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
Reviewers: arsenm, tpr, kzhuravl
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63712
llvm-svn: 364508
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LivePhysRegs calculated in order to find a scratch register in the
epilogue code wrongly uses 'LiveIns'. Instead, it should use the
'Liveout' sets. For the liveness, also considering the operands of
the terminator (return) instruction which is the insertion point for
the scratch-exec-copy instruction.
Patch by Christudasan Devadasan
llvm-svn: 364470
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>
The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that is splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new conditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.
llvm-svn: 364342
|
|
|
|
| |
llvm-svn: 364316
|
|
|
|
|
|
|
| |
Somehow ended up with copies of the same tests in AMDGPU and
AMDGPU/GlobalISel
llvm-svn: 364309
|
|
|
|
| |
llvm-svn: 364308
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
to a constant at compile times, which means some tests can no longer
be applied.
The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
llvm-svn: 364297
|
|
|
|
| |
llvm-svn: 364262
|
|
|
|
| |
llvm-svn: 364244
|
|
|
|
| |
llvm-svn: 364215
|
|
|
|
| |
llvm-svn: 364214
|