| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.
Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.
This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
llvm-svn: 341801
|
|
|
|
|
|
|
|
|
|
|
| |
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.
Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.
llvm-svn: 339512
|
|
|
|
|
|
|
| |
This usually avoids some re-packing code, and may
help find canonical sources.
llvm-svn: 339072
|
|
|
|
| |
llvm-svn: 338715
|
|
|
|
| |
llvm-svn: 338714
|
|
|
|
| |
llvm-svn: 338382
|
|
|
|
|
|
|
|
| |
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifers.
llvm-svn: 338244
|
|
|
|
|
|
|
|
|
|
|
|
| |
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
|
|
|
|
|
|
|
|
| |
Literal encoding needs op_sel_hi to select low 16 bit in this case.
Differential Revision: https://reviews.llvm.org/D45745
llvm-svn: 330230
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add gfx704
- Change bonaire to gfx704
- Remove gfx804
- Remove gfx901
- Remove gfx903
Differential Revision: https://reviews.llvm.org/D40046
llvm-svn: 320194
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reviewers: arsenm, vpykhtin, rampitec
Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D37817
llvm-svn: 319662
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D37325
llvm-svn: 312676
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D37522
llvm-svn: 312660
|
|
|
|
|
|
|
|
|
|
| |
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.
Differential Revision: https://reviews.llvm.org/D36856
llvm-svn: 312095
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307097
|
|
|
|
|
|
|
|
|
| |
It broke a testcase.
Failing Tests (1):
LLVM :: CodeGen/AMDGPU/alignbit-pat.ll
llvm-svn: 307054
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307026
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.
Also added handling to preserve original src modifiers.
Differential Revision: https://reviews.llvm.org/D33860
llvm-svn: 304665
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.
Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.
Differential Revision: https://reviews.llvm.org/D33583
llvm-svn: 304219
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D32361
llvm-svn: 301028
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: vpykhtin, rampitec, arsenm
Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31671
llvm-svn: 299654
|
|
|
|
|
|
|
|
|
|
|
| |
Reason: breaks multiple bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173
Original Review URL: https://reviews.llvm.org/D31671
llvm-svn: 299583
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: vpykhtin, rampitec, arsenm
Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31671
llvm-svn: 299536
|
|
|
|
|
|
|
|
|
|
| |
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.
Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
|
|
|
|
| |
llvm-svn: 296396
|
|
|
|
| |
llvm-svn: 293654
|
|
|
|
|
|
|
|
|
|
|
| |
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
|
|
|
|
|
|
|
| |
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.
llvm-svn: 292837
|
|
|
|
| |
llvm-svn: 292814
|
|
llvm-svn: 290300
|