| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
|
|
|
|
|
|
| |
Try to increase opportunities to shrink vcc uses.
llvm-svn: 307313
|
|
|
|
| |
llvm-svn: 307312
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
|
|
|
|
|
|
|
|
| |
isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
|
|
|
|
|
|
| |
NFC
llvm-svn: 307186
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307097
|
|
|
|
|
|
| |
Patch by cwabbott (Connor Abbott).
llvm-svn: 307081
|
|
|
|
|
|
|
|
|
| |
It broke a testcase.
Failing Tests (1):
LLVM :: CodeGen/AMDGPU/alignbit-pat.ll
llvm-svn: 307054
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307026
|
|
|
|
| |
llvm-svn: 306995
|
|
|
|
|
|
| |
suport -> support
llvm-svn: 306968
|
|
|
|
|
|
|
| |
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.
llvm-svn: 306603
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34531
llvm-svn: 306554
|
|
|
|
|
|
|
|
| |
If immediate in shift is less than 32 we can use alignbit too.
Differential Revision: https://reviews.llvm.org/D34729
llvm-svn: 306500
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34655
llvm-svn: 306449
|
|
|
|
|
|
|
|
|
|
|
| |
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.
Differential Revision: https://reviews.llvm.org/D34545
llvm-svn: 306446
|
|
|
|
|
|
|
|
|
|
| |
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.
Differential Revision: https://reviews.llvm.org/D34500
llvm-svn: 306439
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.
Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl
Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye
Differential Revision: https://reviews.llvm.org/D34626
llvm-svn: 306413
|
|
|
|
|
|
| |
succesor -> successor
llvm-svn: 306393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.
This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).
Reviewers: arsenm
Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D33319
llvm-svn: 306375
|
|
|
|
| |
llvm-svn: 306312
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D34589
llvm-svn: 306298
|
|
|
|
| |
llvm-svn: 306265
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.
Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.
llvm-svn: 306264
|
|
|
|
|
|
|
|
|
|
|
| |
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.
I left a fix for anyone trying to figure out what producers should be
producing a different expression.
llvm-svn: 306200
|
|
|
|
| |
llvm-svn: 306189
|
|
|
|
| |
llvm-svn: 306178
|
|
|
|
|
|
|
|
|
|
|
|
| |
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D34349
llvm-svn: 306112
|
|
|
|
|
|
|
| |
Variable was unused in non-debug build (used in assert) causing compile time
warning and eventual build failure
llvm-svn: 306034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intrinsic already existed for llvm.SI.tbuffer.store
Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*
Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr
Differential Revision: https://reviews.llvm.org/D30687
llvm-svn: 306031
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
encoding
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.
Reviewers: arsenm, vpykhtin, cfang
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34403
llvm-svn: 305999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026
Reviewers: vpykhtin, rampitec, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34241
llvm-svn: 305986
|
|
|
|
|
|
|
|
| |
This is one of the nodes which also compile as v_cmp_*.
Differential Revision: https://reviews.llvm.org/D34485
llvm-svn: 305970
|
|
|
|
| |
llvm-svn: 305968
|
|
|
|
|
|
|
|
|
| |
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.
Differential Revision: https://reviews.llvm.org/D34374
llvm-svn: 305964
|
|
|
|
|
|
|
|
|
| |
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.
Differential Revision: https://reviews.llvm.org/D34300
llvm-svn: 305962
|
|
|
|
|
|
|
|
|
|
| |
See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509
Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin
Differential Revision: https://reviews.llvm.org/D34360
llvm-svn: 305923
|
|
|
|
|
|
|
|
|
|
|
|
| |
different than any of the src
See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279
Reviewers: artem.tamazov, vpykhtin
Differential Revision: https://reviews.llvm.org/D34003
llvm-svn: 305915
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9.
Reviewers: dp, arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov
Differential Revision: https://reviews.llvm.org/D34026
llvm-svn: 305886
|
|
|
|
| |
llvm-svn: 305844
|
|
|
|
|
|
|
|
|
| |
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.
Differential Revison: https://reviews.llvm.org/D34291
llvm-svn: 305840
|
|
|
|
| |
llvm-svn: 305838
|
|
|
|
|
|
|
|
|
| |
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.
llvm-svn: 305821
|
|
|
|
|
|
|
|
| |
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.
llvm-svn: 305816
|
|
|
|
|
|
|
|
| |
SGPRs are generally cheaper, so try to use them over VGPRs.
Differential Revision: https://reviews.llvm.org/D34130
llvm-svn: 305815
|
|
|
|
| |
llvm-svn: 305814
|
|
|
|
|
|
|
|
| |
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.
llvm-svn: 305761
|
|
|
|
|
|
|
|
|
| |
It adds it for the target after inlining but before SROA where
we can get most out of it.
Differential Revision: https://reviews.llvm.org/D34366
llvm-svn: 305759
|