| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.
llvm-svn: 276435
|
| |
|
|
| |
llvm-svn: 276434
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D22538
llvm-svn: 276298
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D21646
llvm-svn: 276294
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl
Differential Revision: https://reviews.llvm.org/D22620
llvm-svn: 276274
|
| |
|
|
| |
llvm-svn: 276257
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D22526
llvm-svn: 276119
|
| |
|
|
|
|
|
|
|
|
|
| |
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
llvm-svn: 276051
|
| |
|
|
| |
llvm-svn: 276030
|
| |
|
|
|
|
| |
LGTM'd by Matt Arsenault.
llvm-svn: 276029
|
| |
|
|
|
|
|
|
|
|
|
| |
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.
llvm-svn: 275988
|
| |
|
|
|
|
|
|
|
| |
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.
Patch by Axel Davy
llvm-svn: 275935
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
|
| |
|
|
|
|
| |
This is already casted above so non-null
llvm-svn: 275881
|
| |
|
|
| |
llvm-svn: 275873
|
| |
|
|
| |
llvm-svn: 275871
|
| |
|
|
| |
llvm-svn: 275870
|
| |
|
|
|
|
|
|
|
|
| |
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.
Also fix crashes on 0 x and 1 x arrays.
llvm-svn: 275869
|
| |
|
|
|
|
|
| |
Non intrinsic calls aren't really handled, and this
IntrinsicInst dyn_cast checks for the function for us.
llvm-svn: 275868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.
Reviewers: tstellarAMD, arsenm
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D20728
llvm-svn: 275779
|
| |
|
|
|
|
| |
Attempting to fix lit test failure on ppc.
llvm-svn: 275676
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this situation:
%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2
The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.
llvm-svn: 275635
|
| |
|
|
| |
llvm-svn: 275620
|
| |
|
|
| |
llvm-svn: 275619
|
| |
|
|
| |
llvm-svn: 275618
|
| |
|
|
|
|
|
|
| |
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.
Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.
llvm-svn: 275617
|
| |
|
|
|
|
| |
Dead or the same as the base implementation.
llvm-svn: 275616
|
| |
|
|
|
|
| |
This reverts commit r275566.
llvm-svn: 275599
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
|
| |
|
|
|
|
|
|
|
|
| |
Added emitting metadata to elf for runtime.
Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.
Differential Revision: https://reviews.llvm.org/D21849
llvm-svn: 275566
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
Reviewers: tstellarAMD, mcrosier
Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
Differential Revision: https://reviews.llvm.org/D22409
llvm-svn: 275564
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.
Fixes bug 28550
llvm-svn: 275510
|
| |
|
|
|
|
| |
Found while reducing bug 28550
llvm-svn: 275509
|
| |
|
|
| |
llvm-svn: 275508
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This change fix bug 28538
Reviewers: tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl
Differential Revision: https://reviews.llvm.org/D22355
llvm-svn: 275422
|
| |
|
|
|
|
| |
Use the replacement pass to update the tests, and delete old names.
llvm-svn: 275375
|
| |
|
|
|
|
| |
Mesa removed this path, so nothing is using these anymore.
llvm-svn: 275372
|
| |
|
|
| |
llvm-svn: 275371
|
| |
|
|
| |
llvm-svn: 275369
|
| |
|
|
| |
llvm-svn: 275309
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
v2: don't count SGPRs spilled to scratch twice
I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
[scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D22197
llvm-svn: 275288
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21484
llvm-svn: 275268
|
| |
|
|
| |
llvm-svn: 275253
|
| |
|
|
|
|
|
|
| |
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant
llvm-svn: 275252
|
| |
|
|
|
|
| |
I meant to squash this into it.
llvm-svn: 275220
|
| |
|
|
|
|
|
| |
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.
llvm-svn: 275203
|
| |
|
|
|
|
|
| |
No test since these aren't used now, except for one place
in a pre-emit pass.
llvm-svn: 275200
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D22239
llvm-svn: 275197
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22217
llvm-svn: 275160
|
| |
|
|
| |
llvm-svn: 275133
|