| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
(cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.
While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.
This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70781
(cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
|
|
|
|
| |
(cherry picked from commit 7dc49f77ee508b4152f9291c8e804e4eda3653d3)
|
|
|
|
|
|
|
|
|
|
| |
R600 relies on this behaviour.
Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"')
Fixes ~100 piglit regressions since 6e18266
Differential Revision: https://reviews.llvm.org/D72991
(cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
|
|
|
|
|
|
|
|
|
| |
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.
(cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.
This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
|
|
|
|
| |
- There are typos introduced due to merge.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
arranged after phi elimination or DeSSA. It's enhanced to handle the
dead register definition by skipping use check on it. Once a register
def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
`detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.
Reviewers: rampitec, qcolombet, sunfish
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72709
|
|
|
|
|
|
| |
- Prefer `getVectorIdxTy()` as the index operand type for
`EXTRACT_SUBVECTOR` as targets expect different types by overloading
`getVectorIdxTy()`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706
|
|
|
|
|
|
|
| |
This change allows to model the height of the instruction
within a bundle for latency adjustment purposes.
Differential Revision: https://reviews.llvm.org/D72669
|
|
|
|
|
|
|
|
| |
We do not have InstrItinerary so generic getInstLatency() was always
defaulting to return 1 cycle. We need to use TargetSchedModel instead
to compute an instruction's latency.
Differential Revision: https://reviews.llvm.org/D72655
|
| |
|
|
|
|
|
| |
A future change will try to fold constant offsets into the loop which
these will stress.
|
|
|
|
|
|
|
|
| |
The branch target needs to be changed depending on whether there is an
unconditional branch or not.
Loops also need to be similarly fixed, but compiling a simple testcase
end to end requires another set of patches that aren't upstream yet.
|
|
|
|
| |
It's possible to have a type that needs a mask greater than 64-bits.
|
|
|
|
|
|
|
|
|
|
| |
LSHR/SHL (PR44526)
As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.
This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits.
Differential Revision: https://reviews.llvm.org/D72573
|
|
|
|
|
| |
This avoids slightly different scheduling/regalloc behavior, and
avoids a test diff between GlobalISel and SelectionDAG.
|
|
|
|
|
| |
We don't use the xexec register classes for arbitrary values
anymore. Avoids a test variance beween GlobalISel and SelectionDAG>
|
|
|
|
|
| |
getDefIgnoringCopies will fail to find any def if no type is set if we
try to use it on the use's operand, so propagate the type.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions.
While this is necessary for the special handling of moving results out of WWM live ranges,
it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every
WQM operation. This change uses a COPY psuedo in these cases, which allows the register
allocator to coalesce the moves away.
Reviewers: tpr, dstuttard, foad, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71386
|
|
|
|
|
| |
Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so
fewer cases actually work.
|
|
|
|
|
| |
Doesn't try to do the fold into the base register of an add of a
constant in the index like the DAG path does.
|
|
|
|
|
|
| |
If an SGPR vector is indexed with a VGPR, the actual indexing will be
done on the SGPR and produce an SGPR. A copy needs to be inserted
inside the waterwall loop to the VGPR result.
|
|
|
|
|
|
|
| |
Bundles coming to scheduler considered free, i.e. zero latency.
Fixed.
Differential Revision: https://reviews.llvm.org/D72487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
|
|
|
|
|
| |
Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a,
this uses intermediate generic instructions.
|
|
|
|
|
|
|
| |
When these arguments are broken down by the EVT based callbacks, the
pointer information is lost. Hack around this by coercing the register
types to be the expected pointer element type when building the
remerge operations.
|
|
|
|
|
|
| |
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
|
|
|
|
|
| |
This isn't too useful now, since nothing is currently trying to form
min/max from cmp+select.
|
| |
|
| |
|
|
|
|
|
|
|
| |
There was an unguarded dereference of MF in a function that permitted
nullptr. Fixed
This reverts commit 71d64f72f934631aa2f12b9542c23f74f256f494.
|
|
|
|
|
| |
This reverts commit 3ef05d85be8c3666ebfa3ad986eb334da5195a47.
It broke check-llvm on many bots, see comments on D69836.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.
* Target specific immediate mnemonic need to start with "." follows by
identifier string. When MIR parser sees immediate it will call target
specific parsing function.
* Custom pseudo source value need to start with custom follows by
double-quoted string. MIR parser will pass the quoted string to target
specific PSV parsing function.
* MIRFormatter have 2 helper functions to facilitate LLVM value printing
and parsing for custom PSV if they refers LLVM values.
Patch by Peng Guo
Reviewers: dsanders, arsenm
Reviewed By: dsanders
Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69836
|
|
|
|
|
|
| |
Forgot to credit Peng in the commit message.
This reverts commit be841f89d0014b1e0246a4feae941b2f74abd908.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.
* Target specific immediate mnemonic need to start with "." follows by
identifier string. When MIR parser sees immediate it will call target
specific parsing function.
* Custom pseudo source value need to start with custom follows by
double-quoted string. MIR parser will pass the quoted string to target
specific PSV parsing function.
* MIRFormatter have 2 helper functions to facilitate LLVM value printing
and parsing for custom PSV if they refers LLVM values.
Reviewers: dsanders, arsenm
Reviewed By: dsanders
Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69836
|
|
|
|
|
| |
4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal
condition type set for pointers with any unrecognized address space.
|
| |
|
| |
|
|
|
|
|
| |
This was only applying the deeper nested zext pattern, and missing the
special case code size fold.
|
|
|
|
|
| |
The optimization to turn an add into a sub isn't triggering when the
pattern to use the zeroed high bits is used.
|
|
|
|
|
| |
We weren't treating i16->f16 casts as legal on targets with these
instructions, and always using a pair of casts through i32.
|
|
|
|
|
| |
The imm folding optimization pattern failed to import. The instruction
pattern was already working, but failing to fail on SGPR inputs.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Don't overwrite existing target-cpu attributes.
I've often found the replacement behavior annoying, and this is
inconsistent with how the fast math command line flags interact with
the function attributes.
Does not yet change target-features, since I think that should behave
as a concatenation.
|
|
|
|
|
| |
This wasn't catching a regression on targets with legal i16 triggered
in a future commit.
|
| |
|