| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26939
llvm-svn: 287608
|
| |
|
|
|
|
|
|
|
|
| |
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.
llvm-svn: 287593
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26862
llvm-svn: 287389
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D26828
llvm-svn: 287342
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects
e.g. shaders that access buffer textures.
Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention. If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26747
llvm-svn: 287339
|
| |
|
|
|
|
|
| |
There are still crashes on non-MVT types in other
places.
llvm-svn: 287310
|
| |
|
|
|
|
|
|
| |
This reverts commit r287146.
This breaks few conformance tests.
llvm-svn: 287233
|
| |
|
|
| |
llvm-svn: 287204
|
| |
|
|
| |
llvm-svn: 287201
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26732
llvm-svn: 287199
|
| |
|
|
|
|
|
| |
This fixes a probably unintended divergence from the default
scheduler behavior.
llvm-svn: 287146
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
instructions.
The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.
This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23408
llvm-svn: 287131
|
| |
|
|
|
|
|
|
|
|
| |
- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns
Differential Revision: https://reviews.llvm.org/D26714
llvm-svn: 287074
|
| |
|
|
|
|
|
|
|
|
| |
wbinvl.* are vector instruction that do not sue vector registers.
v2: check only M?BUF instructions
Differential Revision: https://reviews.llvm.org/D26633
llvm-svn: 287056
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D26670
llvm-svn: 287035
|
| |
|
|
|
|
|
| |
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.
llvm-svn: 287021
|
| |
|
|
|
|
|
| |
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.
llvm-svn: 287018
|
| |
|
|
|
|
|
|
|
|
|
| |
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.
Differential Revision: https://reviews.llvm.org/D26585
llvm-svn: 287007
|
| |
|
|
| |
llvm-svn: 286931
|
| |
|
|
| |
llvm-svn: 286912
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend image intrinsics to support data types of V1F32 and V2F32.
TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
even though such case should be very rare.
Reviewers:
tstellarAMD
Differential Revision:
http://reviews.llvm.org/D26472
llvm-svn: 286860
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.
This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.
The cache also needs to be flushed before wave termination
if a scalar store is used.
llvm-svn: 286766
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25975
llvm-svn: 286753
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes a regression caused by r286464.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D26570
llvm-svn: 286687
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining insstruction would
be legalized before the using instruction.
This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.
This fixes a bug that was uncovered by r285762.
Reviewers: nhaehnle, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26303
llvm-svn: 286676
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.
Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.
Differential Revision: https://reviews.llvm.org/D25140
llvm-svn: 286544
|
| |
|
|
|
|
|
|
| |
condition copies"
This reverts commit r286171, it breaks piglit test fs-discard-exit-2
llvm-svn: 286530
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Differential Revision: https://reviews.llvm.org/D25781
llvm-svn: 286502
|
| |
|
|
|
|
|
|
| |
Patch By: Wei Ding
Differential Revision: https://reviews.llvm.org/D18049
llvm-svn: 286464
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.
llvm-svn: 286171
|
| |
|
|
|
|
|
| |
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.
llvm-svn: 286134
|
| |
|
|
|
|
| |
This reverts commit r285939 and r285948. These broke some conformance tests.
llvm-svn: 285995
|
| |
|
|
|
|
|
|
| |
Patch By: Wei Ding
Differential Revision: https://reviews.llvm.org/D18049
llvm-svn: 285939
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
|
| |
|
|
|
|
| |
Some of these are already fixed or tested somewhere else.
llvm-svn: 285840
|
| |
|
|
| |
llvm-svn: 285828
|
| |
|
|
|
|
| |
This is already done with VGPR immediates and saves 4 bytes.
llvm-svn: 285765
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
llvm-svn: 285762
|
| |
|
|
|
|
|
|
|
|
|
| |
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll
Differential Revision: https://reviews.llvm.org/D25802
llvm-svn: 285716
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.
The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot. I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.
Reviewers: bogner, ab, t.p.northover, arsenm
Subscribers: wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25999
llvm-svn: 285704
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D26077
llvm-svn: 285684
|
| |
|
|
|
|
| |
I'm guessing at how it is supposed to be printed
llvm-svn: 285490
|
| |
|
|
|
|
|
|
|
|
| |
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.
This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.
llvm-svn: 285463
|
| |
|
|
| |
llvm-svn: 285449
|
| |
|
|
|
|
| |
This is possible when using inline asm.
llvm-svn: 285447
|
| |
|
|
|
|
|
|
|
|
|
| |
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
llvm-svn: 285435
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
llvm-svn: 285273
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.
Refactor processor definition to use ISA version features.
Fixed ISA version for stoney.
Based on Laurent Morichetti's patch.
Differential Revision: https://reviews.llvm.org/D25919
llvm-svn: 285210
|
| |
|
|
|
|
| |
This reverts r283003
llvm-svn: 285203
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25788
llvm-svn: 285198
|