Commit message
...
|
build_vector is a more useful canonical form when pattern matching
packed operations, so turn a shift into the high element into a
build_vector. Should show no change for now.
llvm-svn: 312282
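As a rough illustration (mine, not part of the change): the packing
idiom being matched looks like the following at the scalar level, and
modeling it as a two-element build_vector is easier to match against
packed v2i16 operations than an explicit shift into the high half.

    #include <cstdint>

    // Pack two 16-bit elements into one 32-bit register, low element first.
    static inline uint32_t pack_v2i16(uint16_t lo, uint16_t hi) {
      return (uint32_t)lo | ((uint32_t)hi << 16);
    }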
|
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half it
is also shorter, because the mul uses the constant bus and the VOP3
encoding.
Differential Revision: https://reviews.llvm.org/D36856
llvm-svn: 312095
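A rough model of the equivalence being used here (assuming an
IEEE-style max that does not flush denormals): max(x, x) returns x for
every input, including NaN, just like multiplying by 1.0, so either
works as a canonicalization.

    #include <cmath>

    // Illustration only: canonicalize a value without a multiply.
    static inline float canonicalize_via_max(float x) {
      return std::fmax(x, x);   // returns x; a NaN input stays a NaN
    }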
|
llvm-svn: 312087
|
See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152
Reviewers: SamWot, artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D36674
llvm-svn: 311006
|
Handle the sibling call cases.
llvm-svn: 310753
|
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.
Reviewers: tstellar, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34719
llvm-svn: 310088
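A conceptual model (not the backend lowering) of why inactive lanes
need an identity value in a wavefront reduction: every lane's slot
contributes to the result, so lanes that are inactive at the call site
must hold the identity element (0 for an add reduction), which is what
this intrinsic provides.

    // Scalar model of a 64-lane wavefront sum reduction.
    float wave_reduce_add(const float value[64], const bool active[64]) {
      float sum = 0.0f;
      for (int lane = 0; lane < 64; ++lane)
        sum += active[lane] ? value[lane] : 0.0f;  // inactive lanes supply the identity
      return sum;
    }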
|
Summary:
Whole Wavefront Mode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level. This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.
Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.
Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.
Reviewers: arsenm, nhaehnle, tpr
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35524
llvm-svn: 310087
|
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.
Reviewers: arsenm, tpr, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35167
llvm-svn: 310085
|
llvm-svn: 309781
|
Includes a hack to fix the type selected for the GlobalAddress of the
function, which will be fixed by changing the default datalayout to
use generic pointers for address space 0.
llvm-svn: 309732
|
Testing is in the follow-up change.
llvm-svn: 308779
|
See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591
Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm
Differential Revision: https://reviews.llvm.org/D35424
llvm-svn: 308740
|
llvm-svn: 308639
|
omod is supported for VOP3 instructions
llvm-svn: 308310
|
Original commit log:
[AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
llvm-svn: 308270
|
for VOP3 instructions
Summary:
Previously, CodeGen checked the first src operand type to determine if
omod is supported by an instruction. This isn't correct for some
instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but
doesn't support omod.
Changed the .td files to check the dst operand type instead of the src
operand type.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D35350
llvm-svn: 308179
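For reference, a rough model of the VOP3 output modifier (omod) being
discussed, with the encoding assumed from the ISA documentation: the
result is optionally scaled by 2, 4, or 0.5, which only makes sense
for a floating-point destination, hence checking the dst operand type.

    // Illustration of omod semantics on a 32-bit float result.
    float apply_omod(float result, unsigned omod) {
      switch (omod) {
      case 1:  return result * 2.0f;
      case 2:  return result * 4.0f;
      case 3:  return result * 0.5f;
      default: return result;        // omod == 0: no scaling
      }
    }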
|
If the immediate in the shift is less than 32 we can use alignbit too.
Differential Revision: https://reviews.llvm.org/D34729
llvm-svn: 306500
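A rough model of v_alignbit_b32 (semantics assumed): it extracts 32
bits from the 64-bit concatenation {hi, lo} at a bit offset taken
modulo 32, which is why the combine is limited to shift immediates
below 32.

    #include <cstdint>

    // Illustration of the alignbit operation used by the combine.
    static inline uint32_t alignbit(uint32_t hi, uint32_t lo, uint32_t shift) {
      uint64_t concat = ((uint64_t)hi << 32) | lo;
      return (uint32_t)(concat >> (shift & 31));
    }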
|
Differential Revision: https://reviews.llvm.org/D34655
llvm-svn: 306449
|
Differential Revision: https://reviews.llvm.org/D23209
llvm-svn: 303111
|
llvm-svn: 303098
|
Differential Revision: https://reviews.llvm.org/D23209
llvm-svn: 303091
|
v2: More tests, bug fixes, cosmetic changes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D31762
llvm-svn: 301677
|
There is no need to copy the operands or inspect the sources.
Also remove some unnecessary clamp/omod usage.
llvm-svn: 301363
|
Currently the operand type for ATOMIC_FENCE assumes the value type of
a pointer in address space 0. This is fine for most targets. However,
for the amdgcn target, the size of a pointer in address space 0
depends on the triple environment: for the amdgiz environment it is
64 bit, but for other environments it is 32 bit. On the other hand,
the amdgcn target expects 32-bit fence operands independent of the
target triple environment. Therefore a hook is needed in target
lowering for getting the fence operand type.
This patch has no effect on targets other than amdgcn.
Differential Revision: https://reviews.llvm.org/D32186
llvm-svn: 301215
|
Fixes traps in any block besides the entry block, and fixes depending
on a live-in physical register by using a virtual register copy.
Also happens to stop emitting a nop in the case where the debug trap
is not supported.
llvm-svn: 301206
|
This is possible in ways that are not compiler bugs, so stop asserting
on them.
This emits an extra error when emitting objects if it can't encode the
new pseudo, but I'm not sure that matters.
llvm-svn: 299712
|
Add a new node to act as a fancy bitcast from f16 operations to i32
that implicitly zeroes the high 16 bits of the result.
Alternatively we could try making v2f16 legal and canonicalizing on
build_vectors.
llvm-svn: 299246
|
StructurizeCFG can't handle cases with multiple returns creating
regions with multiple exits. Create a copy of UnifyFunctionExitNodes
that only unifies exit nodes, skipping exit nodes with uniform branch
sources.
llvm-svn: 298729
|
llvm-svn: 298454
|
This is used for a specific type of return to a shader part's epilog
code. Rename it to try to avoid confusion with a true call's return.
llvm-svn: 298452
|
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.
It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.
llvm-svn: 298119
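A minimal sketch of the pattern this moves to (assuming the usual LLVM
headers for this era; header locations vary by LLVM version, and the
intrinsic shown is only an example): ask for the declaration by
intrinsic ID instead of hand-building the FunctionType.

    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    // Overloaded intrinsics would pass their concrete types as a third argument.
    Function *getFdivFastDecl(Module &M) {
      return Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_fdiv_fast);
    }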
|
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.
llvm-svn: 297873
|
llvm-svn: 296401
|
llvm-svn: 296396
|
Add a few non-VOP3P instructions related to packed operations.
Includes a hack with dummy operands for the benefit of the assembler.
llvm-svn: 296368
|
Differential Revision: http://reviews.llvm.org/D29958
llvm-svn: 296186
|
Hit on ASICs that support 16-bit instructions.
Differential Revision: https://reviews.llvm.org/D30281
llvm-svn: 295990
|
This is the pattern that falls out of the instruction's
definition if offset == 0.
llvm-svn: 295912
|
llvm-svn: 295908
|
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295904
|
This reverts commit r295867.
llvm-svn: 295871
|
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295867
|
Change the implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode, so it is
OK to use them whether or not denormals are enabled.
Also allow using clamp with f16, and use knowledge of dx10_clamp.
llvm-svn: 295788
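A rough model of the clamp modifier being formed here (semantics
assumed): the result is clamped to [0, 1], and with dx10_clamp set a
NaN input clamps to 0 instead of propagating.

    #include <cmath>

    // Illustration of the clamp-to-[0,1] output modifier.
    float clamp01(float x, bool dx10_clamp) {
      if (std::isnan(x))
        return dx10_clamp ? 0.0f : x;
      return std::fmin(std::fmax(x, 0.0f), 1.0f);
    }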
|
llvm-svn: 295359
|
Differential Revision: http://reviews.llvm.org/D26010
llvm-svn: 294692
|
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
llvm-svn: 293857
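For illustration (mine): when the half value is carried around as an
integer, the source modifiers correspond to bit operations on the f16
payload, which is what has to be recognized on targets without legal
f16.

    #include <cstdint>

    // neg and abs of an f16 value stored in the low 16 bits of an integer.
    static inline uint32_t fneg_f16_bits(uint32_t bits) { return bits ^ 0x8000u; }
    static inline uint32_t fabs_f16_bits(uint32_t bits) { return bits & 0x7fffu; }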
|
llvm-svn: 293654
|
I think this is safe as long as no inputs are known to ever be NaNs.
Also add an intrinsic for fmed3 to be able to handle all safe math
cases.
llvm-svn: 293598
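A rough model of the median-of-three operation the intrinsic exposes:
with no NaN inputs, med3(x, lo, hi) is equivalent to clamping x to
[lo, hi], which is the combine being done here.

    #include <cmath>

    // Median of three values; matches clamp(x, lo, hi) when no input is NaN.
    float fmed3(float a, float b, float c) {
      return std::fmax(std::fmin(a, b), std::fmin(std::fmax(a, b), c));
    }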
|
This is worse if the original constant is an inline immediate.
This should also be done for 64-bit adds, but requires fixing
operand folding bugs first.
llvm-svn: 293540
|
llvm-svn: 292893
|