| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.
This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
|
| |
|
|
|
|
|
|
| |
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
|
| |
|
|
|
|
|
|
|
|
| |
Summary: Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70118
|
| |
|
|
|
|
|
|
| |
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the choice of lower resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
|
| |
|
|
| |
Avoids another regression in a future patch.
|
| |
|
|
|
|
| |
This is the same as the add case, but inverts the operation type.
This avoids regressions in a future patch.
|
| |
|
|
| |
This only works if there is no use of the return value.
|
| |
|
|
|
| |
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
|
| |
|
|
|
|
|
|
|
| |
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
|
| |
|
|
|
| |
These can be directly taken from the GlobalValue instead of going
through the type.
|
| |
|
|
|
|
|
|
| |
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
|
| |
|
|
| |
llvm-svn: 375457
|
| |
|
|
|
|
|
| |
This doesn't use the default value, so doesn't benefit from the hack
to help optimize it.
llvm-svn: 375450
|
| |
|
|
|
|
| |
Calls to constants should probably be generally handled.
llvm-svn: 375356
|
| |
|
|
|
|
| |
Now X86ISelLowering doesn't depend on many IR analyses.
llvm-svn: 375320
|
| |
|
|
|
|
|
|
|
|
|
| |
Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This
will allow the register coalescer to do a better job eliminating
copies to m0.
For GlobalISel, as a terrible hack, use SGPR_32 for things that should
use SCC until booleans are solved.
llvm-svn: 375267
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 submit several issues were discovered.
Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
https://reviews.llvm.org/D67662
https://reviews.llvm.org/D67101
https://reviews.llvm.org/D63953
https://reviews.llvm.org/D63731
This change brings back AMDGPU specific changes.
Reviewed by: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D68635
llvm-svn: 374767
|
| |
|
|
|
|
|
|
|
| |
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
llvm-svn: 374284
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores. It also limits
the SILoadStoreOptimizer from merging more instructions.
This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65097
llvm-svn: 374087
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.
Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.
The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
|
| |
|
|
|
|
|
|
|
|
|
| |
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
|
| |
|
|
| |
llvm-svn: 373081
|
| |
|
|
|
|
|
|
|
|
|
| |
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.
Differential Revision: https://reviews.llvm.org/D67121
llvm-svn: 372935
|
| |
|
|
|
|
|
|
|
| |
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67620
llvm-svn: 372231
|
| |
|
|
|
|
|
| |
Fixes assertion with workitem ID intrinsics used in non-kernel
functions.
llvm-svn: 371951
|
| |
|
|
|
|
|
|
| |
There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.
llvm-svn: 371533
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Reviewed By: courbet
Subscribers: wuzish, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67386
llvm-svn: 371511
|
| |
|
|
| |
llvm-svn: 371364
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
|
| |
|
|
|
|
|
| |
The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.
llvm-svn: 371009
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.
Reviewers: nhaehnle, arsenm, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65813
llvm-svn: 370667
|
| |
|
|
|
|
|
| |
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.
llvm-svn: 370086
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
|
| |
|
|
|
|
|
|
|
|
|
|
| |
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.
Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
llvm-svn: 369038
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
Reviewers: arsenm, rampitec
Reviewed By: arsenm, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65620
llvm-svn: 367969
|
| |
|
|
|
|
|
| |
This reverts commit r367882. It broke the test
MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.
llvm-svn: 367904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.
Reviewers: arsenm, rampitec
Reviewed By: arsenm, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65620
llvm-svn: 367882
|
| |
|
|
|
|
|
| |
Don't assume format loads for f16. Also fixes support for targets
without i16.
llvm-svn: 367879
|
| |
|
|
|
|
|
|
|
| |
This was switching to use a format store for a non-format store for
f16 types. Also fixes i16/f16 stores on targets without legal f16.
The corresponding loads also need to be fixed.
llvm-svn: 367872
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Wrapping increment/decrement. These aren't exposed by many APIs...
Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c
Reviewers: arsenm, tpr, dstuttard
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65283
llvm-svn: 367821
|
| |
|
|
|
|
| |
llvm::Register as started by r367614. NFC
llvm-svn: 367633
|
| |
|
|
|
|
|
| |
Any register should work for the src field since r366067, since the
used value is not pulled from the expected encoding field.
llvm-svn: 367598
|
| |
|
|
|
|
|
| |
Since this now emits a direct copy to m0, SIFixSGPRCopies has to
handle a physical register.
llvm-svn: 367593
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D65471
llvm-svn: 367347
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)
Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65325
Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
|