| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
I need to use it in the GCN codegen.
Differential Revision: https://reviews.llvm.org/D52123
llvm-svn: 342400
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+
Also updated the tests to check correct behaviour
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
|
|
|
|
|
|
|
|
|
|
|
| |
TargetParser."
This reverts commit r341982.
The change introduced a layering violation. Reverting to unbreak
our integrate.
llvm-svn: 342023
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
|
|
|
|
| |
llvm-svn: 341165
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9.
Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50575
llvm-svn: 340831
|
|
|
|
| |
llvm-svn: 339814
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is necessary to add a VI specific builtin,
__builtin_amdgcn_s_dcache_wb. We already have an
overly specific feature for one of these builtins,
for s_memrealtime. I'm not sure whether it's better
to add more of those, or to get rid of that and merge
it with vi-insts.
Alternatively, maybe this logically goes with scalar-stores?
llvm-svn: 339104
|
|
|
|
|
|
| |
Reverts r337079 with fix for msan error.
llvm-svn: 337535
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r337021.
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
[...]
Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
llvm-svn: 337079
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.
The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.
Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)
llvm-svn: 337021
|
|
|
|
|
|
|
|
| |
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first. This fixes msan buildbot failures and was
introduced by r336851.
llvm-svn: 336861
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR38071.
Reviewers: arsenm, dstenb
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48979
llvm-svn: 336448
|
|
|
|
|
|
|
|
|
|
| |
This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.
llvm-svn: 335999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46365
llvm-svn: 335942
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48234
llvm-svn: 335288
|
|
|
|
|
|
|
|
| |
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
Differential Revision: https://reviews.llvm.org/D46616
llvm-svn: 333926
|
|
|
|
|
|
|
| |
These functions just query the underlying IR function,
so pass it directly.
llvm-svn: 333442
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.
This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.
I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46272
llvm-svn: 332930
|
|
|
|
|
|
|
|
|
| |
- Predicate D16 patterns on this new feature
- Added this new feature to gfx900/2/4
Differential Revision: https://reviews.llvm.org/D46366
llvm-svn: 331551
|
|
|
|
| |
llvm-svn: 331298
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
|
|
|
|
|
|
|
|
| |
Changes by
Matt Arsenault
Konstantin Zhuravlyov
llvm-svn: 331215
|
|
|
|
|
|
|
|
| |
counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account.
Differential Revision: https://reviews.llvm.org/D46067
llvm-svn: 330954
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r329591.
It breaks various bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
...
llvm-svn: 329610
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329591
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed a bug which caused Tablegen crash.
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
|
|
|
|
| |
llvm-svn: 328978
|
|
|
|
|
|
|
|
|
| |
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
|
|
|
|
|
|
|
| |
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.
llvm-svn: 328821
|
|
|
|
|
|
|
|
|
|
|
| |
attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.
Differential Revision: https://reviews.llvm.org/D43736
llvm-svn: 328349
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
address-space.
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
This patch supports ds_read_b128 instruction pattern and generation of this instruction.
In the vectorizer, this patch also widen the vector length so that vectorizer generates
128 bit loads for local address-space which gets translated to ds_read_b128.
Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.
Author: FarhanaAleen
Reviewed By: rampitec, arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44210
llvm-svn: 327153
|
|
|
|
|
|
| |
These are the parameters x86 already uses.
llvm-svn: 327020
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.
Reviewers: kzhuravl
Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl
Differential Revision: https://reviews.llvm.org/D42203
llvm-svn: 326088
|
|
|
|
|
|
|
|
|
|
| |
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902
Differential Revision: https://reviews.llvm.org/D43355
llvm-svn: 325393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42756
llvm-svn: 324486
|
|
|
|
|
|
|
|
|
|
|
| |
See bugs 36094, 36095:
https://bugs.llvm.org/show_bug.cgi?id=36094
https://bugs.llvm.org/show_bug.cgi?id=36095
Differential Revision: https://reviews.llvm.org/D42692
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
|
|
|
|
|
|
|
|
|
|
| |
Differential Revision:
https://reviews.llvm.org/D38906
Reviewers:
Matt and Brian.
llvm-svn: 322402
|
|
|
|
|
|
| |
The Function can never be nullptr so we can return a reference.
llvm-svn: 320884
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add gfx704
- Change bonaire to gfx704
- Remove gfx804
- Remove gfx901
- Remove gfx903
Differential Revision: https://reviews.llvm.org/D40046
llvm-svn: 320194
|
|
|
|
|
|
|
|
|
| |
Only used by pre-GCN targets
v2: fix predicate setting for FMA_Common
Differential Revision: https://reviews.llvm.org/D40692
llvm-svn: 319712
|
|
|
|
|
|
|
|
|
|
|
| |
It's not implemented.
Passing +fp64-fp16-denormal feature enables fp64 even on asics that don't support it
v2: fix hasFP64 query
Differential Revision: https://reviews.llvm.org/D39931
llvm-svn: 319709
|
|
|
|
|
|
|
|
| |
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.
llvm-svn: 319393
|
|
|
|
|
|
|
|
|
| |
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.
llvm-svn: 319270
|
|
|
|
|
|
|
|
|
|
| |
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.
Also stops breaking clauses unless xnack is enabled.
llvm-svn: 318557
|
|
|
|
|
|
|
| |
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.
llvm-svn: 318240
|
|
|
|
| |
llvm-svn: 318100
|