| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D63202
llvm-svn: 363185
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
|
|
|
|
|
|
| |
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd)
This was causing linker warnings on Darwin:
ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.
llvm-svn: 363028
|
|
|
|
|
|
|
| |
This now produces garbage on AMDGPU with a call to an nonexistent,
anonymous libcall but won't assert.
llvm-svn: 363022
|
|
|
|
|
|
| |
Also fix AtomicExpand asserting on atomicrmw fadd/fsub.
llvm-svn: 363021
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
llvm-svn: 362990
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replace image_load_mip/image_store_mip
with image_load/image_store if lod is 0.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63073
llvm-svn: 362957
|
|
|
|
| |
llvm-svn: 362852
|
|
|
|
|
|
|
|
| |
caller function (compile time performance)
Differential revision: https://reviews.llvm.org/D62917
llvm-svn: 362789
|
|
|
|
| |
llvm-svn: 362761
|
|
|
|
|
|
|
| |
This already forced a skip for VMEM, so it should also be done for
flat. I'm somewhat skeptical about the benefit of this though.
llvm-svn: 362760
|
|
|
|
|
|
|
|
|
|
| |
SIInsertSkips really doesn't understand the control flow, and makes
very stupid assumptions about the block layout. This was able to get
away with not skipping return blocks, since usually after
structurization there is only one placed at the end of the
function. Tail duplication can break this assumption.
llvm-svn: 362754
|
|
|
|
|
|
|
|
|
|
|
|
| |
"Divergence driven ISel. Assign register class for cross block values
according to the divergence."
that discovered the design flaw leading to several issues that
required to be solved before.
This change reverts AMDGPU specific changes and keeps common part
unaffected.
llvm-svn: 362749
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This forced the caller to be aware of this, which is an ugly ABI
feature.
Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.
Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.
Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.
llvm-svn: 362665
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.
Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.
The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.
Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.
Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.
llvm-svn: 362661
|
|
|
|
|
|
|
| |
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.
llvm-svn: 362641
|
|
|
|
|
|
|
| |
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.
llvm-svn: 362640
|
|
|
|
|
|
|
|
|
|
|
| |
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
|
|
|
|
|
|
|
|
|
|
|
|
| |
and countTrailingOnes() return unsigned
This matches APInt's versions of these functions, and there is no need for these to be size_t.
(as well as __builtin_clzll())
Differential Revision: https://reviews.llvm.org/D60823
llvm-svn: 362503
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.
TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.
Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.
llvm-svn: 362447
|
|
|
|
|
|
|
|
|
|
| |
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
|
|
|
|
|
|
|
|
|
|
| |
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292
Reviewers: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D62660
llvm-svn: 362400
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da
Reviewers: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62720
Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e
llvm-svn: 362390
|
|
|
|
|
|
| |
Fixes missing test from r293000.
llvm-svn: 362275
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.
Differential Revision: https://reviews.llvm.org/D62707
llvm-svn: 362239
|
|
|
|
|
|
| |
Avoids crashing in PEI in a future change.
llvm-svn: 362136
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.
That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.
Differential Revision: https://reviews.llvm.org/D62572
Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
|
|
|
|
|
|
|
|
| |
The patch computes the return address for the current function.
Differential revision: https://reviews.llvm.org/D59666
llvm-svn: 362001
|
|
|
|
|
|
|
|
| |
It introduces performance regressions in several applications.
This has already been submitted downstream.
llvm-svn: 361879
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- There's a regression due to the cross-block RC assignment. Use the
proper way to derive the output register RC in inline asm.
Reviewers: rampitec, alex-t
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62537
llvm-svn: 361868
|
|
|
|
|
|
|
|
| |
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.
llvm-svn: 361848
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
Otherwise, that promotes that scalar register into vector one, which
breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
that (lane mask) scalar register is legalized firstly before its
definition, e.g., due to the mismatch block placement and its
topological order or loop. In that cases, the legalization of PHI
introduces the use of that scalar register as `vreg_1`.
Reviewers: rampitec, nhaehnle, arsenm, alex-t
Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62492
llvm-svn: 361847
|
|
|
|
| |
llvm-svn: 361776
|
|
|
|
|
|
|
|
| |
commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence.
llvm-svn: 361770
|
|
|
|
|
|
|
|
|
|
| |
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D61017
llvm-svn: 361763
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
|
|
|
|
|
|
|
|
| |
They caused the sanitizer builds to fail.
My suspicion is the change the countLeadingZeros().
llvm-svn: 361736
|
|
|
|
|
|
|
|
|
| |
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
|
|
|
|
|
|
|
|
|
|
| |
cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio
llvm-svn: 361688
|
|
|
|
|
|
|
| |
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.
llvm-svn: 361655
|
|
|
|
|
|
|
| |
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.
llvm-svn: 361649
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were assuming a much larger possible per-wave visible stack
allocation than is possible:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
llvm-svn: 361541
|
|
|
|
| |
llvm-svn: 361519
|
|
|
|
|
|
|
|
| |
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
LLVM :: CodeGen/AMDGPU/regbank-reassign.mir
llvm-svn: 361430
|
|
|
|
|
|
| |
Don't check for unsupported targets for every instruction.
llvm-svn: 361406
|
|
|
|
|
|
|
|
|
|
|
|
| |
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
llvm-svn: 361405
|
|
|
|
|
|
|
|
|
|
| |
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D61012
llvm-svn: 361386
|
|
|
|
| |
llvm-svn: 361333
|