| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
Partially fixes Bug 28232.
Lit tests added.
Differential Revision: https://reviews.llvm.org/D25367
llvm-svn: 283567
|
| |
| |
to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D25084
llvm-svn: 283560
|
| |
|
|
| |
llvm-svn: 283558
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25302
llvm-svn: 283555
|
| |
| |
Summary:
There was a bug with sequences like
s_mov_b64 s[0:1], exec
s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
...
s_mov_b64_term exec, s[2:3]
because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.
Note that the test case also exposes an unrelated bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25306
llvm-svn: 283528
|
| |
|
|
| |
llvm-svn: 283515
|
| |
|
|
| |
llvm-svn: 283476
|
| |
|
|
| |
llvm-svn: 283475
|
| |
|
|
|
|
|
|
| |
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.
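A toy model may help illustrate the problem. The sketch below is not LLVM code: `Operand`, `Instr`, and `fold_to_copy` are hypothetical names invented for illustration. It only shows the idea that rewriting an instruction in place must also discard operands that belonged to the old opcode, such as an implicit exec use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operand:
    name: str
    is_def: bool = False
    is_implicit: bool = False


@dataclass
class Instr:
    opcode: str
    operands: List[Operand] = field(default_factory=list)


def fold_to_copy(instr: Instr, src: str) -> None:
    """Rewrite instr in place as a copy of src (hypothetical helper).

    Only the explicit defs survive; every other operand of the old
    opcode, including stale implicit uses/defs, is dropped.
    """
    explicit_defs = [op for op in instr.operands
                     if op.is_def and not op.is_implicit]
    instr.opcode = "COPY"
    instr.operands = explicit_defs + [Operand(src)]


# v_or_b32 v0, v1, 0 carries an implicit use of exec; x | 0 folds to x.
v_or = Instr("V_OR_B32", [
    Operand("v0", is_def=True),
    Operand("v1"),
    Operand("0"),
    Operand("exec", is_implicit=True),
])
fold_to_copy(v_or, "v1")
print(v_or.opcode, [op.name for op in v_or.operands])
```

In a real backend, any implicit operands required by the new opcode would have to be added back; the toy model leaves that out.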
llvm-svn: 283471
|
| |
|
|
|
|
| |
Fix bad merge
llvm-svn: 283470
|
| |
|
|
| |
llvm-svn: 283469
|
| |
|
|
|
|
| |
Make the necessary refactorings to make use of PseudoInstExpansion
llvm-svn: 283467
|
| |
|
|
|
|
|
| |
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.
llvm-svn: 283464
|
| |
| |
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D24802
llvm-svn: 283450
|
| |
|
|
|
|
|
|
| |
These need to have the size set on the pseudo instruction for
getInstSizeInBytes to work correctly. They also have a statically
known size.
llvm-svn: 283437
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25121
llvm-svn: 283415
|
| |
|
|
|
|
|
|
|
| |
The register scavenging code does not support multiple definitions of
the same vreg.
Differential Revision: https://reviews.llvm.org/D25220
llvm-svn: 283369
|
| |
|
|
|
|
|
| |
Allow inserting multiple instructions in the
expanded loop.
llvm-svn: 283177
|
| |
|
|
| |
llvm-svn: 283175
|
| |
|
|
| |
llvm-svn: 283133
|
| |
|
|
| |
llvm-svn: 283130
|
| |
|
|
| |
llvm-svn: 283108
|
| |
| |
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D24727
llvm-svn: 283099
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25110
llvm-svn: 283087
|
| |
|
|
| |
llvm-svn: 283004
|
| |
|
|
|
|
|
| |
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038
llvm-svn: 283003
|
| |
|
|
|
|
| |
This removes many re-initializations of a base register to 0.
llvm-svn: 282999
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D24973
llvm-svn: 282877
|
| |
|
|
|
|
|
|
| |
instruction
Differential Revision: https://reviews.llvm.org/D24985
llvm-svn: 282875
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25055
llvm-svn: 282873
|
| |
|
|
|
|
|
|
|
|
| |
For some reason both of these are available, except
for scalar 64-bit compares, which only have u64. I'm not sure
why there are both (I'm guessing it's for the one-bit inputs we
don't use), but for consistency always use the
unsigned one.
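Assuming the compares in question are equality/inequality tests (the text above does not say so explicitly), the choice between the signed and unsigned variant cannot change the result, because equality depends only on the bit pattern. A small sketch with hypothetical helper names, checked exhaustively at 8 bits for brevity:

```python
def as_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's-complement value."""
    sign = 1 << (width - 1)
    bits &= (1 << width) - 1
    return (bits ^ sign) - sign


def eq_unsigned(a: int, b: int, width: int) -> bool:
    mask = (1 << width) - 1
    return (a & mask) == (b & mask)


def eq_signed(a: int, b: int, width: int) -> bool:
    return as_signed(a, width) == as_signed(b, width)


# Equality agrees for every pair of 8-bit patterns.
all_agree = all(
    eq_unsigned(a, b, 8) == eq_signed(a, b, 8)
    for a in range(256) for b in range(256)
)
print(all_agree)

# Ordering compares are where signedness matters: 0xFF is 255 or -1.
print(0xFF < 0x01, as_signed(0xFF, 8) < as_signed(0x01, 8))
```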
llvm-svn: 282832
|
| |
| |
Fixes to allow spilling all registers at the end of the block
to work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.
This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.
llvm-svn: 282667
|
| |
|
|
|
|
|
|
| |
instructions
Differential Revision: https://reviews.llvm.org/D24125
llvm-svn: 282624
|
| |
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT.
llvm-svn: 282604
|
| |
| |
enabled.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
a simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-adds the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
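For readers unfamiliar with the transformation itself, here is a deliberately simplified sketch of what merging consecutive stores means. It is not the SelectionDAG algorithm described above: `merge_consecutive_stores` is a hypothetical function, and it assumes the stores do not overlap and that no intervening memory operation aliases them, which is exactly the property the real chain search has to establish.

```python
def merge_consecutive_stores(stores, max_bytes=8):
    """Merge stores whose byte ranges are exactly adjacent.

    stores: iterable of (offset, data) pairs, data being a bytes object.
    Assumes (for illustration only) that reordering the stores is safe.
    """
    merged = []
    for offset, data in sorted(stores):
        if merged:
            prev_off, prev_data = merged[-1]
            adjacent = prev_off + len(prev_data) == offset
            fits = len(prev_data) + len(data) <= max_bytes
            if adjacent and fits:
                # Widen the previous store instead of emitting a new one.
                merged[-1] = (prev_off, prev_data + data)
                continue
        merged.append((offset, data))
    return merged


stores = [(4, b"\x00" * 4), (0, b"\x00" * 4), (12, b"\x01")]
print(merge_consecutive_stores(stores))
```

The two 4-byte stores at offsets 0 and 4 become a single 8-byte store, while the store at offset 12 is left alone because of the gap at offset 8.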
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing, but it looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
|
| |
|
|
|
|
|
|
|
|
| |
subtarget
This is a prerequisite for coming waitcnt changes
Differential Revision: https://reviews.llvm.org/D24939
llvm-svn: 282489
|
| |
| |
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).
Reviewers: arsenm, b-sumner
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D24865
llvm-svn: 282420
|
| |
|
|
|
|
| |
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.
llvm-svn: 282396
|
| |
| |
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D24802
llvm-svn: 282394
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D24875
llvm-svn: 282296
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D24738
llvm-svn: 282234
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D24835
llvm-svn: 282223
|
| |
|
|
|
|
|
|
|
| |
Also added range checking for DPP attributes.
Assembler tests added as well.
Differential Revision: https://reviews.llvm.org/D24755
llvm-svn: 282145
|
| |
|
|
|
|
|
|
|
| |
Lit tests added.
Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.
Differential Revision: https://reviews.llvm.org/D24765
llvm-svn: 282086
|
| |
|
|
|
|
|
|
| |
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).
llvm-svn: 282068
|
| |
| |
Summary: It is replaced by AMDGPUELFObjectWriter
Reviewers: tstellarAMD, vpykhtin, artem.tamazov
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl
Differential Revision: https://reviews.llvm.org/D24654
llvm-svn: 282065
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D24664
llvm-svn: 281965
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D24546
llvm-svn: 281903
|
| |
| |
Summary:
When an s_branch instruction targets itself, the backend should emit offset -1, but it emits 0 instead.
'''
label:
s_branch label // should emit [0xff,0xff,0x82,0xbf]
'''
Tom, Matt: why are we adjusting fixup values in the applyFixup() method instead of in processFixup()? processFixup() calls adjustFixupValue() but does nothing with its result.
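The expected bytes can be reproduced with a little arithmetic. The sketch below assumes the SOPP layout (fixed high bits 0xBF800000, a 7-bit opcode at bit 16, a 16-bit signed immediate) and that the immediate counts dwords relative to the instruction following the branch; the function names are hypothetical and this is not the backend's fixup code.

```python
import struct

SOPP_ENCODING = 0xBF800000  # fixed high bits of the SOPP format (assumed)
S_BRANCH_OPCODE = 0x02      # s_branch, consistent with the 0x82 byte above


def branch_simm16(branch_addr: int, target_addr: int) -> int:
    """Dword offset from the instruction after the branch, as 16 bits."""
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        raise ValueError("branch target must be dword aligned")
    dwords = delta // 4
    if not -(1 << 15) <= dwords < (1 << 15):
        raise ValueError("branch target out of simm16 range")
    return dwords & 0xFFFF


def encode_s_branch(branch_addr: int, target_addr: int) -> bytes:
    word = (SOPP_ENCODING
            | (S_BRANCH_OPCODE << 16)
            | branch_simm16(branch_addr, target_addr))
    return struct.pack("<I", word)  # little-endian instruction word


# A branch to itself is one dword behind the next instruction: offset -1.
print(encode_s_branch(0x100, 0x100).hex())
```

Under these assumptions, an offset of 0 would instead mean "fall through to the next instruction", which is why emitting 0 for a self-branch is wrong.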
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl
Differential Revision: https://reviews.llvm.org/D24671
llvm-svn: 281896
|
| |
| |
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.
The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.
llvm-svn: 281824
|