Differential Revision: https://reviews.llvm.org/D23617
llvm-svn: 280101
llvm-svn: 280075
The former is simply wrong -- the code will either never be used or will
always be used, rather than being dependent upon whether it's built with
debug assertions enabled.
The macro DEBUG isn't ever set by the LLVM build system, but the macro
DEBUG(X) is defined (unconditionally) if you happen to include
llvm/Support/Debug.h.
The code in Value.h which was erroneously protected by the #ifdef DEBUG
didn't even compile -- you can't cast<> from an LLVMOpaqueValue
directly. Fortunately, it was never invoked, as Core.cpp included
Value.h before Debug.h.
The conditionalized code in AArch64CollectLOH.cpp was previously always
used, as it includes Debug.h.
llvm-svn: 280056
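
A standalone sketch of the distinction (no LLVM headers needed):

    // Wrong: DEBUG is not a build-system macro. Whether this block compiles
    // in depends only on whether something happened to define DEBUG -- e.g.
    // llvm/Support/Debug.h, which defines the function-like macro DEBUG(X).
    #ifdef DEBUG
    void neverOrAlwaysUsed() {}
    #endif

    // Right: NDEBUG is the standard assertions switch, so this block really
    // does track whether the build has assertions enabled.
    #ifndef NDEBUG
    void onlyInAssertsBuilds() {}
    #endif

    int main() {}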
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.
llvm-svn: 280049
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.
Fixes PR19098.
llvm-svn: 280040
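
For illustration, GCC's per-function spelling of the same behavior (the
sequence in the comment is an approximate 64-bit ELF example, not
necessarily what this patch emits):

    // A direct call is a single `bl callee`, whose 24-bit displacement only
    // reaches +/- 32 MiB -- the limit very large binaries run into. The
    // indirect sequence has full-address reach, roughly:
    //   addis 12, 2, <toc-slot>@toc@ha
    //   ld    12, <toc-slot>@toc@l(12)
    //   mtctr 12
    //   bctrl
    extern void callee() __attribute__((longcall));
    void caller() { callee(); }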
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator. Make it so.
In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted. This matches the
guarantees of ilist<T>::iterator.
(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>. This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator. See note at end of commit
message for details on bundle iterators.)
Given the list (with the Sentinel showing twice for simplicity):
[Sentinel] <-> A <-> B <-> [Sentinel]
the following is now true:
1. begin() represents A.
2. begin() holds the pointer for A.
3. end() represents [Sentinel].
4. end() holds the pointer for [Sentinel].
5. rbegin() represents B.
6. rbegin() holds the pointer for B.
7. rend() represents [Sentinel].
8. rend() holds the pointer for [Sentinel].
The changes are #6 and #8. Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
involved a confusing increment; and
- "RI++->erase()" left RI invalid. The unintuitive replacement was
"RI->erase(), RE = end()".
With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase). But with lists, this was a poor design.
Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:
for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
  fooThatMightRemoveArgFromList(*RI++);
Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.
reverse_iterator iterator::getReverse();
iterator reverse_iterator::getReverse();
Why doesn't iterator <=> reverse_iterator conversion use constructors?
In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator. It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantic (see the changes at call sites that are part of this patch).
Old code used this API:
std::reverse_iterator::reverse_iterator(iterator);
iterator std::reverse_iterator::base();
Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:
[Old]                    ==>  [New]
reverse_iterator(I)           (--I).getReverse()
reverse_iterator(I)           ++I.getReverse()
--reverse_iterator(I)         I.getReverse()
reverse_iterator(++I)         I.getReverse()
RI.base()                     (--RI).getReverse()
RI.base()                     ++RI.getReverse()
--RI.base()                   RI.getReverse()
(++RI).base()                 RI.getReverse()
delete &*RI, RE = end()       delete &*RI++
RI->erase(), RE = end()       RI++->erase()
=======================================
Note: bundle iterators are out of scope
=======================================
MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles. The idea is that each operator++ takes you to the
beginning of the next bundle. Implementing a sane reverse iterator for
this is harder than ilist. Here are the options:
- Use std::reverse_iterator<MBB::iterator>. Store a handle to the beginning of
the next bundle. A call to operator*() runs a loop (usually
operator--() will be called 1 time, for unbundled instructions).
Increment/decrement just works. This is the status quo.
- Store a handle to the final node in the bundle. A call to operator*()
still runs a loop, but it iterates one time fewer (usually
operator--() will be called 0 times, for unbundled instructions).
Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
sentinel (instead of just in asserts mode). Then the bundle iterator
can sniff the sentinel bit in operator++().
I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone. I have a WIP series of patches that implements the final
option.
llvm-svn: 280032
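
The off-by-one being removed can be seen with std::list, whose
std::reverse_iterator has the same stored-base design (a standalone sketch;
the erase idiom shown is the std:: equivalent of the old "RI->erase(),
RE = end()" dance):

    #include <iostream>
    #include <iterator>
    #include <list>

    int main() {
      std::list<int> L = {1, 2, 3};

      // std::reverse_iterator stores the *next* node: rbegin() holds end(),
      // and operator*() pays an extra decrement.
      auto RI = L.rbegin();
      std::cout << *RI << "\n"; // 3

      // Erasing while reverse-iterating needs the classic base() dance,
      // because after RI++ the new RI's stored base is the erased node:
      while (RI != L.rend()) {
        if (*RI == 2)
          RI = decltype(RI)(L.erase(std::next(RI).base()));
        else
          ++RI;
      }

      // With the new ilist<T>::reverse_iterator, which stores the node it
      // represents, the plain loop from the message works instead:
      //   for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
      //     fooThatMightRemoveArgFromList(*RI++);
      for (int X : L)
        std::cout << X << " "; // 1 3
      std::cout << "\n";
    }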
Move SDLoc initialization to a common place.
Fall back to the AMDGPU version in one place.
Differential Revision: https://reviews.llvm.org/D23900
llvm-svn: 280030
llvm-svn: 280025
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).
Fixes PR22923.
llvm-svn: 280022
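
A sketch of the shift computation in C++ (std::atomic standing in for the
lwarx/stwcx. loop the backend actually emits; the pointer cast is for
illustration only):

    #include <atomic>
    #include <cstdint>

    uint8_t atomicExchangeI8(uint8_t *P, uint8_t V, bool BigEndian) {
      auto Addr = reinterpret_cast<uintptr_t>(P);
      auto *Word =
          reinterpret_cast<std::atomic<uint32_t> *>(Addr & ~uintptr_t(3));
      unsigned Byte = Addr & 3;
      // Big-Endian needs the xor with 3 to locate the byte in its aligned
      // 32-bit container; little-Endian uses the index directly -- the
      // inversion that lets the xor be dropped.
      unsigned Shift = (BigEndian ? (Byte ^ 3) : Byte) * 8;
      uint32_t Mask = uint32_t(0xff) << Shift;

      uint32_t Old = Word->load();
      while (!Word->compare_exchange_weak(
          Old, (Old & ~Mask) | (uint32_t(V) << Shift)))
        ; // Old is refreshed with the current word on failure
      return uint8_t((Old & Mask) >> Shift);
    }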
This is handled by DAGCombiner in a more generic way.
Differential Revision: https://reviews.llvm.org/D23970
llvm-svn: 280019
llvm-svn: 280006
Should have been done with r276646.
llvm-svn: 279996
Summary:
GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different
method to compute the excess and critical register pressure limits.
It's not enabled by default; to enable it, pass -misched=gcn to llc.
Shader DB stats:
32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23688
llvm-svn: 279995
Summary:
The SILoadStoreOptimizer can now look ahead more than one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.
Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate. It also gives more consistent results, since it is no longer
affected by minor scheduling changes.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23814
llvm-svn: 279991
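
A hypothetical sketch of the look-ahead (names and the window size are
made up; aliasing and legality checks are omitted):

    #include <cstdint>
    #include <vector>

    struct MemOp { int64_t Offset; unsigned Width; };

    // Instead of only trying the immediately following instruction, scan a
    // small window ahead for a load/store adjacent to Ops[I].
    int findMergePartner(const std::vector<MemOp> &Ops, unsigned I,
                         unsigned Window = 8) {
      for (unsigned J = I + 1; J < Ops.size() && J <= I + Window; ++J)
        if (Ops[J].Offset == Ops[I].Offset + Ops[I].Width)
          return static_cast<int>(J);
      return -1; // no partner in range
    }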
llvm-svn: 279988
There should be no functional change here; I'm just making the implementation
of "frem" legalization (to a libcall) easier for a follow-up.
llvm-svn: 279987
Fixes bug 29289
llvm-svn: 279986
Further refine the model for loads.
llvm-svn: 279976
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well. However, this sometimes fails, depending on the order
in which virtual registers are assigned physical registers.
To fix this, I've added a second allocation hint which does the reverse:
it asks that the register allocated for dst be used for src0.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23862
llvm-svn: 279968
Summary: A follow-up fix on r279958.
Reviewers: bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23989
llvm-svn: 279964
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23813
llvm-svn: 279963
TEST sequence.
Differential Revision: http://reviews.llvm.org/D23490
llvm-svn: 279960
create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to be used when available.
llvm-svn: 279958
integer operations when DQI isn't supported. This is consistent with the recent changes to promote logical operations to i64 vectors.
llvm-svn: 279957
llvm-svn: 279951
llvm-svn: 279950
number of vector elements
Overeager combining prevents the correct folding of writemasks.
At the moment this occurs for ALL EVEX shuffles; in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.
llvm-svn: 279934
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.
llvm-svn: 279933
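
The usual shape of such a lowering, shown with C++ atomics (whether this
target expands to a compare-and-swap loop or a native sequence isn't stated
above):

    #include <algorithm>
    #include <atomic>

    // atomicrmw max: retry a CAS with max(Old, V) until no other thread
    // intervenes; the return value is the value seen before the update.
    int atomicMax(std::atomic<int> &A, int V) {
      int Old = A.load();
      while (!A.compare_exchange_weak(Old, std::max(Old, V)))
        ; // Old is refreshed with the current value on failure
      return Old;
    }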
AVX512F/AVX512VL.
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.
Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all of the cases where they were previously used. This is the cause of most of the test changes. This shouldn't result in any functional change though.
llvm-svn: 279929
instructions instead of ending 128/256. NFC
llvm-svn: 279927
Fix and improve tests
Differential Revision: https://reviews.llvm.org/D23899
llvm-svn: 279925
REX.X or REX.R bits. Its old name conflicted with a function in the X86II namespace that doesn't quite do the same thing. NFC
llvm-svn: 279924
already found a register that requires a REX prefix. Otherwise we don't error if a high byte register is used after SPL/BPL/DIL/SIL.
llvm-svn: 279923
more consistent with its name and simplifies assembler code.
llvm-svn: 279922
for CR8-CR15.
llvm-svn: 279921
llvm-svn: 279915
VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts.
llvm-svn: 279914
llvm-svn: 279913
llvm-svn: 279912
Fixes bug 26800
llvm-svn: 279910
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.
llvm-svn: 279909
llvm-svn: 279902
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.
llvm-svn: 279901
llvm-svn: 279900
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.
llvm-svn: 279899
Unfortunately this seems to only help the assembler diagnostic.
llvm-svn: 279895
When doing the ABI lowering, report a failure to the caller instead of
asserting. This gives a chance for the caller to recover.
llvm-svn: 279890
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to schedule loads together
without explicitly clustering them, which would give us non-sorted
offsets.
Also, we will want to do this if we move the load/store optimizer before
the scheduler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23776
llvm-svn: 279870
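
A minimal sketch of the fix (hypothetical types; the real pass works on
MachineInstrs):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct MemOp { int64_t Offset; unsigned InstrIdx; };

    // Sort merge candidates by offset so pairing no longer depends on the
    // order in which the scheduler happened to emit them.
    void sortByOffset(std::vector<MemOp> &Candidates) {
      std::sort(Candidates.begin(), Candidates.end(),
                [](const MemOp &A, const MemOp &B) {
                  return A.Offset < B.Offset;
                });
    }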
llvm-svn: 279868
Summary:
There are a few different sgpr pressure sets, but we only care about
the one which covers all of the sgprs. We were using hard-coded
register pressure set names to determine the reg set id for the
biggest sgpr set. However, we were using the wrong name, and this
method is pretty fragile, since the reg pressure set names may
change.
The new method just looks for the pressure set that contains the most
reg units and sets that set as our SGPR pressure set. We've also
adopted the same technique for determining our VGPR pressure set.
Reviewers: arsenm
Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23687
llvm-svn: 279867
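
A sketch of the selection rule (hypothetical types; the real code queries
the target's pressure-set tables):

    #include <algorithm>
    #include <string>
    #include <vector>

    struct PressureSet { std::string Name; unsigned NumRegUnits; };

    // Pick the pressure set covering the most register units instead of
    // matching a hard-coded set name that may change (assumes a non-empty
    // table).
    const PressureSet &biggestSet(const std::vector<PressureSet> &Sets) {
      return *std::max_element(Sets.begin(), Sets.end(),
                               [](const PressureSet &A, const PressureSet &B) {
                                 return A.NumRegUnits < B.NumRegUnits;
                               });
    }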