llvm-svn: 319772

Summary:
Found out, by code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.
A BUILD_PAIR always has the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads, for big-endian targets, we should check
that the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.
Normally this bug only resulted in missed opportunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.
Note that this patch actually will trigger load combine for
some big-endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get
t76: i64,ch = load<LD8[FixedStack-9]>
instead of
t37: i32,ch = load<LD4[FixedStack-10]>
t35: i32,ch = load<LD4[FixedStack-9]>
t41: i64 = build_pair t37, t35
before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transformation is correct now.
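For illustration only (simplified types and helper names, not the actual
DAGCombiner code), the corrected direction check amounts to:

  #include <cstdint>

  // Stand-in for "Addr is exactly Bytes past BaseAddr".
  bool isConsecutive(const char *BaseAddr, const char *Addr, int64_t Bytes) {
    return Addr == BaseAddr + Bytes;
  }

  // Elt0/Elt1 are the addresses loaded into BUILD_PAIR elements 0 and 1.
  // Element 0 always holds the least significant half, so on little-endian
  // it must be the load at the lower address, while on big-endian the
  // element 1 load must be at the lower address.
  bool loadsAreCombinable(const char *Elt0, const char *Elt1,
                          int64_t HalfBytes, bool IsBigEndian) {
    return IsBigEndian ? isConsecutive(Elt1, Elt0, HalfBytes)
                       : isConsecutive(Elt0, Elt1, HalfBytes);
  }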
Reviewers: niravd, hfinkel
Reviewed By: niravd
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D40444
llvm-svn: 319771

llvm-svn: 319770

Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this case is unusual enough to forgo optimization.
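A minimal sketch of the kind of guard involved (the helper below is
hypothetical, not the code touched by the patch):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Value.h"
  using namespace llvm;

  // Hypothetical helper: instead of asserting that a comparison folded to
  // true or false, give up when the value is undef.
  bool getKnownComparisonResult(const Value *V, bool &Result) {
    if (isa<UndefValue>(V))
      return false;                       // bail out: nothing to prove
    if (const auto *C = dyn_cast<ConstantInt>(V)) {
      Result = !C->isZero();
      return true;
    }
    return false;
  }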
Patch by JesperAntonsson.
Reviewers: spatel, eeckstein, hans
Reviewed By: hans
Subscribers: uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D40639
llvm-svn: 319768

llvm-svn: 319767

Pull the checks on the load out of ReduceLoadWidth into their own
function.
Differential Revision: https://reviews.llvm.org/D40833
llvm-svn: 319766

llvm-svn: 319765

Differential Revision: https://reviews.llvm.org/D40649
llvm-svn: 319761

Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
llvm-svn: 319760

Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
llvm-svn: 319758

llvm-svn: 319757

This has proven a healthy exercise, as many cases of incorrect instruction
flags were corrected in the process. As part of this, IntrWriteMem was added
to several SystemZ intrinsics.
Furthermore, a bug was exposed in TwoAddress with this change (as incorrect
hasSideEffects flags were removed and instructions could now be sunk), and
the test case for that bugfix (r319646) is included here as
test/CodeGen/SystemZ/twoaddr-sink.ll.
One temporary test regression (one extra copy) which will hopefully go away
in upcoming patches for similar cases:
test/CodeGen/SystemZ/vec-trunc-to-i1.ll
Review: Ulrich Weigand.
https://reviews.llvm.org/D40437
llvm-svn: 319756

MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more identity copies that can be removed.
NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. A long
while ago all the tests were updated and were just waiting for review (which
didn't happen), so now targets have to enable this themselves
instead. Several targets could get a head start by downloading the test
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.
This should be an improvement generally for any target!
The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has a target-specific type, it will
not be recomputed, as per the previous behaviour.
The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.
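Roughly, the data-structure change can be pictured like this (simplified
model, not the actual MachineRegisterInfo interface):

  #include <map>
  #include <utility>
  #include <vector>

  // Per virtual register: a hint type plus a list of preferred registers.
  // A target-provided hint (nonzero type) is stored first and keeps the
  // highest priority; common code appends its sorted copy hints after it.
  struct RegAllocHintsModel {
    std::map<unsigned, std::pair<unsigned, std::vector<unsigned>>> Hints;

    void addTargetHint(unsigned VReg, unsigned Type, unsigned PrefReg) {
      auto &Entry = Hints[VReg];
      Entry.first = Type;
      Entry.second.insert(Entry.second.begin(), PrefReg);
    }

    void addCopyHint(unsigned VReg, unsigned PrefReg) {
      Hints[VReg].second.push_back(PrefReg);
    }
  };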
Review: Quentin Colombet, Ulrich Weigand.
https://reviews.llvm.org/D38128
llvm-svn: 319754

This recommits r319533, which broke llvm-config --system-libs
output. The reason was that I used find_library to search for the
z library. This returns absolute paths, and when these paths made it
into llvm-config, they made it produce nonsensical flags. To fix this, I
hand-roll a search for the library in the same way that we search for
the terminfo library a couple of lines below.
This is a bit less flexible than the find_library option, as it does not
allow the user to specify the path to the library at configure time
(which is important on Windows, as zlib is unlikely to be found in any
of the standard places cmake searches), but I was able to guide the
build to find it with appropriate values of LIB and INCLUDE environment
variables.
Reviewers: compnerd, rnk, beanz, rafael
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D40779
llvm-svn: 319751

This is for PR35460.
Currently, when LLD adds files to TarWriter it may pass the same file
multiple times. For example, this happens for a clang reproduce file which specifies
archive (.a) files more than once on the command line.
This patch makes TarWriter ignore files with the same path, so it will
add only the first one to the archive.
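The de-duplication idea, as a standalone sketch (illustrative, not the exact
TarWriter code):

  #include <set>
  #include <string>

  class TarWriterSketch {
    std::set<std::string> SeenPaths;

  public:
    // Returns true if the file was appended, false if the path was already
    // archived and the request is ignored.
    bool append(const std::string &Path) {
      if (!SeenPaths.insert(Path).second)
        return false;
      // ... emit the tar header and file contents here ...
      return true;
    }
  };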
Differential revision: https://reviews.llvm.org/D40606
llvm-svn: 319750

When trying to determine the correct Mask register class corresponding
to a GPR register class, not all register classes were handled.
This caused an assertion to be raised in some scenarios.
Differential Revision: https://reviews.llvm.org/D40290
llvm-svn: 319745

WidenVecOp_MSTORE instead of implementing it manually and incorrectly.
The CONCAT_VECTORS operand gets its type from getSetCCResultType, but if the mask type and the setcc have different scalar sizes, this creates an illegal CONCAT_VECTORS operation. The concat type should be 2x the mask type, and then an extend should be added if needed.
llvm-svn: 319744

is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements.
If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate.
llvm-svn: 319740

opcode and enable for AArch64.
Some concerns were raised about the direction. Revert while we discuss it and look into an alternative.
llvm-svn: 319739

a v32i8 bitreverse.
llvm-svn: 319737

llvm-svn: 319733

This calls handleMove with a DBG_VALUE instruction,
which isn't tracked by LiveIntervals. I'm not sure
this is the correct place to fix this. The generic
scheduler seems to have more deliberate region
selection that skips dbg_value.
The test is also really hard to reduce. I haven't been able
to figure out what exactly causes this particular case to
try moving the dbg_value.
llvm-svn: 319732

is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements.
If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate.
llvm-svn: 319728

This can be efficiently selected by a COPY_TO_REGCLASS without the need for an extra instruction.
llvm-svn: 319726

before calling getConstant. NFCI
The getConstant function can take care of creating the APInt internally.
getZeroVector will take care of using the correct type for the build vector to avoid re-lowering.
The test change here is because execution domain constraints apparently pass through undef inputs of a zeroing xor. So the different ordering of register allocation here caused the dependency to change.
llvm-svn: 319725

Move the AVX512 code out of LowerAVXExtend. LowerAVXExtend has two callers but one of them pre-checks for AVX-512 so the code is only live from the other caller. So move the AVX-512 checks up to that caller for symmetry.
Move all of the i1 input type code in Lower_AVX512ZeroExend together.
llvm-svn: 319724

Consistently use the same parameter names as the names of the affected
fields. This avoids some unintuitive abbreviations like `isSS`.
llvm-svn: 319722

While we cannot skip the whole TwoAddressInstructionPass even for -O0
there are some parts of the pass that are currently skipped at -O0 but
not for optnone. Changing this as there is no reason to have those two
hit different code paths here.
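A sketch of the unified condition (hypothetical helper name; the actual pass
wires this into its existing checks):

  #include "llvm/IR/Function.h"
  #include "llvm/Support/CodeGen.h"
  using namespace llvm;

  // Treat a function marked optnone the same as a -O0 build when deciding
  // whether to run the optional parts of the pass.
  bool skipOptionalParts(const Function &F, CodeGenOpt::Level OptLevel) {
    return OptLevel == CodeGenOpt::None ||
           F.hasFnAttribute(Attribute::OptimizeNone);
  }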
llvm-svn: 319721

Only used by pre-GCN targets
v2: fix predicate setting for FMA_Common
Differential Revision: https://reviews.llvm.org/D40692
llvm-svn: 319712

It's not implemented.
Passing the +fp64-fp16-denormal feature enables fp64 even on ASICs that don't support it.
v2: fix hasFP64 query
Differential Revision: https://reviews.llvm.org/D39931
llvm-svn: 319709

llvm-svn: 319708

the stack"
This broke the Chromium build (crbug.com/791714). Reverting while investigating.
> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-svn: 319706

Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all
lane sizes and move all of the optimization into the DAG folding.
llvm-svn: 319705

llvm-svn: 319704

Set the .debug_line version to match the requested DWARF version,
except with a maximum of v4 because we don't support v5 yet.
Previously Chromium had issues with this patch; see PR31407. Chromium
tool issues have been addressed, so hopefully this will go through
this time.
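The version selection boils down to a clamp along these lines (illustrative
sketch, not the actual emitter code):

  #include <algorithm>

  // Emit the .debug_line table at the requested DWARF version, but never
  // above v4, since v5 line tables are not supported yet.
  unsigned getLineTableVersion(unsigned RequestedDwarfVersion) {
    return std::min(RequestedDwarfVersion, 4u);
  }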
Patch by Katya Romanova!
Differential Revision: https://reviews.llvm.org/D38002
llvm-svn: 319699

MatchRotate assumes the types of LHS and RHS are equal,
which is always the case when they come from an OR node, but here
we're getting them from two different TRUNC nodes, so we have to check
the types.
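The added guard is essentially a type-equality check, e.g. (illustrative, not
the exact MatchRotate code):

  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // When the rotate halves come from two separate TRUNCATE nodes instead of
  // a single OR, their value types are not guaranteed to match.
  bool typesMatchForRotate(SDValue LHS, SDValue RHS) {
    return LHS.getValueType() == RHS.getValueType();
  }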
llvm-svn: 319695

If the truncation has been pushed past the or-node, look through it and
truncate afterwards.
Differential revision: https://reviews.llvm.org/D40792
llvm-svn: 319692

enable for AArch64.
This patch splits atomics out of the generic G_LOAD/G_STORE and into their own
G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a
necessary one. Atomic load/store has little of its implementation in common with
non-atomic load/store. They tend to be handled very differently throughout the
backend. It also has the nice side-effect of slightly improving the common-case
performance at ISel since there's no longer a need for an atomicity check in the
matcher table.
All targets have been updated to remove the atomic load/store check from the
G_LOAD/G_STORE path. AArch64 has also been updated to mark
G_ATOMIC_LOAD/G_ATOMIC_STORE legal.
There is one issue with this patch though which also affects the extending loads
and truncating stores. The rules only match when an appropriate G_ANYEXT is
present in the MIR. For example,
(G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
will match but:
(G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
will not. This shouldn't be a problem at the moment, but as we get better at
eliminating extends/truncates we'll likely start failing to match in some
cases. The current plan is to fix this in a patch that changes the
representation of extending-load/truncating-store to allow the MMO to describe
a different type to the operation.
llvm-svn: 319691

Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40750
llvm-svn: 319689

preceding dimensions
Follow-up of r316824. This patch supports the vector type for both current and
previous index when factoring out the current one into the previous one.
Differential Revision: https://reviews.llvm.org/D39556
llvm-svn: 319683

Suggested by Max Kazantsev in https://reviews.llvm.org/D39361
llvm-svn: 319679

Summary:
I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only
hides the problem[0]. The real issue is that, because of how we forget SCEV
expressions in ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494
computing the backedge-taken count for any loop invalidates the trip count for every other
loop. This effectively makes the SCEV cache useless.
I've instead made the SCEV expression invalidation in
ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue.
[0]: One way to think about this is that rL309080 essentially augmented the
backedge-taken-count cache with another equivalent exit-limit cache. The bug
went away because we were explicitly not clearing the exit-limit cache in
getBackedgeTakenInfo. But instead of doing all of that, we can just avoid
clearing the backedge-taken-count cache.
Reviewers: mkazantsev, mzolotukhin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39361
llvm-svn: 319678

(This reapplies r314253. r314253 was reverted in r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
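The narrowing itself is the usual "both operands fit in 32 bits" rewrite; as a
plain C++ sketch of the semantics (not the pass implementation):

  #include <cstdint>

  // If both operands of a 64-bit unsigned division are known to fit in
  // 32 bits -- including a constant divisor -- the division can be done
  // in 32 bits with the same result.
  uint64_t narrowedUDiv(uint64_t A, uint64_t B) {
    if (A <= UINT32_MAX && B != 0 && B <= UINT32_MAX)
      return static_cast<uint32_t>(A) / static_cast<uint32_t>(B);
    return A / B;   // fall back to the full 64-bit division
  }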
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 319677

Differential Revision: https://reviews.llvm.org/D40756
llvm-svn: 319674

As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665

Summary:
The compiler fails with the following error message:
fatal error: error in backend: ran out of registers during
register allocation
Tail call optimization for Armv8-M.base fails to meet all the required
constraints when handling calls to function pointers where the
arguments take up r0-r3. This is because the pointer to the
function to be called can only be stored in r0-r3, but these are
all occupied by arguments. This patch makes sure that tail call
optimization does not try to handle this type of call.
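A hypothetical C++ shape of code that runs into this constraint (illustrative
only, not taken from the patch or its tests):

  // The four integer arguments occupy r0-r3 at the call, leaving no low
  // register available to hold the function pointer for the tail call on
  // Armv8-M.base, so the call is now left as a normal call.
  typedef int (*Callback)(int, int, int, int);

  int dispatch(Callback CB, int A, int B, int C, int D) {
    return CB(A, B, C, D);   // candidate tail call through a function pointer
  }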
Reviewers: chill, MatzeB, olista01, rengolin, efriedma
Reviewed By: olista01, efriedma
Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40706
llvm-svn: 319664

This reverts commit r319533 as it broke llvm-config --system-libs output
and everything that depends on it (which is mostly out of tree or
downstream folks, but includes a couple of llvm buildbots as well).
I think I have a fix for this in D40779, but I want someone to
review it first. In the meantime, I am reverting this change, as it
seems to break a lot of people.
llvm-svn: 319663

Summary:
Reviewers: arsenm, vpykhtin, rampitec
Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D37817
llvm-svn: 319662

Summary:
Currently, we only support predication for forward loops with a step
of 1. This patch enables loop predication for reverse or countdown
loops, which satisfy the following conditions:
1. The step of the IV is -1.
2. The loop has a single latch as B(X) = X <pred> latchLimit,
   with pred as s> or u>.
3. The IV of the guard is the decrement IV of the latch condition
   (Guard is: G(X) = X-1 u< guardLimit).
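As a source-level illustration of the conditions above (hypothetical example,
not from the patch), the shape of a countdown loop that now qualifies:

  // Latch: B(X) = X u> 0, IV step is -1.
  // Guard: G(X) = X-1 u< Bound (modelled here with a trap on failure).
  void copyReversed(int *Dst, const int *Src, unsigned Len, unsigned Bound) {
    for (unsigned X = Len; X > 0; --X) {
      if (!(X - 1 < Bound))
        __builtin_trap();        // stand-in for the guard's deopt path
      Dst[X - 1] = Src[X - 1];
    }
  }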
This patch was downstream for a while and is the last of the series of patches
from our LP implementation downstream.
Reviewers: apilipenko, mkazantsev, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40353
llvm-svn: 319659

PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The
existing pass already ensured this for globals and this patch adds
the cleanup for functions with local linkage.
However, there was a different problem in the case of collisions
of the adjusted name: The ValueSymbolTable then automatically
appended ".N" with increasing Ns to get a unique name while helping
the ABI demangling. Special case this behavior to omit the dots and
append N directly. This will always give us legal names according
to the PTX requirements.
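A simplified sketch of the renaming rule (illustrative, not the actual pass
code):

  #include <string>

  // Rewrite characters outside [a-zA-Z0-9_$] to '_', and on a collision
  // append a plain counter with no '.' separator, so the result is always
  // a legal PTX identifier.
  std::string makePTXLegalName(const std::string &Name, unsigned Collisions) {
    std::string Out;
    for (char C : Name) {
      bool Legal = (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
                   (C >= '0' && C <= '9') || C == '_' || C == '$';
      Out.push_back(Legal ? C : '_');
    }
    if (Collisions != 0)
      Out += std::to_string(Collisions);   // "foo2" rather than "foo.2"
    return Out;
  }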
Differential Revision: https://reviews.llvm.org/D40573
llvm-svn: 319657