llvm-svn: 293793
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.
The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.
Differential Revision: https://reviews.llvm.org/D28884
llvm-svn: 293576
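For reference, the arithmetic this routine performs is solving A*X == B (mod 2^BW). A minimal sketch in plain C++, assuming a 64-bit bitwidth and a Newton-iteration modular inverse (names and structure are illustrative, not the actual SCEV code):

    #include <bit>
    #include <cstdint>
    #include <optional>

    // Multiplicative inverse of an odd value modulo 2^64 (Newton's method:
    // each step doubles the number of correct low bits).
    static uint64_t invertOdd(uint64_t A) {
      uint64_t X = A; // an odd A is its own inverse mod 8: 3 correct bits
      for (int I = 0; I < 5; ++I)
        X *= 2 - A * X;
      return X;
    }

    // Solve A*X == B (mod 2^64). Write A = D * 2^K with D odd; a solution
    // exists iff 2^K divides B, and then X = (B >> K) * D^-1 (mod 2^(64-K)).
    static std::optional<uint64_t> solveLinEquation(uint64_t A, uint64_t B) {
      if (A == 0)
        return std::nullopt;
      unsigned K = std::countr_zero(A);
      if (B & ((uint64_t(1) << K) - 1))
        return std::nullopt; // 2^K does not divide B: no solution
      return (B >> K) * invertOdd(A >> K); // one solution; unique mod 2^(64-K)
    }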
llvm-svn: 293504
This fixes pr31761: BasicAA deduces NoAlias
on the result of a GEP if the base pointer is itself NoAlias.
This is valid only if the NoAlias on the base pointer was
deduced with a non-sized query: that guarantees that
the pointers belong to different memory allocations
and that the GEP can't legally jump from one to the other.
Differential Revision: https://reviews.llvm.org/D29216
llvm-svn: 293293
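A minimal C++ illustration of the reasoning (my example, not the test case from the patch): when two pointers are known to come from different allocations, no amount of in-bounds pointer arithmetic on one can produce a pointer into the other.

    #include <cstdlib>

    void example(std::size_t I) {
      char *P = static_cast<char *>(std::malloc(16));
      char *Q = static_cast<char *>(std::malloc(16));
      // P and Q belong to different memory allocations, so in-bounds
      // pointer arithmetic (a GEP) rooted at P can never reach *Q:
      // the two stores below are NoAlias.
      P[I % 16] = 1;
      Q[0] = 2;
      std::free(P);
      std::free(Q);
    }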
Inlining in getAddExpr() can cause abnormally long computation times in some cases.
A new parameter, -scev-addops-inline-threshold, is introduced with a default value of 500.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28812
llvm-svn: 293176
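The conventional way such a knob is declared in LLVM looks roughly like this (a sketch; the description string is illustrative, not copied from the patch):

    #include "llvm/Support/CommandLine.h"
    using namespace llvm;

    // Command-line threshold with the default of 500 named in the commit.
    static cl::opt<unsigned> AddOpsInlineThreshold(
        "scev-addops-inline-threshold", cl::Hidden, cl::init(500),
        cl::desc("Threshold for inlining addition operands into a SCEV"));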
supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.
llvm-svn: 293092
bots ever since d0k fixed the CHECK lines so that it did something at
all.
It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.
llvm-svn: 292774
become unavailable.
The AssumptionCache is now immutable but it still needs to respond to
DomTree invalidation if it ended up caching one.
This lets us remove one of the explicit invalidates of LVI but the
other one continues to avoid hitting a latent bug.
llvm-svn: 292769
llvm-svn: 292762
The colon is important.
llvm-svn: 292761
Newer PPC cores support unaligned memory access, which significantly reduces its cost. This patch handles that case in PPCTTIImpl::getMemoryOpCost.
This patch fixes pr31492.
Differential Revision: https://reviews.llvm.org/D28630
llvm-svn: 292680
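The shape of the change, in a hedged sketch (simplified; not the actual PPCTTIImpl code, and the penalty factor is invented for illustration): once the subtarget is known to handle unaligned accesses natively, the hook stops charging the misalignment penalty.

    // Illustrative-only cost logic; 'UnalignedPenalty' is an assumption.
    int getMemoryOpCostSketch(int BaseCost, bool IsMisaligned,
                              bool SubtargetHasFastUnaligned) {
      const int UnalignedPenalty = 4; // hypothetical scaling factor
      if (IsMisaligned && !SubtargetHasFastUnaligned)
        return BaseCost * UnalignedPenalty;
      return BaseCost; // newer PPC: unaligned access costs the same
    }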
To avoid regressions, make ScalarEvolution::createSCEV a bit more
clever.
Also get rid of some useless code in ScalarEvolution::howFarToZero
which was hiding this bug.
No new testcase because it's impossible to actually expose this bug:
we don't have any in-tree users of getUDivExactExpr besides the two
functions I just mentioned, and they both dodged the problem. I'll
try to add some interesting users in a followup.
Differential Revision: https://reviews.llvm.org/D28587
llvm-svn: 292449
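For context, the "exact" in getUDivExactExpr licenses algebraic cancellation that plain unsigned division would not, but only in the absence of wrap, which is exactly the kind of subtlety such folds must guard against. A small illustration (mine, not from the patch):

    #include <cstdint>

    uint64_t cancelFactor(uint64_t A) {
      uint64_t X = 8 * A;  // dividend known to be a multiple of 8
      // (8*A) /u 8 folds to A only while 8*A does not wrap; if it wraps,
      // the quotient of the wrapped value no longer equals A.
      return X / 8;
    }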
running LCSSA over them prior to running the loop pipeline.
This also teaches the loop PM to verify that LCSSA form is preserved
throughout the pipeline's run across the loop nest.
Most of the test updates just leverage this new functionality. One has to be
relaxed with the new PM as IVUsers is less powerful when it sees LCSSA input.
Differential Revision: https://reviews.llvm.org/D28743
llvm-svn: 292241
We already have patterns in place to support 128/256-bit shifts without AVX512VL
llvm-svn: 292077
costs
Keep the tests though.
llvm-svn: 292076
non-constant uniform values.
Use a shuffle(scalar_to_vector, zeroinitializer) pattern instead of shuffle(vec, zeroinitializer).
llvm-svn: 292075
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.
No functionality changed, but it makes subsequent changes cleaner.
llvm-svn: 292060
mark it as never invalidated in the new PM.
The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.
This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.
llvm-svn: 292039
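In the new pass manager, "never invalidated" is expressed by the result's invalidate hook returning false unconditionally. A sketch of that convention (illustrative, not AssumptionCache's actual code):

    #include "llvm/IR/Function.h"
    #include "llvm/IR/PassManager.h"

    struct GracefullyDegradingResult {
      // Returning false tells the analysis manager this result survives
      // every transformation; stale entries degrade gracefully instead.
      bool invalidate(llvm::Function &, const llvm::PreservedAnalyses &,
                      llvm::FunctionAnalysisManager::Invalidator &) {
        return false;
      }
    };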
has landed
llvm-svn: 292023
Refines max backedge-taken count if a loop like
"for (int i = 0; i != n; ++i) { /* body */ }" is rotated.
Differential Revision: https://reviews.llvm.org/D28536
llvm-svn: 291704
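A source-level view of what rotation does to that loop (illustrative): the bottom-tested form is guarded by n != 0, and that guard is what lets SCEV refine the maximum backedge-taken count.

    void before(int N) {
      for (int I = 0; I != N; ++I) { /* body */ }
    }

    void after(int N) { // roughly what loop rotation produces
      if (N != 0) {
        int I = 0;
        do {
          /* body */
          ++I;
        } while (I != N); // exit test now provably reached N times
      }
    }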
This is both easier to understand, and produces a tighter bound in certain
cases.
Differential Revision: https://reviews.llvm.org/D28393
llvm-svn: 291701
Differential Revision: https://reviews.llvm.org/D28447
llvm-svn: 291665
llvm-svn: 291663
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
A special optimization case replaces pmulld with a pmullw/pmulhw/pshuf sequence
when the real operand bitwidth is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
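A scalar view of the arithmetic behind that pmullw/pmulhw/pshuf sequence (illustrative): when both operands genuinely fit in 16 bits, each 32-bit product can be reassembled from its low and high 16-bit halves.

    #include <cstdint>

    uint32_t mul32From16(uint16_t A, uint16_t B) {
      uint32_t Product = uint32_t(A) * B;
      uint32_t Lo = Product & 0xFFFF; // per-lane result of pmullw
      uint32_t Hi = Product >> 16;    // per-lane result of pmulhw/pmulhuw
      return (Hi << 16) | Lo;         // the pshuf step interleaves Hi and Lo
    }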
The original code considered only v2i64 as slow for this feature. This patch
considers all 128-bit vector types as slow candidates.
In internal tests, extending this feature to all 128-bit vector types
resulted in an overall improvement of 1% on Exynos M1.
Differential revision: https://reviews.llvm.org/D27998
llvm-svn: 291616
llvm-svn: 291585
llvm-svn: 291468
invalid.
This fixes use-after-free bugs that will arise with any interesting use
of SCEV.
I've added a dedicated test that works diligently to trigger these kinds
of bugs in the new pass manager and also checks for them explicitly as
well as triggering ASan failures when things go squirrelly.
llvm-svn: 291426
The 'fast' costs should only apply to shifts by uniform constants (uniform non-constant shifts are lowered using the slow default implementation).
Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.
Added missing AVX2/AVX512BW costs as well.
llvm-svn: 291391
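Why the mask is needed, in a scalar sketch (illustrative): x86 has no byte-granularity shift, so a v16i8 logical shift is done with a 16-bit psrlw plus a pand that clears the bits dragged across each byte boundary, hence the doubled cost.

    #include <cstdint>

    // Logical right shift of two packed bytes using one 16-bit shift.
    uint16_t lshrPackedBytes(uint16_t TwoBytes, unsigned C) { // C < 8
      uint16_t Shifted = TwoBytes >> C;            // psrlw analogue
      uint16_t ByteMask = uint16_t(0xFF >> C);     // valid bits per byte
      uint16_t Mask = uint16_t((ByteMask << 8) | ByteMask);
      return Shifted & Mask;                       // pand analogue
    }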
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.
llvm-svn: 291390
SSE41 provides pmulld, which allows a simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.
llvm-svn: 291372
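The arithmetic behind that lowering, in scalar form (illustrative): x << c equals x * 2^c, and 2^c can be materialized by adding c to the exponent field of 1.0f, which is what pslld/paddd build before cvttps2dq and pmulld finish the job.

    #include <cstdint>
    #include <cstring>

    uint32_t shlViaMul(uint32_t X, uint32_t C) { // C < 31 for cvttps2dq
      uint32_t Bits = (C << 23) + 0x3f800000u;   // pslld + paddd on 1.0f
      float F;
      std::memcpy(&F, &Bits, sizeof(F));         // F == 2^C exactly
      uint32_t Pow2 = uint32_t(F);               // cvttps2dq analogue
      return X * Pow2;                           // pmulld: X << C
    }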
llvm-svn: 291366
We were matching against general vector shift costs before the uniform splat costs
llvm-svn: 291365
llvm-svn: 291354
v64i8 shuffles (PR31470)
llvm-svn: 291347
llvm-svn: 291269
Differential Revision: https://reviews.llvm.org/D28403
llvm-svn: 291254
Set the costs on the lowest target that supports the type.
llvm-svn: 291229
Added a test demonstrating a bug in AVX512 division costs.
llvm-svn: 291228
extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1
llvm-svn: 291149
llvm-svn: 291140
Currently only for broadcasts with input and output of the same width.
Differential Revision: https://reviews.llvm.org/D27811
llvm-svn: 291122
llvm-svn: 291117
llvm-svn: 291112
Without this CHECK line, additional regions incorrectly detected at the end
of the region tree could go unnoticed.
llvm-svn: 290994
This test case has been reduced from test/Analysis/RegionInfo/mix_1.ll and
provides a minimal example of a case that caused problems while working on
an improved version of the RegionInfo analysis. We upstream this test case,
as it can certainly be helpful in future debugging and optimization work.
Test case reduced by Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 290974
Actual codegen is much better than the extract+insert patterns that were assumed.
llvm-svn: 290962
(This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately.)
llvm-svn: 290812
The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since 3-source shuffles significantly reduce the real cost there.
In this patch I calculate the cost on AVX-512. It will allow comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426).
* Shuffle-broadcast cost will be changed in Simon's upcoming patch.
Differential Revision: https://reviews.llvm.org/D28118
llvm-svn: 290810
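For reference, the kind of interleaved (stride-2) access this cost models, in an illustrative example; with a realistic shuffle cost the vectorizer can weigh this form against a gather/scatter or scalar version:

    void sumAdjacentPairs(float *Out, const float *In, int N) {
      for (int I = 0; I < N; ++I)
        Out[I] = In[2 * I] + In[2 * I + 1]; // stride-2 interleaved loads
    }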
llvm-svn: 290790