| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm::SmallVectorImpl& output parameter
Summary:
I do believe this is the correct fix.
We call `rangeQuery()` *very* often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help.
Old: (D54389)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% )
...
7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% )
...
7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% )
```
And another -7%. And that isn't even the good bit yet.
Old:
* calls to allocation functions: 2081419
* temporary allocations: 219658 (10.55%)
* bytes allocated in total (ignoring deallocations): 4.31 GB
New:
* calls to allocation functions: 1880295 (-10%)
* temporary allocations: 18758 (1%) (-91% *sic*)
* bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54390
llvm-svn: 347202
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
std::vector<> with std::deque<> in llvm::SetVector<>
Summary:
Old: (D54388)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% )
...
8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% )
...
7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% )
```
Another -~7%.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, RKSimon
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54389
llvm-svn: 347201
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm::SmallVector<size_t, 0> for storage.
Summary:
Old: (D54383)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% )
...
9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% )
...
8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% )
```
So another -6%.
That is because the `SmallVector` **doubles** it size when reallocating, which is great here,
since we can't `reserve()` since we can't know how many `Neighbors` we will have.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54388
llvm-svn: 347200
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
double each time.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% )
...
9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% )
```
New time:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% )
...
9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% )
```
-~0.3%, not that much. But this isn't the important part.
Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB
New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54383
llvm-svn: 347199
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.487s
user 0m9.745s
sys 0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m9.599s
user 0m8.824s
sys 0m0.772s
```
Not that much, around -9%. But that is not the good part yet, again.
Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB
New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54382
llvm-svn: 347198
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm::SetVector<> instead of ILLEGAL std::unordered_set<>
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.469s
user 0m9.797s
sys 0m0.672s
```
So -60%. And that isn't the good bit yet.
Old:
* calls to allocation functions: 106560180 (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB
New:
* calls to allocation functions: 3347676 (-96.86%) (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)
---
Two points i want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There is no tests, so i'm not fully sure this is correct.
Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok?
* I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc,
this `llvm::SetVector<>` seems to be best here.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: kristina, bobsayshilol, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54381
llvm-svn: 347197
|
| |
|
|
|
|
| |
This commit should fix failing bots.
llvm-svn: 347196
|
| |
|
|
| |
llvm-svn: 347195
|
| |
|
|
| |
llvm-svn: 347194
|
| |
|
|
| |
llvm-svn: 347193
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54225
llvm-svn: 347192
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
This leaves sinks as being:
- points where the value in the register is being observed, such as
an icmp, switch or store.
- points where value types have to match, such as calls and returns.
- zext are included to ease the transformation and are generally
removed later on.
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.
Differential Revision: https://reviews.llvm.org/D54515
llvm-svn: 347191
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.
This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.
Differential Revision: https://reviews.llvm.org/D52827
llvm-svn: 347190
|
| |
|
|
|
|
|
|
|
|
|
| |
Don't deduce address spaces for non-pointer-like types
in template args.
Fixes PR38603!
Differential Revision: https://reviews.llvm.org/D54634
llvm-svn: 347189
|
| |
|
|
| |
llvm-svn: 347188
|
| |
|
|
|
|
|
|
|
|
|
| |
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.
Path by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D54623
llvm-svn: 347187
|
| |
|
|
| |
llvm-svn: 347186
|
| |
|
|
|
|
|
|
| |
sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
llvm-svn: 347185
|
| |
|
|
| |
llvm-svn: 347184
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.
Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.
Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna
llvm-svn: 347183
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI(). I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.
Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively. Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB. For example,
Function::getEntryBlock, Loop:getExitBlock, etc.
I also fixed one of the comments.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D54669
llvm-svn: 347182
|
| |
|
|
|
|
|
|
| |
extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
llvm-svn: 347181
|
| |
|
|
|
|
|
|
| |
-x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.
llvm-svn: 347180
|
| |
|
|
|
|
| |
OpenBSD/powerpc only supports Secure PLT.
llvm-svn: 347179
|
| |
|
|
| |
llvm-svn: 347178
|
| |
|
|
|
|
| |
conversions.
llvm-svn: 347177
|
| |
|
|
|
|
|
|
| |
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.
When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.
llvm-svn: 347176
|
| |
|
|
|
|
|
|
| |
vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support.
Some of these sequeces look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend.
llvm-svn: 347175
|
| |
|
|
|
|
|
| |
This breaks many tests on Windows, which now all fail with an error such
as "Unable to read memory at address <xxxxxxxx>".
llvm-svn: 347174
|
| |
|
|
|
|
| |
SSE vector shifts only use the bottom 64-bits of the shift amount vector.
llvm-svn: 347173
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
-x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.
This patch disables combineToExtendVectorInReg when we are using widening.
I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.
I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.
llvm-svn: 347172
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54671
llvm-svn: 347171
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code,
but copying the folds seems to be the standard method to ensure
that we don't miss these folds.
Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no
way to ensure that we do these kinds of simplifications unless the
code is repeated at node creation time and during combines.
There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167
I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.
This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023
llvm-svn: 347170
|
| |
|
|
| |
llvm-svn: 347169
|
| |
|
|
|
|
|
|
| |
Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.
llvm-svn: 347168
|
| |
|
|
| |
llvm-svn: 347167
|
| |
|
|
| |
llvm-svn: 347166
|
| |
|
|
| |
llvm-svn: 347165
|
| |
|
|
| |
llvm-svn: 347164
|
| |
|
|
|
|
|
|
| |
This check removes unneeded scaling of arguments when calling Abseil Time factory functions.
Patch by Hyrum Wright.
llvm-svn: 347163
|
| |
|
|
|
|
| |
Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.
llvm-svn: 347162
|
| |
|
|
| |
llvm-svn: 347161
|
| |
|
|
| |
llvm-svn: 347160
|
| |
|
|
| |
llvm-svn: 347159
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode (PR39549)
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.
llvm-svn: 347158
|
| |
|
|
|
|
|
|
|
| |
CheckerOptInfo feels very much out of place in CheckerRegistration.cpp, so I
moved it to CheckerRegistry.h.
Differential Revision: https://reviews.llvm.org/D54397
llvm-svn: 347157
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
NetBSD ships with native curses(3) and -ltinfo is a part of ncurses.
Set -lterminfo before -ltinfo, as it allows to prioritize native curses
libraries. Mixing curses and ncurses does not work well, especially
in software built on top of llvm.
Original patch by Ryo Onodera (NetBSD) in pkgsrc.
Reviewers: labath, dim, mgorny
Reviewed By: dim, mgorny
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54650
llvm-svn: 347156
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Now `llc -filetype=null` works.
Reviewers: eush
Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54660
llvm-svn: 347155
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This was missing in D54096. Independent tests for this is not available
here, because these are used in lld.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54662
llvm-svn: 347154
|
| |
|
|
|
|
|
|
|
|
| |
Especially with pointees, a lot of meaningless reports came from uninitialized
regions that were already reported. This is fixed by storing all reported fields
to the GDM.
Differential Revision: https://reviews.llvm.org/D51531
llvm-svn: 347153
|