| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
| |
It is cleaner to have a callback-based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.
This is one step in a multi-step cleanup.
llvm-svn: 300330
|
| |
|
|
| |
llvm-svn: 300329
|
| |
|
|
|
|
|
|
|
|
| |
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
Clang companion patch: D31766.
Differential Revision: https://reviews.llvm.org/D31767
llvm-svn: 300325
|
| |
|
|
| |
llvm-svn: 300321
|
| |
|
|
|
|
|
|
|
|
| |
Fixed bug 32551: https://bugs.llvm.org//show_bug.cgi?id=32551
Reviewers: vpykhtin
Differential Revision: https://reviews.llvm.org/D31809
llvm-svn: 300319
|
| |
|
|
|
|
|
|
|
|
| |
Fixed bug 32619: https://bugs.llvm.org//show_bug.cgi?id=32619
Reviewers: artem.tamazov, vpykhtin
Differential Revision: https://reviews.llvm.org/D31973
llvm-svn: 300318
|
| |
|
|
|
|
| |
Patch by Dinar Temirbulatov
llvm-svn: 300314
|
| |
|
|
|
|
|
|
| |
latencies/throughputs.
The details are here: https://reviews.llvm.org/D30941
llvm-svn: 300311
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is part of D28975's breakdown - no change in output intended.
LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.
Differential Revision: https://reviews.llvm.org/D32040
llvm-svn: 300310
|
| |
|
|
|
|
|
|
| |
a temporary APInt to count leading zeros on.
The APInt was created from an 'unsigned' and we just wanted to know how many bits were needed to represent the value. We can just use Log2_32 from MathExtras.h to get the info.
llvm-svn: 300309
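To illustrate the idea in the message above, here is a minimal standalone sketch of counting the bits needed for an `unsigned` value with a floor-log2 helper instead of a temporary wide integer. `log2Floor32` and `bitsNeeded` are hypothetical names used only for this example; `log2Floor32` stands in for `Log2_32`, and the `+ 1` is an assumption about how the result is used, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a floor(log2(value)) helper such as Log2_32.
// This sketch requires a nonzero argument.
inline unsigned log2Floor32(uint32_t value) {
  assert(value != 0 && "log2 of zero is not handled in this sketch");
  unsigned result = 0;
  while (value >>= 1) // shift until only the top set bit has been consumed
    ++result;
  return result;
}

// Number of bits needed to represent a nonzero value:
// the index of the highest set bit, plus one.
inline unsigned bitsNeeded(uint32_t value) { return log2Floor32(value) + 1; }
```

No allocation or wide-integer arithmetic is involved, which is the point of the cleanup described above.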
|
| |
|
|
| |
llvm-svn: 300308
|
| |
|
|
| |
llvm-svn: 300307
|
| |
|
|
| |
llvm-svn: 300305
|
| |
|
|
| |
llvm-svn: 300302
|
| |
|
|
|
|
|
|
|
|
| |
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.
Differential Revision: https://reviews.llvm.org/D32038
llvm-svn: 300300
|
| |
|
|
|
|
|
|
| |
have >1 value, unless we can prove the phi node is cycle free.
Fixes PR 32607.
llvm-svn: 300299
|
| |
|
|
| |
llvm-svn: 300292
|
| |
|
|
| |
llvm-svn: 300291
|
| |
|
|
| |
llvm-svn: 300289
|
| |
|
|
|
|
|
|
| |
Addressed rest of post submit comments from D31993.
Differential Revision: https://reviews.llvm.org/D32057
llvm-svn: 300288
|
| |
|
|
|
|
|
|
|
|
|
| |
Now that we have a type that can represent the attributes on a single
return, function, or parameter, we can pass it around directly rather
than passing around AttributeList and Idx. Removes some more one-based
argument attribute index counting.
NFC
llvm-svn: 300285
|
| |
|
|
|
|
|
|
| |
PR/32584
Differential Revision: https://reviews.llvm.org/D32023
llvm-svn: 300277
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This further improves Ahmed's change in rL299482. See the new comment for the
rationale.
The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.
Differential Revision: https://reviews.llvm.org/D32028
llvm-svn: 300276
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D31819
llvm-svn: 300275
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
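The encapsulation argument above can be sketched with a toy container. `ToyAttributeList` and all of its members are hypothetical and only mimic the shape of the idea: slot 0 holds return attributes, slot `ArgNo + 1` holds the attributes of parameter `ArgNo`, and callers never see the `+ 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model (not the LLVM class): attribute kinds are plain strings.
class ToyAttributeList {
public:
  explicit ToyAttributeList(std::size_t numParams) : slots(numParams + 1) {}

  void addRetAttribute(const std::string &kind) { slots[0].insert(kind); }
  void addParamAttribute(unsigned argNo, const std::string &kind) {
    slots[argNo + 1].insert(kind);
  }

  // Callers use zero-based argument numbers; the offset stays private.
  bool hasParamAttribute(unsigned argNo, const std::string &kind) const {
    return hasAttributeAtIndex(argNo + 1, kind);
  }
  bool hasRetAttribute(const std::string &kind) const {
    return hasAttributeAtIndex(0, kind);
  }

private:
  bool hasAttributeAtIndex(unsigned index, const std::string &kind) const {
    return slots[index].count(kind) != 0;
  }

  // slots[0] = return value, slots[i + 1] = parameter i.
  std::vector<std::set<std::string>> slots;
};
```

With the index arithmetic confined to one private helper, a later change to the storage layout would not touch any call site.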
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit, but for the purposes of this check LLVM considers them to be
32-bit.
This causes the following program:
    int bpf_prog1(void *ign)
    {
        volatile unsigned long t = 0x8983984739ull;
        return *(unsigned long *)((0xffffffff8fff0002ull) + t);
    }
To generate the following (wrong) code:
    0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00  r1 = 590618314553ll
    2: 7b 1a f8 ff 00 00 00 00  *(u64 *)(r10 - 8) = r1
    3: 79 a1 f8 ff 00 00 00 00  r1 = *(u64 *)(r10 - 8)
    4: 79 10 02 00 00 00 00 00  r0 = *(u64 *)(r1 + 2)
    5: 95 00 00 00 00 00 00 00  exit
Fix it by changing the offset check to 16-bit.
Patch by Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Differential Revision: https://reviews.llvm.org/D32055
llvm-svn: 300269
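The range check at the heart of this fix can be sketched in isolation. `fitsInSignedBits`, `exampleOffset`, and `truncatedOffset` are hypothetical helpers written for this illustration, not the backend's code. The constant from the program above passes a 32-bit signed check but fails a 16-bit one, and its low 16 bits are 2, which matches the `r1 + 2` in the wrong output.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` is representable as an N-bit two's-complement integer.
template <unsigned N> constexpr bool fitsInSignedBits(int64_t value) {
  static_assert(N > 0 && N < 64, "sketch only handles widths 1..63");
  return value >= -(INT64_C(1) << (N - 1)) &&
         value <= (INT64_C(1) << (N - 1)) - 1;
}

// The constant address from the example program, viewed as a signed offset.
inline int64_t exampleOffset() {
  return static_cast<int64_t>(0xffffffff8fff0002ull);
}

// What remains if that offset is squeezed into a 16-bit field.
inline int16_t truncatedOffset() {
  return static_cast<int16_t>(exampleOffset() & 0xffff);
}
```

Under a 16-bit check the constant is rejected as an immediate offset, so the addition has to be materialized before the access instead of being folded into the load.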
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ErrorOr should not be dereferenced on the error path.
Patch by Jacob Young
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32032
llvm-svn: 300267
|
| |
|
|
|
|
| |
getLowBitsSet. NFC
llvm-svn: 300265
|
| |
|
|
| |
llvm-svn: 300258
|
| |
|
|
|
|
|
|
| |
of Select.
We call it unconditionally on the operands of the select. Then we decide if it's a min/max and call it on the min/max operands or on the select operands again. Either of those second calls will overwrite the results of the initial call, so we can just delete the first call.
llvm-svn: 300256
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.
The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and asking the dominator
tree about their dominance relation. This is slow.
A more efficient way, implemented here, is to start from the exit
blocks and walk the dominator tree upwards until we hit the header. By
transitivity, all the blocks we encounter on our path dominate an exit.
For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.
Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.
Differential Revision: https://reviews.llvm.org/D31843
llvm-svn: 300255
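The walk described above can be sketched on a toy CFG. Everything here is an assumption made for illustration: blocks are integers, `idom[b]` is the immediate dominator of block `b` (`-1` for the entry), and `blocksDominatingExits` and `toyLoopExample` are hypothetical names rather than the pass's functions.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Collect the loop blocks that dominate at least one exit block by walking
// immediate dominators upwards from each exit until the header is reached.
std::set<int> blocksDominatingExits(const std::vector<int> &idom,
                                    const std::set<int> &loopBlocks,
                                    int header,
                                    const std::vector<int> &exitBlocks) {
  std::set<int> result;
  for (int exitBB : exitBlocks) {
    int bb = idom[exitBB];
    while (bb >= 0 && loopBlocks.count(bb) != 0) {
      if (!result.insert(bb).second)
        break; // already visited, so the rest of the chain is recorded too
      if (bb == header)
        break; // the header dominates every block of the loop; stop here
      bb = idom[bb];
    }
  }
  return result;
}

// Toy loop: 0 = preheader, 1 = header, 2 and 3 = the two arms of an if/else,
// 4 = latch (which also branches to the exit), 5 = exit block.
std::set<int> toyLoopExample() {
  std::vector<int> idom = {-1, 0, 1, 1, 1, 4};
  std::set<int> loopBlocks = {1, 2, 3, 4};
  return blocksDominatingExits(idom, loopBlocks, /*header=*/1,
                               /*exitBlocks=*/{5});
}
```

In the toy loop only the latch and the header are collected; the two if/else arms never dominate the exit, so they are skipped without any pairwise dominance query.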
|
| |
|
|
| |
llvm-svn: 300254
|
| |
|
|
| |
llvm-svn: 300253
|
| |
|
|
|
|
|
|
|
|
|
| |
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
llvm-svn: 300252
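For readers unfamiliar with Stein's (binary) GCD algorithm, here is a minimal sketch on plain 64-bit integers. It is not the APInt implementation; `steinGcd` is a name chosen for this example. It shows why the approach avoids division: only shifts, comparisons, and subtractions are needed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Binary GCD (Stein's algorithm) on 64-bit unsigned integers.
uint64_t steinGcd(uint64_t a, uint64_t b) {
  if (a == 0)
    return b;
  if (b == 0)
    return a;

  // Factor out the powers of two common to both operands.
  unsigned shift = 0;
  while (((a | b) & 1) == 0) {
    a >>= 1;
    b >>= 1;
    ++shift;
  }

  // Make `a` odd; any remaining factors of two in it are not shared.
  while ((a & 1) == 0)
    a >>= 1;

  while (b != 0) {
    // Strip factors of two from `b`; `a` is odd, so they cannot be common.
    while ((b & 1) == 0)
      b >>= 1;
    // Keep a <= b, then replace b with the (even) difference.
    if (a > b)
      std::swap(a, b);
    b -= a;
  }

  // Restore the common powers of two.
  return a << shift;
}
```

For example, `steinGcd(48, 18)` first strips one shared factor of two, reduces the odd parts to 3, and returns 6.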
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
llvm-svn: 300251
|
| |
|
|
|
|
|
| |
We could otherwise add BBs not belonging to a loop in `formLCSSA`
and later crash when trying to iterate the loop blocks.
llvm-svn: 300244
|
| |
|
|
| |
llvm-svn: 300243
|
| |
|
|
| |
llvm-svn: 300242
|
| |
|
|
| |
llvm-svn: 300241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls that then get inlined. All these promoted direct calls share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and their inlined instances. This patch adds callee_name to the key of the callsite sample map. It also adds helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote the correct targets and inline them before annotation, and ensures all indirect call targets are annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
llvm-svn: 300240
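The keying change described in the summary can be sketched with standard containers. All types and functions below (`ToyLineLocation`, `ToyCallsiteSampleMap`, `addInlinedSamples`, `findCalleeSamplesAt`) are hypothetical and only model the idea of a two-level key, location first and callee name second; the sample payload is reduced to a single counter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// A callsite location: line offset plus discriminator.
struct ToyLineLocation {
  uint32_t lineOffset;
  uint32_t discriminator;
  bool operator<(const ToyLineLocation &other) const {
    return std::tie(lineOffset, discriminator) <
           std::tie(other.lineOffset, other.discriminator);
  }
};

// Callee name -> sample count for one inlined callee (simplified payload).
using ToyCalleeSamples = std::map<std::string, uint64_t>;
// Location -> all inlined callees recorded at that location.
using ToyCallsiteSampleMap = std::map<ToyLineLocation, ToyCalleeSamples>;

// Record samples for one promoted-and-inlined callee at a callsite.
void addInlinedSamples(ToyCallsiteSampleMap &map, ToyLineLocation loc,
                       const std::string &callee, uint64_t samples) {
  map[loc][callee] += samples;
}

// Return every inlined callee recorded at `loc`, or nullptr if there is none.
const ToyCalleeSamples *findCalleeSamplesAt(const ToyCallsiteSampleMap &map,
                                            ToyLineLocation loc) {
  auto it = map.find(loc);
  return it == map.end() ? nullptr : &it->second;
}
```

Two targets promoted from the same indirect call now occupy separate entries under one location, so their samples are no longer merged.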
|
| |
|
|
|
|
| |
state of the bit we would calculate. Also reuse a temporary APInt instead of creating a new one.
llvm-svn: 300239
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In first-order recurrences where phis are used outside the loop,
we should generate an additional vector.extract of the second-to-last element
from the vectorized phi update.
This is because we require the phi itself (which is the value at the
second-to-last iteration of the vector loop) and not the phi's update within
the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
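A small simulation may help show why the second-to-last lane is the right one. The sketch below assumes a recurrence of the form `phi = a[i]`, a trip count that is a multiple of the vector factor, and a vector factor of at least 2; both function names are hypothetical and no vectorizer code is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: the value the recurrence phi holds during the final
// iteration, which is what a user outside the loop observes.
int scalarPhiInLastIteration(const std::vector<int> &a, int init) {
  int phi = init;
  int phiInLastIteration = init;
  for (std::size_t i = 0; i < a.size(); ++i) {
    phiInLastIteration = phi; // value of the phi in iteration i
    phi = a[i];               // the phi's update for the next iteration
  }
  return phiInLastIteration;
}

// Vector view: the final vector of updates holds the last `vf` values of `a`.
// Lane vf-1 is the update itself; lane vf-2 is the phi of the last iteration.
int vectorPhiInLastIteration(const std::vector<int> &a, std::size_t vf) {
  assert(vf >= 2 && !a.empty() && a.size() % vf == 0);
  std::vector<int> lastUpdate(a.end() - static_cast<std::ptrdiff_t>(vf),
                              a.end());
  return lastUpdate[vf - 2];
}
```

With `a = {10, 20, ..., 80}` and a vector factor of 4, both functions return 70, whereas extracting the last lane would give 80, the update rather than the phi.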
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
|
| |
|
|
| |
llvm-svn: 300235
|
| |
|
|
| |
llvm-svn: 300233
|
| |
|
|
| |
llvm-svn: 300230
|
| |
|
|
|
|
|
|
|
|
| |
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
Add a test case for this.
llvm-svn: 300229
|
| |
|
|
|
|
|
|
|
| |
In many cases ds operations can be combined even if the offsets do not
fit into the 8-bit encoding. All it takes is adjusting the base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.
As a trade-off, this was a pretty terrible way to make a '2'. :)
    if (LHSC == SubOne(RHSC))
      AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);
The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708
llvm-svn: 300222
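The arithmetic behind the joke about making a '2' can be checked exhaustively on 8-bit values. This sketch assumes the fold in question turns `X == C-1 || X == C` into an offset-and-compare, which is an inference from the snippet and the linked bug rather than something the message spells out; `orOfEqualitiesMatchesRangeCheck` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// For every 8-bit C and X, check that:
//  * (C + 1) - (C - 1) is always 2 under wrapping arithmetic, and
//  * (X == C-1 || X == C) equals both (X - (C-1)) u< 2 and (X - (C-1)) u<= 1.
bool orOfEqualitiesMatchesRangeCheck() {
  for (unsigned c = 0; c < 256; ++c) {
    for (unsigned x = 0; x < 256; ++x) {
      uint8_t rhsC = static_cast<uint8_t>(c);
      uint8_t lhsC = static_cast<uint8_t>(c - 1); // LHSC == SubOne(RHSC)
      uint8_t value = static_cast<uint8_t>(x);

      // AddOne(RHSC) - LHSC, truncated to 8 bits.
      uint8_t addC = static_cast<uint8_t>((rhsC + 1) - lhsC);
      if (addC != 2)
        return false;

      bool original = (value == lhsC) || (value == rhsC);
      uint8_t shifted = static_cast<uint8_t>(value - lhsC);
      bool viaUlt = shifted < 2;  // the canonical 'ult' form
      bool viaUle = shifted <= 1; // the equivalent 'ule' form
      if (original != viaUlt || viaUlt != viaUle)
        return false;
    }
  }
  return true;
}
```

The check passes for all 65,536 combinations, including the wrap-around case where `C` is 0 and `C-1` is 255.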
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
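The demanded-bits reasoning for a bit reversal can be sketched on 32-bit masks: if only some output bits of the reversal are demanded, the demanded input bits are the mirror image of that mask. `reverseBits32` and `demandedInputBitsForBitreverse` are hypothetical helpers written for this illustration, not the analysis pass's code.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit value (bit i moves to bit 31 - i).
uint32_t reverseBits32(uint32_t value) {
  uint32_t result = 0;
  for (int i = 0; i < 32; ++i) {
    result = (result << 1) | (value & 1u);
    value >>= 1;
  }
  return result;
}

// Given which output bits of a bit reversal are demanded, return which
// input bits can affect them: the same mask, mirrored.
uint32_t demandedInputBitsForBitreverse(uint32_t demandedOutputBits) {
  return reverseBits32(demandedOutputBits);
}
```

This mirrors the BDCE test case described above: after reversing and shifting right by 24, only the top byte of the reversed value survives, so only the low byte of the input is demanded and any manipulation of the input's high-order bits is dead.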
|