| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
We were computing the loop exit value, but not ensuring the addrec belonged to the loop whose exit value we were computing. I couldn't actually trip this; the test case shows the basic setup which *might* trip this, but none of the variations I've tried actually do.
llvm-svn: 369730
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The alignment is calculated incorrectly, thus sometimes it doesn't generate aligned mov instructions, as shown by the example below:
```
// b.cc
typedef long long index;
extern "C" index g_tid;
extern "C" index g_num;
void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) {
index n = 64*1024;
index m = 16*1024;
index k = 4*1024;
index tid = g_tid;
index num = g_num;
__builtin_assume_aligned(a, 32);
__builtin_assume_aligned(b, 32);
__builtin_assume_aligned(c, 32);
for (index i0=tid*k; i0<m; i0+=num*k)
for (index i1=0; i1<n*m; i1+=m)
for (index i2=0; i2<k; i2++)
c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2];
}
```
Compile with `clang b.cc -Ofast -march=skylake -mavx2 -S`
```
vmovaps -224(%rdi,%rbx,4), %ymm0
vmovups -192(%rdi,%rbx,4), %ymm1 # should be movaps
vmovups -160(%rdi,%rbx,4), %ymm2 # should be movaps
vmovups -128(%rdi,%rbx,4), %ymm3 # should be movaps
vaddps -224(%rsi,%rbx,4), %ymm0, %ymm0
vaddps -192(%rsi,%rbx,4), %ymm1, %ymm1
vaddps -160(%rsi,%rbx,4), %ymm2, %ymm2
vaddps -128(%rsi,%rbx,4), %ymm3, %ymm3
vmovaps %ymm0, -224(%rdx,%rbx,4)
vmovups %ymm1, -192(%rdx,%rbx,4) # should be movaps
vmovups %ymm2, -160(%rdx,%rbx,4) # should be movaps
vmovups %ymm3, -128(%rdx,%rbx,4) # should be movaps
```
Differential Revision: https://reviews.llvm.org/D66575
Patch by Dun Liang
llvm-svn: 369723
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One problem with untagging memory in landing pads is that it only works
correctly if the function that catches the exception is instrumented.
If the function is uninstrumented, we have no opportunity to untag the
memory.
To address this, replace landing pad instrumentation with personality function
wrapping. Each function with an instrumented stack has its personality function
replaced with a wrapper provided by the runtime. Functions that did not have
a personality function to begin with also get wrappers if they may be unwound
past. As the unwinder calls personality functions during stack unwinding,
the original personality function is called and the function's stack frame is
untagged by the wrapper if the personality function instructs the unwinder
to keep unwinding. If unwinding stops at a landing pad, the function is
still responsible for untagging its stack frame if it resumes unwinding.
The old landing pad mechanism is preserved for compatibility with old runtimes.
Differential Revision: https://reviews.llvm.org/D66377
llvm-svn: 369721
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.
Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.
Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.
As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.
Differential Revision: https://reviews.llvm.org/D66606
llvm-svn: 369697
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In D65402, I want to get DerefState from AADereferenceable but it was not allowed. This patch moves DerefState definition into Attributor.h and makes AADerefenceable inherit StateWrapper.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66585
llvm-svn: 369653
|
| |
|
|
|
|
|
| |
We must update loop metedata before we moved to parent loop if
it is present.
llvm-svn: 369637
|
| |
|
|
|
|
|
| |
Function can have users that are not instructions, e.g., bitcasts. For
now, we simply give up when we see them.
llvm-svn: 369588
|
| |
|
|
| |
llvm-svn: 369577
|
| |
|
|
| |
llvm-svn: 369576
|
| |
|
|
|
|
|
| |
So far we split the unreachable off and placed a new one, this is not
necessary.
llvm-svn: 369575
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
unavailable pred.
Currently we do not properly translate addresses with PHIs if LoadBB !=
LI->getParent(), because PHITranslateAddr expects a direct predecessor as argument,
because it considers all instructions outside of the current block to
not requiring translation.
The amount of cases that trigger this should be very low, as most single
predecessor blocks should be folded into their predecessor by GVN before
we actually start with value numbering. It is still not guaranteed to
happen, so we should do PHI translation along all edges between the
loads' block and the predecessor where we have to place a load.
There are a few test cases showing current limits of the PHI translation, which
could be improved later.
Reviewers: spatel, reames, efriedma, john.brawn
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65020
llvm-svn: 369570
|
| |
|
|
|
|
| |
Noticed while looking at pr43028.
llvm-svn: 369541
|
| |
|
|
|
|
|
|
|
|
|
| |
An intermediate extend is used to widen the narrow operand to the width of
the other (wider) operand. At that point, we have the same logic as the
existing transform that was restricted to folds of equal width zext/sext.
This mostly solves PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700
llvm-svn: 369519
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For an internal function, if all its call sites are dead, the body of the function is considered dead.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D66155
llvm-svn: 369470
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D66503
llvm-svn: 369468
|
| |
|
|
| |
llvm-svn: 369444
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.
Roundtrip test for text profile and binary profile is added.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, mgrang, llvm-commits, twoh
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66191
llvm-svn: 369440
|
| |
|
|
| |
llvm-svn: 369421
|
| |
|
|
|
|
|
| |
We were creating 2 instructions and relying on a subsequent fold
to invert a not(icmp). Create the final icmp directly instead.
llvm-svn: 369411
|
| |
|
|
|
|
|
|
| |
1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.
llvm-svn: 369390
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When the line format is wrong, we may end up accessing out of bound
memory. eg: the test with invalide line will cause assert.
Assertion `idx < size()' failed
The fix is to report fatal when we found mismatched line format.
Reviewers: qcolombet, volkan
Reviewed By: qcolombet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66444
llvm-svn: 369389
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607
As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.
There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.
https://rise4fun.com/Alive/3KVc
Name: smax(smax, smin)
%c1 = icmp slt i32 %x, %y
%c2 = icmp slt i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp sgt i32 %max, %min
%r = select i1 %c3, i32 %max, i32 %min
=>
%r = %max
Name: smin(smax, smin)
%c1 = icmp slt i32 %x, %y
%c2 = icmp slt i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp sgt i32 %max, %min
%r = select i1 %c3, i32 %min, i32 %max
=>
%r = %min
Name: umax(umax, umin)
%c1 = icmp ult i32 %x, %y
%c2 = icmp ult i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp ult i32 %min, %max
%r = select i1 %c3, i32 %max, i32 %min
=>
%r = %max
Name: umin(umax, umin)
%c1 = icmp ult i32 %x, %y
%c2 = icmp ult i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp ult i32 %min, %max
%r = select i1 %c3, i32 %min, i32 %max
=>
%r = %min
llvm-svn: 369386
|
| |
|
|
|
|
| |
after r369331
llvm-svn: 369334
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we create the set of abstract attributes initially and then
dealt with the fact hat a lookup could fail, e.g., return a nullptr.
This patch will ensure we always return a valid object from a lookup,
allowing us not only to remove the nullptr checks but also to grow the
set of abstract attributes "in-flight" on-demand.
One can now start from those that have the best chance of improving
performance without the need to specify all they might depend on.
While this introduces some boilerplate, the usage of attributes is much
easier and cleaner now.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66276
llvm-svn: 369331
|
| |
|
|
| |
llvm-svn: 369330
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is analogous to D66128 but for AADereferenceable. We have the logic
concentrated in the floating value updateImpl and we use the combiner
helper classes for arguments and return values.
The regressions will go away with "on-demand" attribute creation.
Improvements are already visible in the existing tests.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66272
llvm-svn: 369329
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
What D66126 did for AAAlign, this patch does for AANonNull. Agian, the
logic becomes more concise and localized. Again, returned poiners are
not annotated properly but that will not be an issue if this lands with
the "on-demand" generation of attributes. First improvements due to the
genericValueTraversal are already visible.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66128
llvm-svn: 369328
|
| |
|
|
|
|
|
|
| |
The clamp operator should not take the known of the given state as the
known is potentially based on assumed information. This also adds TODOs
to guide improvements.
llvm-svn: 369327
|
| |
|
|
|
|
|
|
| |
We can avoid repetitive calls getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry.
Differential Revision: https://reviews.llvm.org/D64700
llvm-svn: 369315
|
| |
|
|
|
|
|
|
|
| |
This reverts commit b1752f670f3d6393306dd5d37546b6e23384d8a2.
Fixed the issue with a different commit, reapply this one as it was,
afaik, not broken.
llvm-svn: 369303
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Simplify the API using Optional<> and address comments in
https://reviews.llvm.org/D66165
Reviewers: vitalybuka
Subscribers: hiraditya, llvm-commits, ostannard, pcc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66317
llvm-svn: 369300
|
| |
|
|
|
|
|
|
|
| |
This reverts commit cedd0d9a6e4b433e1cd6585d1d4d152eb5e60b11.
Re-apply the original commit but make sure the variables are initialized
(even if they are not used) so UBSan is not complaining.
llvm-svn: 369294
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When inserting uses from outside the MemorySSA creation, we don't
normally need to rename uses, based on the assumption that there will be
no inserted Phis (if Def existed that required a Phi, that Phi already
exists). However, when dealing with unreachable blocks, MemorySSA will
optimize away Phis whose incoming blocks are unreachable, and these Phis end
up being re-added when inserting a Use.
There are two potential solutions here:
1. Analyze the inserted Phis and clean them up if they are unneeded
(current method for cleaning up trivial phis does not cover this)
2. Leave the Phi in place and rename uses, the same way as whe inserting
defs.
This patch use approach 2.
Resolves first test in PR42940.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66033
llvm-svn: 369291
|
| |
|
|
| |
llvm-svn: 369250
|
| |
|
|
|
|
|
|
| |
This reverts r369159 (git commit cbaf1fdea2de891bdbc49cdec89ae2077e6b9ed0)
r369160 caused a test to fail under UBSAN. See thread on llvm-commits.
llvm-svn: 369241
|
| |
|
|
|
|
|
|
| |
This reverts r369160 (git commit f72d9b1c97b41fff48ad1eecbba59a29c171bff4)
r369160 caused some tests to fail under UBSAN. See thread on llvm-commits.
llvm-svn: 369236
|
| |
|
|
|
|
| |
foldShiftIntoShiftInAnotherHandOfAndInICmp() from D66383
llvm-svn: 369207
|
| |
|
|
|
|
|
|
| |
This patch applies only to the new pass manager.
Currently, when MSSA Analysis is available, and pass to each loop pass, it will be preserved by that loop pass.
Hence, mark the analysis preserved based on that condition, vs the current `EnableMSSALoopDependency`. This leaves the global flag to affect only the entry point in the loop pass manager (in FunctionToLoopPassAdaptor).
llvm-svn: 369181
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 5dbb90bfe14ace30224239cac7c61a1422fa5144.
As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.
We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).
llvm-svn: 369174
|
| |
|
|
|
|
|
|
| |
This relands r369147 with fixes to unit tests.
https://reviews.llvm.org/D65019
llvm-svn: 369173
|
| |
|
|
|
|
|
|
|
|
| |
By partially resolving returned calls we did not record that they were
not fully resolved which caused odd behavior down the line. We could
also end up with some, but not all, returned values of the callee in the
returned values map of the caller, another odd behavior we want to
avoid.
llvm-svn: 369160
|
| |
|
|
|
|
|
| |
The flag was updated *before* we actually run the visitor callback so we
might miss updates.
llvm-svn: 369159
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a preparation to "on-demand" abstract attribute generation we need
implementations for all attributes (as they can be queried and then
created on-demand where we now fail to find one).
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66129
llvm-svn: 369155
|
| |
|
|
|
|
| |
This reverts commit f4cf3b959333f62b7a7b2d7771f7010c9d8da388.
llvm-svn: 369149
|
| |
|
|
|
|
|
|
|
| |
Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of
the stack on ARM32.
Differential Revision: https://reviews.llvm.org/D65019
llvm-svn: 369147
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the first commit aiming to structure the attribute deduction.
The base idea is that we have default propagation patterns as listed
below on top of which we can add specific, e.g., context sensitive,
logic.
Deduction patterns used in this patch:
- argument states are determined from call site argument states,
see AAAlignArgument and AAArgumentFromCallSiteArguments.
- call site argument states are determined as if they were floating
values, see AAAlignCallSiteArgument and AAAlignFloating.
- floating value states are determined by traversing the def-use chain
and combining the states determined for the leaves, see
AAAlignFloating and genericValueTraversal.
- call site return states are determined from function return states,
see AAAlignCallSiteReturned and AACallSiteReturnedFromReturned.
- function return states are determined from returned value states,
see AAAlignReturned and AAReturnedFromReturnedValues.
Through this strategy all logic for alignment is concentrated in the
AAAlignFloating::updateImpl method.
Note: This commit works on its own but is part of a larger change that
involves "on-demand" creation of abstract attributes that will
participate in the fixpoint iteration. Without this part, we sometimes
do not have an AAAlign abstract attribute to query, loosing information
we determined before. All tests have appropriate FIXMEs and the
information will be recovered once we added all parts.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66126
llvm-svn: 369144
|
| |
|
|
|
|
|
|
|
| |
Until we have call site specific liveness and/or value information there
is no need to do call site specific deduction. Though, we need the
symbols in follow up patches that make Attributor::getAAFor return a
reference.
llvm-svn: 369143
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch should not change the behavior except that the added
initialize methods might indicate an optimistic fixpoint earlier. The
code movement is done to keep the attribute definitions in a single
block where it makes sense. No functional changes intended there.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66258
llvm-svn: 369142
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.
For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).
For example, if the condition of the select is a scalar, we end up with something like this on x86:
vpcmpgtd %xmm0, %xmm1, %xmm0
vpextrb $12, %xmm0, %eax
testb $1, %al
jne LBB0_2
## %bb.1:
vmovaps %xmm3, %xmm2
LBB0_2:
vmovaps %xmm2, %xmm0
Rather than the splat-condition variant:
vpcmpgtd %xmm0, %xmm1, %xmm0
vpshufd $255, %xmm0, %xmm0 ## xmm0 = xmm0[3,3,3,3]
vblendvps %xmm0, %xmm2, %xmm3, %xmm0
Differential Revision: https://reviews.llvm.org/D66095
llvm-svn: 369140
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.
In short, this patch:
- Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
- Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
- Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.
Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel
Reviewed By: ABataev
Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62432
llvm-svn: 369131
|