| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
This reverts commit r295084. There is a test failure on:
http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/
llvm-svn: 295092
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done
with them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295084
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.
If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.
Reviewers: chandlerc, eraman
Reviewed By: eraman
Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D29169
llvm-svn: 295075
|
| |
|
|
| |
llvm-svn: 295066
|
| |
|
|
|
|
|
|
|
|
|
| |
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 295063
|
| |
|
|
|
|
| |
Add support for specifying an UNPCK input as UNDEF
llvm-svn: 295061
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29759
llvm-svn: 295060
|
| |
|
|
|
|
| |
Not correctly using UNDEF or ZERO inputs to combine to UNPCK shuffles
llvm-svn: 295059
|
| |
|
|
|
|
| |
Remove excess semicolons
llvm-svn: 295058
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
back into a vector
Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because though `%x0` and `%x1` have 2 users each other, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.
Differential Revision: https://reviews.llvm.org/D29900
llvm-svn: 295056
|
| |
|
|
|
|
| |
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.
llvm-svn: 295054
|
| |
|
|
| |
llvm-svn: 295050
|
| |
|
|
|
|
|
|
| |
opportunity.
It also shows an unnecessary pshufb/broadcast being used - the original pshufb mask only requested the lowest byte.
llvm-svn: 295046
|
| |
|
|
|
|
|
|
|
| |
accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.
llvm-svn: 295042
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.
This is fixing pr31900.
Reviewers: HaoLiu, mssimpso, mkuper
Reviewed By: mssimpso, mkuper
Subscribers: llvm-commits, efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29717
llvm-svn: 295038
|
| |
|
|
| |
llvm-svn: 295035
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
|
| |
|
|
|
|
|
|
| |
functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701
llvm-svn: 295021
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.
Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.
It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
load i128 ...
if (can lower i128 atomic)
store atomic i128 ...
else
store i128
}
It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically.
It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.
Differential Revision: https://reviews.llvm.org/D15592
llvm-svn: 295015
|
| |
|
|
|
|
|
|
|
|
| |
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.
Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.
Differential Revision: https://reviews.llvm.org/D29903
llvm-svn: 295004
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.
The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.
The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html
Differential Revision: https://reviews.llvm.org/D27855
llvm-svn: 294998
|
| |
|
|
|
|
| |
test/CodeGen/X86/atomic-minmax-i6432.ll as they don't regenerate cleanly.
llvm-svn: 294996
|
| |
|
|
|
|
|
| |
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.
llvm-svn: 294993
|
| |
|
|
|
|
| |
We're going to need them very soon for GlobalISel.
llvm-svn: 294992
|
| |
|
|
|
|
|
|
|
| |
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982
|
| |
|
|
|
|
|
|
|
|
| |
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.
Differential Revision: https://reviews.llvm.org/D29782
llvm-svn: 294981
|
| |
|
|
| |
llvm-svn: 294980
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently MachineBasicBlock::updateTerminator simply drops DebugLoc for newly created branch instructions, which may cause incorrect stepping and/or imprecise sample profile data. Below is an example:
```
1 extern int bar(int x);
2
3 int foo(int *begin, int *end) {
4 int *i;
5 int ret = 0;
6 for (
7 i = begin ;
8 i != end ;
9 i++)
10 {
11 ret += bar(*i);
12 }
13 return ret;
14 }
```
Below is a bitcode of 'foo' at the end of LLVM-IR level optimizations with -O3:
```
define i32 @foo(i32* readonly %begin, i32* readnone %end) !dbg !4 {
entry:
%cmp6 = icmp eq i32* %begin, %end, !dbg !9
br i1 %cmp6, label %for.end, label %for.body.preheader, !dbg !12
for.body.preheader: ; preds = %entry
br label %for.body, !dbg !13
for.body: ; preds = %for.body.preheader, %for.body
%ret.08 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
%i.07 = phi i32* [ %incdec.ptr, %for.body ], [ %begin, %for.body.preheader ]
%0 = load i32, i32* %i.07, align 4, !dbg !13, !tbaa !15
%call = tail call i32 @bar(i32 %0), !dbg !19
%add = add nsw i32 %call, %ret.08, !dbg !20
%incdec.ptr = getelementptr inbounds i32, i32* %i.07, i64 1, !dbg !21
%cmp = icmp eq i32* %incdec.ptr, %end, !dbg !9
br i1 %cmp, label %for.end.loopexit, label %for.body, !dbg !12, !llvm.loop !22
for.end.loopexit: ; preds = %for.body
br label %for.end, !dbg !24
for.end: ; preds = %for.end.loopexit, %entry
%ret.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]
ret i32 %ret.0.lcssa, !dbg !24
}
```
where
```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```
. As you can see, the terminator of 'entry' block, which is a loop control branch, has a DebugLoc of line 6, column 3. Howerver, after the execution of 'MachineBlock::updateTerminator' function, which is triggered by MachineSinking pass, the DebugLoc info is dropped as below (see there's no debug-location for JNE_1):
```
bb.0.entry:
successors: %bb.4(0x30000000), %bb.1.for.body.preheader(0x50000000)
liveins: %rdi, %rsi
%6 = COPY %rsi
%5 = COPY %rdi
%8 = SUB64rr %5, %6, implicit-def %eflags, debug-location !9
JNE_1 %bb.1.for.body.preheader, implicit %eflags
```
This patch addresses this issue and make newly created branch instructions to keep debug-location info.
Reviewers: aprantl, MatzeB, craig.topper, qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D29596
llvm-svn: 294976
|
| |
|
|
|
|
|
|
| |
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.
llvm-svn: 294973
|
| |
|
|
|
|
|
|
| |
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970
|
| |
|
|
|
|
|
|
| |
I'd missed a creator of FCMP nodes - duplicateCmp().
Kindly and promptly reported by Gabor Ballabas, due to his CSiBE test suite.
llvm-svn: 294968
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 294967
|
| |
|
|
| |
llvm-svn: 294966
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up the implementation of divide macro expansion by getting rid of a
FIXME regarding magic numbers and branch instructions. Match GAS' behaviour
for expansion of ddiv / div in the two and three operand cases. Add the two
operand alias for MIPSR6. Finally, optimize macro expansion cases where the
divisior is the $zero register.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D29887
llvm-svn: 294960
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29308
llvm-svn: 294955
|
| |
|
|
| |
llvm-svn: 294952
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment existed already for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking the align directive is there when producing an
asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
llvm-svn: 294950
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We match a sequence of 3-4 instructions into a tTBB pseudo. One of our checks is that
a particular register in that sequence is killed (so it can be clobbered by the pseudo).
We weren't noticing if an errant MOV or other instruction had infiltrated the
sequence we were walking. If it had, and it defined the register we've already
identified as killed, it makes it live across the tBR_JT and thus unclobberable.
Notice this case and bail out.
llvm-svn: 294949
|
| |
|
|
|
|
| |
Added v4i32 and v2i64 tests and test on i686 as well as x86_64.
llvm-svn: 294946
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the sideeffect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
The relational and equality operators support the usual mathematical
relationships between numeric values. For any ordered pair of numeric
values exactly one of relationships the less, greater, equal and is true.
Relational operators may raise the floating-point exception when argument
values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
llvm-svn: 294945
|
| |
|
|
|
|
| |
common check prefix ALL. NFC.
llvm-svn: 294938
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reductions.
Currently, LLVM supports vectorization of horizontal reduction
instructions with initial value set to 0. Patch supports vectorization
of reduction with non-zero initial values. Also, it supports a
vectorization of instructions with some extra arguments, like:
```
float f(float x[], int a, int b) {
float p = a % b;
p += x[0] + 3;
for (int i = 1; i < 32; i++)
p += x[i];
return p;
}
```
Patch allows vectorization of this kind of horizontal reductions.
Differential Revision: https://reviews.llvm.org/D29727
llvm-svn: 294934
|
| |
|
|
|
|
| |
into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932
|
| |
|
|
|
|
|
|
| |
to support 512-bit vectors with 128-bit or 256-bit subvectors.
We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operations for 512-bit vectors.
llvm-svn: 294931
|
| |
|
|
|
|
|
|
| |
EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.
llvm-svn: 294930
|
| |
|
|
|
|
| |
why they fail.
llvm-svn: 294928
|
| |
|
|
|
|
| |
eliminates no-use readonly/readnone calls, even if they are not marked nounwind. NewGVN only eliminates them if they are marked nounwind, and thus, trivially dead.
llvm-svn: 294927
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The bug was introduced with:
https://reviews.llvm.org/rL294863
...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should
still be able to fold this setcc/compare.
llvm-svn: 294924
|
| |
|
|
| |
llvm-svn: 294922
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds support for placing predicateinfo such that it affects critical edges.
This fixes the issues mentioned by Nuno on the mailing list.
Depends on D29519
Reviewers: davide, nlopes
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29606
llvm-svn: 294921
|