| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
[X86][AVX512] add comi with Sae
llvm-svn: 252154
|
| |
|
|
|
|
|
|
| |
add builtin_ia32_vcomisd and builtin_ia32_vcomisd
Differential Revision: http://reviews.llvm.org/D14331
llvm-svn: 252153
|
| |
|
|
|
|
|
|
| |
VPBROADCASTMW2D and VPBROADCASTMB2Q
Differential Revision: http://reviews.llvm.org/D14335
llvm-svn: 252151
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GetUnderlyingObjects() can return "null" among its list of objects,
we don't want to deduce that two pointers can point to the same
memory in this case, so filter it out.
Reviewers: anemet
Subscribers: dexonsmith, llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252149
|
| |
|
|
| |
llvm-svn: 252142
|
| |
|
|
|
|
|
|
|
|
| |
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.
Reviewers: pgavlin, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14343
llvm-svn: 252135
|
| |
|
|
|
|
|
|
| |
This brings back the behavior from before r252090 for out of range symbols.
Should bring some arm bots back.
llvm-svn: 252119
|
| |
|
|
| |
llvm-svn: 252117
|
| |
|
|
|
|
| |
It is pretty simple now that the yak is shaved.
llvm-svn: 252105
|
| |
|
|
|
|
|
| |
We now always create the fragment, which lets us handle things like .org after
a .align.
llvm-svn: 252101
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is enabled only under -ffast-math.
So, instead of emitting:
4007b0: 50 push %rax
4007b1: e8 8a fd ff ff callq 400540 <atanf@plt>
4007b6: 58 pop %rax
4007b7: e9 94 fd ff ff jmpq 400550 <tanf@plt>
4007bc: 0f 1f 40 00 nopl 0x0(%rax)
for:
float mytan(float x) {
return tanf(atanf(x));
}
we emit a single retq.
Differential Revision: http://reviews.llvm.org/D14302
llvm-svn: 252098
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Earlier CaptureTracking would assume all "interesting" operands to a
call or invoke were its arguments. With operand bundles this is no
longer true.
Note: an earlier change got `doesNotCapture` working correctly with
operand bundles.
This change uses DSE to test the changes to CaptureTracking. DSE is a
vehicle for testing only, and is not directly involved in this change.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14306
llvm-svn: 252095
|
| |
|
|
|
|
|
|
| |
The generic infrastructure already did a lot of work to decide if the
fixup value is know or not. It doesn't make sense to reimplement a very
basic case: same fragment.
llvm-svn: 252090
|
| |
|
|
|
|
|
|
|
|
| |
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
llvm-svn: 252088
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
scalar FMA intrinsics.
Patch by Slava Klochkov
The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.
Reviewers: Quentin Colombet, Elena Demikhovsky
Differential Revision: http://reviews.llvm.org/D13710
llvm-svn: 252060
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252057
|
| |
|
|
|
|
|
|
| |
When converting an alias to a non-alias when the aliasee is not
imported, ensure that the linkage type is set to external so that it is
a valid linkage type. Added a test case that exposed this issue.
llvm-svn: 252054
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:
if (c & flag1)
*a = x;
if (c & flag2)
*a = y;
...
There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.
It is, however, legal to merge the stores together and do the store once:
tmp = undef;
if (c & flag1)
tmp = x;
if (c & flag2)
tmp = y;
if (c & flag1 || c & flag2)
*a = tmp;
The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.
This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.
The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.
This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.
llvm-svn: 252051
|
| |
|
|
|
|
| |
Bug found with afl-fuzz.
llvm-svn: 252048
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D14109
llvm-svn: 252043
|
| |
|
|
|
|
|
| |
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.
llvm-svn: 252042
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In my previous change to CVP (251606), I made CVP much more aggressive about trying to constant fold comparisons. This patch is a reversal in direction. Rather than being agressive about every compare, we restore the non-block local restriction for most, and then try hard for compares feeding returns.
The motivation for this is two fold:
* The more I thought about it, the less comfortable I got with the possible compile time impact of the other approach. There have been no reported issues, but after talking to a couple of folks, I've come to the conclusion the time probably isn't justified.
* It turns out we need to know the context to leverage the full power of LVI. In particular, asking about something at the end of it's block (the use of a compare in a return) will frequently get more precise results than something in the middle of a block. This is an implementation detail, but it's also hard to get around since mid-block queries have to reason about possible throwing instructions and don't get to use most of LVI's block focused infrastructure. This will become particular important when combined with http://reviews.llvm.org/D14263.
Differential Revision: http://reviews.llvm.org/D14271
llvm-svn: 252032
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
llvm-svn: 252028
|
| |
|
|
| |
llvm-svn: 252027
|
| |
|
|
| |
llvm-svn: 252020
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop. E.g.:
for (i)
A[i + 1] = A[i] + B[i]
=>
T = A[0]
for (i)
T = T + B[i]
A[i + 1] = T
The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load. Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning. As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here. Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.
In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads. I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.
The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop. Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%). This gain also transfers over to x86: it's
around 8-10%.
Right now the pass is off by default and can be enabled
with -enable-loop-load-elim. On the LNT testsuite, there are two
performance changes (negative number -> improvement):
1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
critical paths is reduced
2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
outside of LNT
The pass is scheduled after the loop vectorizer (which is after loop
distribution). The rational is to try to reuse LAA state, rather than
recomputing it. The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.
LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences). LAA is known to omit loop-independent
dependences in certain situations. The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.
Reviewers: dberlin, hfinkel
Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13259
llvm-svn: 252017
|
| |
|
|
|
|
|
|
|
| |
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.
Also fix an assert if the range is an invalid size.
llvm-svn: 252009
|
| |
|
|
|
|
| |
If trying to use one past the end, this would assert.
llvm-svn: 252008
|
| |
|
|
| |
llvm-svn: 252004
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.
Reviewers: sunfish
Subscribers: dschuff, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D14295
llvm-svn: 252002
|
| |
|
|
|
|
|
|
|
| |
There are actually 104 so 2 were missing.
More assembler tests with high register number tuples
will be included in later patches.
llvm-svn: 251999
|
| |
|
|
|
|
| |
To avoid alternative lowerings.
llvm-svn: 251986
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13256
llvm-svn: 251985
|
| |
|
|
| |
llvm-svn: 251984
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This one is enabled only under -ffast-math (due to rounding/overflows)
but allows us to emit shorter code.
Before (on FreeBSD x86-64):
4007f0: 50 push %rax
4007f1: f2 0f 11 0c 24 movsd %xmm1,(%rsp)
4007f6: e8 75 fd ff ff callq 400570 <exp2@plt>
4007fb: f2 0f 10 0c 24 movsd (%rsp),%xmm1
400800: 58 pop %rax
400801: e9 7a fd ff ff jmpq 400580 <pow@plt>
400806: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40080d: 00 00 00
After:
4007b0: f2 0f 59 c1 mulsd %xmm1,%xmm0
4007b4: e9 87 fd ff ff jmpq 400540 <exp2@plt>
4007b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Differential Revision: http://reviews.llvm.org/D14045
llvm-svn: 251976
|
| |
|
|
|
|
|
|
|
|
| |
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )
This patch adds tablegen pattern matching for this instruction.
Differential Revision: http://reviews.llvm.org/D8841
llvm-svn: 251975
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When the dependence distance in zero then we have a loop-independent
dependence from the earlier to the later access.
No current client of LAA uses forward dependences so other than
potentially hitting the MaxDependences threshold earlier, this change
shouldn't affect anything right now.
This and the previous patch were tested together for compile-time
regression. None found in LNT/SPEC.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13255
llvm-svn: 251973
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Before this change, we didn't use to collect forward dependences since
none of the current clients (LV, LDist) required them.
The motivation to also collect forward dependences is a new pass
LoopLoadElimination (LLE) which discovers store-to-load forwarding
opportunities across the loop's backedge. The pass uses both lexically
forward or backward loop-carried dependences to detect these
opportunities.
The new pass also analyzes loop-independent (forward) dependences since
they can conflict with the loop-carried dependences in terms of how the
data flows through memory.
The newly added test only covers loop-carried forward dependences
because loop-independent ones are currently categorized as NoDep. The
next patch will fix this.
The two patches were tested together for compile-time regression. None
found in LNT/SPEC.
Note that with this change LAA provides all dependences rather than just
"interesting" ones. A subsequent NFC patch will remove the now trivial
isInterestingDependence and rename the APIs.
Reviewers: hfinkel
Subscribers: jmolloy, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13254
llvm-svn: 251972
|
| |
|
|
|
|
|
| |
We are long past the time when this much bug for bug compatibility was
useful.
llvm-svn: 251970
|
| |
|
|
| |
llvm-svn: 251967
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit r251926. I believe this is causing an LTO
bootstrapping bot failure
(http://lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/3669/).
Haven't been able to repro it yet, but after looking at the metadata I
am pretty sure I know what is going on.
llvm-svn: 251965
|
| |
|
|
| |
llvm-svn: 251964
|
| |
|
|
|
|
|
|
| |
We now create them as they are found and use higher level APIs.
This is a step in avoiding creating unnecessary sections.
llvm-svn: 251958
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D14258
llvm-svn: 251957
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a PHI to a SCEVConstant
Summary:
Since now Scalar Evolution can create non-add rec expressions for PHI
nodes, it can also create SCEVConstant expressions. This will confuse
replaceCongruentPHIs, which previously relied on the fact that SCEV
could not produce constants in this case.
We will now replace the node with a constant in these cases - or avoid
processing the Phi in case of a type mismatch.
Reviewers: sanjoy
Subscribers: llvm-commits, majnemer
Differential Revision: http://reviews.llvm.org/D14230
llvm-svn: 251938
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, named metadata is linked before the LazilyLinkGlobalValues
list is walked and materialized/linked. As a result, references
from DISubprogram and DIGlobalVariable metadata to yet unmaterialized
functions and variables cause them to be added to the lazy linking
list and their definitions are materialized and linked.
This makes the llvm-link -only-needed option not have the intended
effect when debug information is present, as the otherwise unneeded
functions/variables are still linked in.
Additionally, for ThinLTO I have implemented a mechanism to only link
in debug metadata needed by imported functions. Moving named metadata
linking after lazy GV linking will facilitate applying this mechanism
to the LTO and "llvm-link -only-needed" cases as well.
Reviewers: dexonsmith, tra, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14195
llvm-svn: 251926
|
| |
|
|
|
|
|
|
|
| |
This assert was reachable from user input. A minimized test case (no
FUNCTION_BLOCK_ID record) is attached.
Bug found with afl-fuzz
llvm-svn: 251910
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Skipping 'bitcast' in this case allows to vectorize load:
%arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
%tmp53 = bitcast double** %arrayidx to i64*
%tmp54 = load i64, i64* %tmp53, align 8
Differential Revision http://reviews.llvm.org/D14112
llvm-svn: 251907
|