| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 288916
|
| |
|
|
|
|
| |
INSERT_VECTOR_ELT
llvm-svn: 288912
|
| |
|
|
|
|
| |
They were testing for the pre-vex versions
llvm-svn: 288911
|
| |
|
|
|
|
| |
folding tests
llvm-svn: 288910
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.
The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.
Differential Revision: https://reviews.llvm.org/D26083
llvm-svn: 288909
|
| |
|
|
| |
llvm-svn: 288906
|
| |
|
|
| |
llvm-svn: 288905
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of the dominating load.
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk to
introduce an incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.
Differential Revision: https://reviews.llvm.org/D27468
llvm-svn: 288903
|
| |
|
|
|
|
|
|
|
|
| |
integer domain
We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain.
Differential Revision: https://reviews.llvm.org/D27419
llvm-svn: 288902
|
| |
|
|
| |
llvm-svn: 288899
|
| |
|
|
|
|
|
|
| |
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.
Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.
llvm-svn: 288898
|
| |
|
|
|
|
| |
Fixes PR 31256
llvm-svn: 288897
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions inlined from functions with debug info.
When a function F is inlined, InlineFunction extends the debug location of every
instruction inlined from F by adding an InlinedAt.
However, if an instruction has a 'null' debug location, InlineFunction would
propagate the callsite debug location to it. This behavior existed since
revision 210459.
Revision 210459 was originally committed specifically to workaround the lack of
debug information for instructions inlined from intrinsic functions (which are
usually declared with attributes `__always_inline__, __nodebug__`).
The problem with revision 210459 is that it doesn't make any sort of distinction
between instructions inlined from a 'nodebug' function and instructions which
are inlined from a function built with debug info. This issue may lead to
incorrect stepping in the debugger.
This patch works under the assumption that a nodebug function does not have a
DISubprogram. When a function F is inlined into another function G,
InlineFunction checks if F has debug info associated with it.
For nodebug functions, the InlineFunction logic is unchanged (i.e. it would
still propagate the callsite debugloc to the inlined instructions). Otherwise,
InlineFunction no longer propagates the callsite debug location.
Differential Revision: https://reviews.llvm.org/D27462
llvm-svn: 288895
|
| |
|
|
| |
llvm-svn: 288881
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch By: Wei Ding
Summary: This patch fixes the fdiv precision issues.
Reviewers: b-sumner, cfang, wdng, arsenm
Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D26424
llvm-svn: 288879
|
| |
|
|
|
|
|
|
| |
In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511].
Differential Revision: https://reviews.llvm.org/D27480
llvm-svn: 288876
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26725
llvm-svn: 288865
|
| |
|
|
| |
llvm-svn: 288861
|
| |
|
|
| |
llvm-svn: 288858
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.
This fixes issue https://github.com/rust-lang/rust/issues/37829
Patch by Vadzim Dambrouski!
Differential Revision: https://reviews.llvm.org/D27154
llvm-svn: 288857
|
| |
|
|
|
|
| |
Other VOP instructions call the output vdst
llvm-svn: 288856
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
As Eli noted in the post-commit thread for r288833, the use of
swapOperands() may not be allowed in InstSimplify, so I'm
removing those calls here pending further review.
The swap mutates the icmp, and there doesn't appear to be precedent
for instruction mutation in InstSimplify.
I didn't actually have any tests for those cases, so I'm adding
a few here.
llvm-svn: 288855
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D27416
llvm-svn: 288852
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.
Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17
but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.
Differential Revision: https://reviews.llvm.org/D27471
llvm-svn: 288851
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If we write an immediate to a VGPR and then copy the VGPR to an
SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than
moving the copy to the SALU.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D27272
llvm-svn: 288849
|
| |
|
|
|
|
|
| |
We were rounding size in bits down rather than up, leading to 0-sized slots for
i1 (assert!) and bugs for other types not byte-aligned.
llvm-svn: 288848
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].
Fixes pr31202.
Analysis of this issue was done by Fahana Aleen.
Reviewers: wmi, delena, mkuper
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27203
llvm-svn: 288844
|
| |
|
|
|
|
|
|
| |
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.
Differential Revision: https://reviews.llvm.org/D27461
llvm-svn: 288842
|
| |
|
|
|
|
|
|
|
|
|
| |
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.
This is the 'and' sibling of the earlier 'or' patch:
https://reviews.llvm.org/rL288833
llvm-svn: 288841
|
| |
|
|
| |
llvm-svn: 288840
|
| |
|
|
|
|
| |
we don't actually demand that element
llvm-svn: 288839
|
| |
|
|
| |
llvm-svn: 288837
|
| |
|
|
|
|
|
|
|
|
| |
There were two problems:
+ AArch64 was reusing random data from its binary op tables, which is
complete nonsense for G_SEQUENCE.
+ Even when AArch64 gave up and said it couldn't handle G_SEQUENCE,
the generic code asserted.
llvm-svn: 288836
|
| |
|
|
| |
llvm-svn: 288835
|
| |
|
|
|
|
|
|
| |
It'll almost immediately fail because it always tries to half/double the size
until it finds a legal one. Unfortunately, this triggers an assertion
preventing the DAG fallback from being possible.
llvm-svn: 288834
|
| |
|
|
|
|
|
|
| |
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.
llvm-svn: 288833
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
These are OpenBSD specific program headers.
OpenBSD commit:
https://github.com/openbsd/src/commit/d39116912b9536bd77326260dc5c6e593fd4ee24
It is required for fixing PR31288.
Differential revision: https://reviews.llvm.org/D27456
llvm-svn: 288831
|
| |
|
|
| |
llvm-svn: 288830
|
| |
|
|
|
|
| |
shuffle elements
llvm-svn: 288825
|
| |
|
|
| |
llvm-svn: 288819
|
| |
|
|
|
|
| |
Test the sequential effect of each op
llvm-svn: 288815
|
| |
|
|
|
|
|
|
|
| |
vectorizations
e.g.
buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32))
llvm-svn: 288807
|
| |
|
|
|
|
|
|
|
|
| |
When we see a non flag-setting instruction for which only the flag-setting
version is available in Thumb1, we should give a better error message than
"invalid instruction".
Differential Revision: https://reviews.llvm.org/D27414
llvm-svn: 288805
|
| |
|
|
|
|
|
|
|
|
|
|
| |
broadcasting.
Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"
Differential Revision: https://reviews.llvm.org/D26802
llvm-svn: 288804
|
| |
|
|
|
|
| |
SMAX/SMIN/UMAX/UMIN
llvm-svn: 288801
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
Differential Revision: https://reviews.llvm.org/D26066
llvm-svn: 288800
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated.
Reviewers: mkuper, aprantl, MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27343
llvm-svn: 288794
|
| |
|
|
|
|
|
|
| |
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still.
The only test case for this pattern is one I just added so there was no coverage of this.
llvm-svn: 288784
|
| |
|
|
|
|
|
|
|
|
| |
(scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case were the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing but those seems to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction.
I will remove this pattern in a follow up patch. There appears to be no other test content for it.
llvm-svn: 288783
|
| |
|
|
| |
llvm-svn: 288782
|