| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
intrinsics. NFC
llvm-svn: 286961
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is needed to be able to use this flags in InstrMappings.
Reviewers: tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D26666
llvm-svn: 286960
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.
Fixes pr30981.
Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D26620
llvm-svn: 286958
|
| |
|
|
|
|
|
|
|
|
|
| |
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases. As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.
Differential Revision: https://reviews.llvm.org/D26578
llvm-svn: 286951
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the Sched Machine Model for Cortex-R52.
Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.
Reviewers: rengolin, jmolloy
Differential Revision: https://reviews.llvm.org/D26500
llvm-svn: 286949
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.
Reviewers: ab, qcolombet, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26593
llvm-svn: 286935
|
| |
|
|
| |
llvm-svn: 286931
|
| |
|
|
|
|
|
| |
This doesn't solve any problems I know about, but this should have
more conservative assumptions about the operands'
llvm-svn: 286913
|
| |
|
|
| |
llvm-svn: 286912
|
| |
|
|
|
|
|
|
|
|
| |
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.
Differential revision: https://reviews.llvm.org/D26518
llvm-svn: 286907
|
| |
|
|
|
|
|
| |
Change "orisadd" to "IsOrAdd" to follow the naming conventions, and
change "isOrAdd" in the C++ code to "isOrEquivalentToAdd".
llvm-svn: 286886
|
| |
|
|
| |
llvm-svn: 286882
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.
llvm-svn: 286881
|
| |
|
|
|
|
| |
Follow-up change to r286875.
llvm-svn: 286879
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.
For example, the final generated code should be:
stp xzr, xzr, [x0]
instead of:
movi v0.2d, #0
str q0, [x0]
Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26561
llvm-svn: 286875
|
| |
|
|
| |
llvm-svn: 286874
|
| |
|
|
| |
llvm-svn: 286869
|
| |
|
|
| |
llvm-svn: 286868
|
| |
|
|
|
|
|
| |
This reverts commit 286866. It broke a bot, something to do with exactly which
templates std::sort accepts.
llvm-svn: 286867
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
llvm-svn: 286866
|
| |
|
|
|
|
|
|
|
| |
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.
Differential review: https://reviews.llvm.org/D26536
llvm-svn: 286862
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend image intrinsics to support data types of V1F32 and V2F32.
TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
even though such case should be very rare.
Reviewers:
tstellarAMD
Differential Revision:
http://reviews.llvm.org/D26472
llvm-svn: 286860
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0
In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.
llvm-svn: 286843
|
| |
|
|
|
|
| |
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result
llvm-svn: 286838
|
| |
|
|
|
|
|
|
| |
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
llvm-svn: 286832
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26272
llvm-svn: 286829
|
| |
|
|
| |
llvm-svn: 286808
|
| |
|
|
|
|
|
|
| |
vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior.
Fixes another problem from PR28850.
llvm-svn: 286790
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
This should fix at least some of PR28850.
llvm-svn: 286787
|
| |
|
|
|
|
| |
intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
|
| |
|
|
|
|
|
|
| |
immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd.
Fixes PR24941.
llvm-svn: 286775
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.
This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.
The cache also needs to be flushed before wave termination
if a scalar store is used.
llvm-svn: 286766
|
| |
|
|
| |
llvm-svn: 286765
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26128
llvm-svn: 286761
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D26022
llvm-svn: 286758
|
| |
|
|
|
|
| |
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
llvm-svn: 286754
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25975
llvm-svn: 286753
|
| |
|
|
|
|
|
|
| |
Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
llvm-svn: 286725
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.
This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
Reviewers: RKSimon, zvi, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26333
llvm-svn: 286711
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
Reviewers: delena, RKSimon
Subscribers: Farhana, llvm-commits
Differential Revision: https://reviews.llvm.org/D26297
llvm-svn: 286709
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes a regression caused by r286464.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D26570
llvm-svn: 286687
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining insstruction would
be legalized before the using instruction.
This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.
This fixes a bug that was uncovered by r285762.
Reviewers: nhaehnle, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26303
llvm-svn: 286676
|
| |
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D26480
Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.
llvm-svn: 286638
|
| |
|
|
| |
llvm-svn: 286625
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.
Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.
Reviewers: t.p.northover, jmolloy, mcrosier
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26409
llvm-svn: 286616
|
| |
|
|
| |
llvm-svn: 286606
|
| |
|
|
| |
llvm-svn: 286601
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
llvm-svn: 286596
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimization merges adjacent zero stores into a wider store.
e.g.,
strh wzr, [x0]
strh wzr, [x0, #2]
; becomes
str wzr, [x0]
e.g.,
str wzr, [x0]
str wzr, [x0, #4]
; becomes
str xzr, [x0]
Previously, this was only enabled for Kryo and Cortex-A57.
Differential Revision: https://reviews.llvm.org/D26396
llvm-svn: 286592
|
| |
|
|
| |
llvm-svn: 286591
|