| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 260152
|
| |
|
|
|
|
|
|
|
|
|
| |
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.
Code comments note that this transform could be extended for other targets / cases.
Differential Revision: http://reviews.llvm.org/D16828
llvm-svn: 260145
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We will hit this once we have enabled uniform branches. The
smrd-vccz-bug.ll test will be added with the uniform branch commit.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16725
llvm-svn: 260137
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
|
| |
|
|
|
|
|
| |
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.
llvm-svn: 260131
|
| |
|
|
|
|
| |
Nothing is using them.
llvm-svn: 260123
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16983
llvm-svn: 260101
|
| |
|
|
| |
llvm-svn: 260070
|
| |
|
|
| |
llvm-svn: 260069
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.
This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.
Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.
Differential Revision: http://reviews.llvm.org/D16683
llvm-svn: 260063
|
| |
|
|
| |
llvm-svn: 260034
|
| |
|
|
|
|
|
|
| |
rounding mode
Differential Revision: http://reviews.llvm.org/D16629
llvm-svn: 260033
|
| |
|
|
|
|
|
|
|
|
| |
NFCI.
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.
The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.
llvm-svn: 260032
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16813
llvm-svn: 260024
|
| |
|
|
| |
llvm-svn: 260007
|
| |
|
|
|
|
| |
To allow the helper functions to make use of them.
llvm-svn: 259997
|
| |
|
|
|
|
|
|
| |
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.
This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.
llvm-svn: 259995
|
| |
|
|
|
|
|
|
| |
If we are already loading a single 32-bit float/integer then just reuse it.
Fix for regression in D16729
llvm-svn: 259991
|
| |
|
|
| |
llvm-svn: 259990
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).
Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover
Subscribers: aemerson, rengolin, MatzeB
Differential Revision: http://reviews.llvm.org/D16644
llvm-svn: 259958
|
| |
|
|
|
|
|
| |
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().
llvm-svn: 259914
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.
Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
llvm-svn: 259912
|
| |
|
|
|
|
|
| |
Also switch to internal linkage, and include the name of the function in
the name.
llvm-svn: 259911
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16862
llvm-svn: 259900
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16863
llvm-svn: 259897
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16724
llvm-svn: 259894
|
| |
|
|
| |
llvm-svn: 259893
|
| |
|
|
|
|
|
| |
This is a simple fix for a PowerPC intrinsic that was incorrectly defined
(the return type was incorrect).
llvm-svn: 259886
|
| |
|
|
|
|
| |
No functionality change intended.
llvm-svn: 259883
|
| |
|
|
|
|
| |
This reverts commit r259812 as it broke AArch64 self-hosting.
llvm-svn: 259881
|
| |
|
|
|
|
|
|
|
| |
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
llvm-svn: 259840
|
| |
|
|
|
|
|
| |
This only impacts the creation of pre-/post-index instructions. The bound was
set high enough such that it did not change code generation for SPEC200X.
llvm-svn: 259828
|
| |
|
|
|
|
|
|
|
|
| |
EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.
This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....
llvm-svn: 259816
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769 and r259790. The tramp3d failure was caused
by an incorrect refactoring in the patch. Specifically, we weren't always
properly clearing the SExtIdx flag.
llvm-svn: 259812
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:
(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:
(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.
Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
Patch by Chris Diamand!
llvm-svn: 259800
|
| |
|
|
|
|
|
|
|
|
| |
EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.
Differential Revision: http://reviews.llvm.org/D16729
llvm-svn: 259796
|
| |
|
|
|
|
| |
This reverts commit r259790. tramp3d-v4 is still having problems.
llvm-svn: 259795
|
| |
|
|
|
|
|
|
| |
The FMA instruction was selected from AVX2 set instead of AVX-512
Differential Revision: http://reviews.llvm.org/D16884
llvm-svn: 259792
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure. I'm unable to reproduce the issue at this time.
llvm-svn: 259790
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16589
llvm-svn: 259789
|
| |
|
|
| |
llvm-svn: 259771
|
| |
|
|
|
|
|
|
| |
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.
Differential Revision: http://reviews.llvm.org/D16404
llvm-svn: 259770
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: jholewinski, tra, eliben
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D16874
llvm-svn: 259749
|
| |
|
|
| |
llvm-svn: 259720
|
| |
|
|
|
|
|
|
|
|
|
| |
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
|
| |
|
|
|
|
|
|
|
|
| |
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
|
| |
|
|
| |
llvm-svn: 259655
|
| |
|
|
|
|
| |
Simple fix - Constant values were not being sign extended in FastIsel.
llvm-svn: 259645
|
| |
|
|
|
|
|
|
|
|
| |
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.
Differential Revision: http://reviews.llvm.org/D15740
llvm-svn: 259641
|
| |
|
|
|
|
|
|
|
|
|
|
| |
EltsFromConsecutiveLoads
Follow up to D16217 and D16729
This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD
Differential Revision: http://reviews.llvm.org/D16768
llvm-svn: 259635
|