------------------------------------------------------------------------
Remove sign extend in register style pattern if the sign is already extended enough
llvm-svn: 314599
------------------------------------------------------------------------
llvm-svn: 314598
------------------------------------------------------------------------
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
------------------------------------------------------------------------
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
Differential Revision: https://reviews.llvm.org/D38307
llvm-svn: 314584
------------------------------------------------------------------------
Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val)
Differential Revision: https://reviews.llvm.org/D38161
llvm-svn: 314514
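A minimal sketch of what this combine looks like in SelectionDAG terms, assuming integer conditions; the function name and exact guards are illustrative, not the patch's actual code:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Fold xor(setcc(CC, LHS, RHS), 1) into setcc(invert(CC), LHS, RHS).
static SDValue foldXorOfSetCC(SDNode *N, SelectionDAG &DAG) {
  SDValue SetCC = N->getOperand(0);
  // Only fire on "xor ..., 1" where the other operand is a setcc.
  if (SetCC.getOpcode() != ISD::SETCC || !isOneConstant(N->getOperand(1)))
    return SDValue();
  ISD::CondCode CC = cast<CondCodeSDNode>(SetCC.getOperand(2))->get();
  // Inverting the condition makes the xor unnecessary.
  ISD::CondCode InvCC = ISD::getSetCCInverse(CC, /*isInteger=*/true);
  SDLoc DL(N);
  return DAG.getSetCC(DL, SetCC.getValueType(), SetCC.getOperand(0),
                      SetCC.getOperand(1), InvCC);
}
```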
------------------------------------------------------------------------
Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
llvm-svn: 314510
------------------------------------------------------------------------
immediate expressions
Allow the proper recognition of enum values and global variables inside MS inline-asm memory / immediate expressions, as they require some additional overhead and are treated incorrectly if not recognized early.
supersedes D33278, D35774
Differential Revision: https://reviews.llvm.org/D37412
llvm-svn: 314493
------------------------------------------------------------------------
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
------------------------------------------------------------------------
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
Differential Revision: https://reviews.llvm.org/D38375
llvm-svn: 314457
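A rough sketch of the no-VLX path described here, for a v16i16 input; the helper name and the widen-by-undef step are simplifying assumptions, not the patch itself:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Truncate v16i16 -> v16i8 with BWI but without VLX: do the 512-bit
// vpmovwb-style truncate and pull out the low subvector.
static SDValue truncateViaZMM(SDValue In, SelectionDAG &DAG,
                              const SDLoc &DL) {
  assert(In.getValueType() == MVT::v16i16 && "sketch covers one case");
  // Widen to 512 bits; the upper half is undef and its result discarded.
  SDValue Wide = DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v32i16, In,
                             DAG.getUNDEF(MVT::v16i16));
  // The zmm -> ymm truncate (vpmovwb) is available with BWI alone.
  SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::v32i8, Wide);
  // Keep only the 16 bytes that correspond to the original elements.
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v16i8, Trunc,
                     DAG.getIntPtrConstant(0, DL));
}
```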
------------------------------------------------------------------------
LowerMULH
We aren't doing any in-register extends here, so we should be able to use the target-independent nodes directly and allow them to be lowered as necessary.
llvm-svn: 314447
------------------------------------------------------------------------
same if. Remove one that is redundant with another subtarget feature.
llvm-svn: 314446
------------------------------------------------------------------------
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38305
llvm-svn: 314432
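The widening idea in sketch form (illustrative names; an any-extend would work equally well since only the low half of each product is kept):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// With BWI, multiply v32i8 by extending to v32i16, multiplying once, and
// truncating back, instead of splitting into two 128-bit halves.
static SDValue lowerV32I8MulBWI(SDValue A, SDValue B, SelectionDAG &DAG,
                                const SDLoc &DL) {
  SDValue ExtA = DAG.getNode(ISD::SIGN_EXTEND, DL, MVT::v32i16, A);
  SDValue ExtB = DAG.getNode(ISD::SIGN_EXTEND, DL, MVT::v32i16, B);
  SDValue Mul = DAG.getNode(ISD::MUL, DL, MVT::v32i16, ExtA, ExtB);
  // The low byte of each 16-bit product is the i8 multiply result.
  return DAG.getNode(ISD::TRUNCATE, DL, MVT::v32i8, Mul);
}
```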
------------------------------------------------------------------------
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types; X86ISD::UMUL8 is used instead. If 8-bit X86ISD::UMUL were ever used, it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal, so we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact, I think there was a bug in part of the i8 code. The same is true for i16, though without the bug.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38276
llvm-svn: 314430
------------------------------------------------------------------------
featuring zero vectors.
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.
Thanks to Simon Pilgrim for catching this.
llvm-svn: 314429
------------------------------------------------------------------------
llvm-svn: 314425
------------------------------------------------------------------------
MS allows the following size directives: float/double and long, as synonyms for dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190
llvm-svn: 314410
------------------------------------------------------------------------
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.
This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.
This also improves several comments on the outliner's cost model.
https://reviews.llvm.org/D36721
llvm-svn: 314341
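The log does not show the new interface, but conceptually it bundles the two old per-candidate queries into one struct; the field names below are assumptions for illustration:

```cpp
// Hypothetical shape of the MachineOutlinerInfo handshake; the actual
// members in the patch may differ.
struct MachineOutlinerInfo {
  unsigned CallOverhead;  // Bytes added at each call to the outlined body.
  unsigned FrameOverhead; // Bytes added by the outlined function's frame.
  unsigned FrameConstructionID; // Which call/frame variant the target
                                // chose, e.g. "LR dead, skip save/restore".
};
```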
------------------------------------------------------------------------
instructions into postRA pseudos like the NOREX version of TEST."""
This caused PR34751
llvm-svn: 314339
------------------------------------------------------------------------
X86InstrInfo::copyPhysReg."
This contributed to PR34751
llvm-svn: 314338
------------------------------------------------------------------------
Hopefully this will make it easier to vary the combine depth threshold per-target.
llvm-svn: 314337
------------------------------------------------------------------------
Zeroable APInt
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D37950
llvm-svn: 314332
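A sketch of the check this enables, assuming one Zeroable bit per vector element (the helper name is illustrative):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Replaces an ISD::isBuildVectorAllZeros-style query on the operand: the
// APInt already tracks which elements are known zero, including elements
// proven zero by means other than an explicit zero BUILD_VECTOR.
static bool allElementsZeroable(const APInt &Zeroable) {
  return Zeroable.isAllOnesValue();
}
```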
------------------------------------------------------------------------
instead of using a smaller add and inserting.
In some cases the resulting psadbw is smaller than the type of the add that started the match. Currently, in these cases, we use a smaller add and insert the result.
If we instead combine the psadbw with zeros and use the full-size add, we can take advantage of the implicit zeroing we get if we emit a narrower move before the add.
In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.
Differential Revision: https://reviews.llvm.org/D37453
llvm-svn: 314331
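In sketch form, assuming the add type is exactly twice as wide as the psadbw result (names are illustrative):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Widen a psadbw result to the width of the add it feeds by concatenating
// zeroes, instead of shrinking the add and inserting afterwards.
static SDValue widenSadToAdd(SDValue Sad, EVT AddVT, SelectionDAG &DAG,
                             const SDLoc &DL) {
  EVT SadVT = Sad.getValueType();
  if (SadVT == AddVT)
    return Sad;
  // The upper elements are zero, so the full-size add computes the same
  // sums, and a later narrower move can provide the zeroing implicitly.
  SDValue Zero = DAG.getConstant(0, DL, SadVT);
  return DAG.getNode(ISD::CONCAT_VECTORS, DL, AddVT, Sad, Zero);
}
```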
------------------------------------------------------------------------
Differential Revision: https://reviews.llvm.org/D37473
llvm-svn: 314295
------------------------------------------------------------------------
results.
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts.
llvm-svn: 314293
------------------------------------------------------------------------
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.
Differential Revision: https://reviews.llvm.org/D38254
llvm-svn: 314277
------------------------------------------------------------------------
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offsetting. When reading directly
from the struct here, we shouldn't subtract 1.
This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.
This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.
Differential Revision: https://reviews.llvm.org/D38251
llvm-svn: 314276
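The off-by-one in plain terms, with a simplified stand-in for libgcc's SjLj context (the real struct layout differs):

```cpp
// Simplified model: the callsite slot in the SjLj _Unwind_Context is
// stored 0-based.
struct SjLjUnwindContext {
  unsigned callsite; // 0-based call site index
};

// _Unwind_GetIP reports callsite + 1 and _Unwind_SetIP subtracts the 1
// again, so code reading the field directly must not subtract 1.
// Subtracting it anyway was the bug this change fixes.
static unsigned readCallSite(const SjLjUnwindContext *Ctx) {
  return Ctx->callsite; // not Ctx->callsite - 1
}
```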
------------------------------------------------------------------------
build vectors.
llvm-svn: 314274
------------------------------------------------------------------------
llvm-svn: 314250
------------------------------------------------------------------------
postRA pseudos like the NOREX version of TEST.""
The late MOV8rr_NOREX that caused the crash has been removed.
llvm-svn: 314249
------------------------------------------------------------------------
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
llvm-svn: 314248
------------------------------------------------------------------------
llvm-svn: 314247
------------------------------------------------------------------------
X86InterleavedAccess (VF{8|16|32} stride 3)
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3, VF={8|16|32}).
This patch is part two of two patches and covers the store (interleaved) side.
The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
Reviewers: zvi, guyblank, dorit, Ayal
Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
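The scalar picture of that layout, as a reference loop rather than the shuffle sequence the patch actually emits:

```cpp
#include <cstdint>

// Interleave three rows of VF elements: element i of row r ends up at
// offset 3 * i + r in the output buffer.
static void interleaveStride3(const uint8_t *A, const uint8_t *B,
                              const uint8_t *C, uint8_t *Out, unsigned VF) {
  for (unsigned I = 0; I < VF; ++I) {
    Out[3 * I + 0] = A[I];
    Out[3 * I + 1] = B[I];
    Out[3 * I + 2] = C[I];
  }
}
```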
------------------------------------------------------------------------
Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32
Reviewers: zvi, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38274
llvm-svn: 314221
------------------------------------------------------------------------
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.
Differential Revision: https://reviews.llvm.org/D37949
llvm-svn: 314207
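A scalar model of that normalization for 8-bit elements (illustrative only):

```cpp
#include <cstdint>

// VPROT treats positive amounts as ROTL and negative amounts as ROTR, so
// reducing the amount as an unsigned value modulo the element width turns
// every rotation into a ROTL: ROTR by n becomes ROTL by (width - n).
static uint8_t rotl8(uint8_t V, int8_t Amt) {
  unsigned R = static_cast<uint8_t>(Amt) % 8;
  return static_cast<uint8_t>((V << R) | (V >> ((8 - R) % 8)));
}
```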
------------------------------------------------------------------------
https://bugs.llvm.org//show_bug.cgi?id=29061
Don't try referencing REX-needed regs when not in 64-bit mode.
Aligns with GCC.
Differential Revision: https://reviews.llvm.org/D37801
llvm-svn: 314203
------------------------------------------------------------------------
pseudos like the NOREX version of TEST."
Makes llc crash. This reverts commit r314151.
llvm-svn: 314199
------------------------------------------------------------------------
llvm side.
Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
Adding autoUpgrade support.
Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
Differential Revision: https://reviews.llvm.org/D38220
llvm-svn: 314195
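Conceptually, the upgrade replaces the intrinsic with a plain shufflevector that repeats the low two elements; this sketch ignores the masked variants and uses illustrative names:

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// broadcast(f/i)32x2 repeats source elements {0,1} across the destination,
// so a fixed shuffle mask expresses it without an intrinsic.
static Value *upgradeBroadcast32x2(IRBuilder<> &B, Value *Src) {
  uint32_t Mask[] = {0, 1, 0, 1, 0, 1, 0, 1}; // 8-element destination case
  return B.CreateShuffleVector(Src, UndefValue::get(Src->getType()), Mask);
}
```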
------------------------------------------------------------------------
R12 is used for the SwiftError parameter. It is no longer a CSR as it
is used to transfer the SwiftError, and the caller must preserve it if
they need to.
llvm-svn: 314165
------------------------------------------------------------------------
instead.
As far as I know, SUBREG_TO_REG states that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seem to be some perturbation of register allocation.
Differential Revision: https://reviews.llvm.org/D38001
llvm-svn: 314152
------------------------------------------------------------------------
the NOREX version of TEST.
llvm-svn: 314151
------------------------------------------------------------------------
x86-asm-syntax=intel (PR34617).
Fix for incorrect code generation when x86-asm-syntax=intel.
Differential Revision: https://reviews.llvm.org/D37945
llvm-svn: 314140
------------------------------------------------------------------------
insert_subvector with zero after masked compares with fewer patterns with predicate
This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
Differential Revision: https://reviews.llvm.org/D38217
llvm-svn: 314133
------------------------------------------------------------------------
X86InterleavedAccess (VF8 stride 4):
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
Overall, this patch is a specific fix for the pattern (Stride=4, VF=8), and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3, each holding 8 chars:
c0, c1, ..., c7
m0, m1, ..., m7
y0, y1, ..., y7
k0, k1, ..., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers: DavidKreitzer, Farhana, zvi, igorb, guyblank, RKSimon, Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
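For reference, with the four 8-element sources viewed as one 32-element vector, the target element order above is just an index formula; this helper only illustrates the mapping, not the staged shuffles the patch builds:

```cpp
// Output element i of the VF8 stride-4 store takes source element
// (i % 4) * 8 + i / 4: i = 0..3 yields c0 m0 y0 k0, i = 4..7 yields
// c1 m1 y1 k1, and so on.
static void buildInterleaveMask4x8(int Mask[32]) {
  for (int I = 0; I < 32; ++I)
    Mask[I] = (I % 4) * 8 + I / 4;
}
```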
------------------------------------------------------------------------
Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
llvm-svn: 314103
------------------------------------------------------------------------
TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.
Remove the unused Instruction* parameter.
Differential Revision: https://reviews.llvm.org/D38165
llvm-svn: 314096
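The post-rename hook, with a parameter list assumed for illustration (the log only says the name changed and the unused Instruction* was dropped):

```cpp
// Sketch of the renamed TTI hook. Before the change there were two
// members called expandMemCmp: one did the expansion, the other merely
// reported whether it should happen. The query side is now:
class TargetTransformInfo {
public:
  // Returns true if memcmp expansion is enabled; the unused Instruction*
  // parameter of the old overload is gone. Signature is an assumption.
  bool enableMemCmpExpansion(unsigned &MaxLoadSize) const;
};
```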
------------------------------------------------------------------------
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
------------------------------------------------------------------------
commutable for the multiply operands.
llvm-svn: 314080
------------------------------------------------------------------------
llvm-svn: 314078
------------------------------------------------------------------------
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
------------------------------------------------------------------------
instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072