| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
used to indicate the zero masking behavior, which is not the case here. NFC
llvm-svn: 270333
|
| |
|
|
|
|
| |
reflect the fact that memory is the destination.
llvm-svn: 270332
|
| |
|
|
| |
llvm-svn: 270331
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20438
llvm-svn: 270322
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20324
llvm-svn: 270321
|
| |
|
|
|
|
| |
AVX2 versions of vector extract when AVX512VL is enabled.
llvm-svn: 270318
|
| |
|
|
|
|
| |
AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones.
llvm-svn: 270317
|
| |
|
|
|
|
| |
is enabled.
llvm-svn: 270316
|
| |
|
|
|
|
| |
AVX512VL/AVX512BWI equivalents are available.
llvm-svn: 270311
|
| |
|
|
|
|
| |
instead of pattern matching the intrinsics. This unifies handling with AVX512 and allows these intrinsics to select EVEX encoded instructions to increase available registers.
llvm-svn: 270310
|
| |
|
|
|
|
|
|
|
| |
This gets rid of some unnecessary SmallStrings in
X86TargetMachine::getSubtargetImpl.
No functionality change is intended.
llvm-svn: 270270
|
| |
|
|
|
|
|
|
| |
We performed a number of memory allocations each time getTTI was called,
remove them by using SmallString.
No functionality change intended.
llvm-svn: 270246
|
| |
|
|
| |
llvm-svn: 270237
|
| |
|
|
| |
llvm-svn: 270236
|
| |
|
|
| |
llvm-svn: 270234
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch is a first step towards a more extendible method of matching combined target shuffle masks.
Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
Differential Revision: http://reviews.llvm.org/D19198
llvm-svn: 270230
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reuse it in other backends.
llvm-svn: 270209
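The two-step split described above can be sketched as follows. This is a simplified illustration, not the code from the patch: the `Symbol` struct, the `Access` enum, and the specific locality rules are assumptions made for the example, and `assumeDSOLocal` is a hypothetical stand-in for shouldAssumeDSOLocal.

```cpp
#include <cassert>

// Hypothetical, simplified model of a symbol; not the LLVM data structures.
enum class RelocModel { Static, PIC };
enum class Access { Direct, ViaGOT };

struct Symbol {
  bool HasLocalLinkage;      // e.g. internal or private linkage
  bool HasDefaultVisibility; // false for hidden or protected symbols
};

// Step 1: decide whether the symbol can be assumed to resolve inside this DSO.
// The rules below are illustrative assumptions, not the full real logic.
bool assumeDSOLocal(RelocModel RM, const Symbol &S) {
  if (RM == RelocModel::Static)
    return true; // assumed: nothing is preemptible in a static link
  if (S.HasLocalLinkage)
    return true; // never visible outside the module
  if (!S.HasDefaultVisibility)
    return true; // hidden/protected symbols cannot be preempted
  return false;
}

// Step 2: use only the answer from step 1 to pick the access method.
Access classifyAccess(RelocModel RM, const Symbol &S) {
  return assumeDSOLocal(RM, S) ? Access::Direct : Access::ViaGOT;
}
```

Keeping step 1 free of any target-specific detail is what would let it move to common code, as the message suggests.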
|
| |
|
|
| |
llvm-svn: 270182
|
| |
|
|
|
|
| |
supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL.
llvm-svn: 270174
|
| |
|
|
|
|
| |
Addresses r270095's code review.
llvm-svn: 270147
|
| |
|
|
|
|
|
|
|
| |
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.
Differential Revision: http://reviews.llvm.org/D20406
llvm-svn: 270109
|
| |
|
|
|
|
| |
This avoids passing a TargetMachine in a few places.
llvm-svn: 270095
|
| |
|
|
| |
llvm-svn: 270093
|
| |
|
|
|
|
|
|
|
|
| |
Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2.
This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from CoreMark-Pro.
There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).
Differential Revision: http://reviews.llvm.org/D19659
llvm-svn: 270036
|
| |
|
|
|
|
| |
No changes to the isel table size so the separation wasn't buying us anything.
llvm-svn: 270026
|
| |
|
|
|
|
| |
implied.
llvm-svn: 270025
|
| |
|
|
|
|
| |
classes.
llvm-svn: 270013
|
| |
|
|
|
|
| |
type constraints for vector and scalar.
llvm-svn: 270012
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been mapped to the
default value, which it is now clear has to be one of the remaining 3
options.
llvm-svn: 269988
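A minimal sketch of the idea, using `std::optional` as a stand-in for LLVM's Optional so that it compiles with only the standard library. The enumerator names, the `effectiveRelocModel` helper, and the `TargetDefaultsToPIC` parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>

// Three concrete models; there is no "Default" member any more.
enum class RelocModel { Static, PIC, DynamicNoPIC };

// An empty optional means "the user did not ask for anything yet".
// The target then maps that to one of the three real models.
// TargetDefaultsToPIC is a hypothetical knob for this example.
RelocModel effectiveRelocModel(std::optional<RelocModel> Requested,
                               bool TargetDefaultsToPIC) {
  if (Requested)
    return *Requested; // explicit user choice wins
  return TargetDefaultsToPIC ? RelocModel::PIC : RelocModel::Static;
}
```

After the mapping, downstream code only ever sees one of the three concrete values, so it no longer needs a case for an unresolved default.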
|
| |
|
|
| |
llvm-svn: 269962
|
| |
|
|
|
|
|
|
|
|
|
|
| |
instructions"
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
|
| |
|
|
|
|
|
| |
This just clang formats and cleans comments in an area I am about to
post a patch for review.
llvm-svn: 269946
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. The MONITORX instruction establishes a range to be
monitored. The MWAITX instruction causes the processor to stop instruction execution
and enter an implementation-dependent optimized state until occurrence of a
class of events.
The opcode of the MONITORX instruction is "0F 01 FA". The opcode of the MWAITX instruction is
"0F 01 FB". This opcode information is used in adding tests for the
disassembler.
These instructions are enabled for AMD's bdver4 architecture.
Patch by Ganesh Gopalasubramanian!
Reviewers: echristo, craig.topper, RKSimon
Subscribers: RKSimon, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19795
llvm-svn: 269911
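The feature bit and the two encodings quoted above can be checked with a few lines of C++. This is only an illustrative sketch: `hasMonitorX`, `decodeMonitorX`, and the `Insn` enum are hypothetical names, the ECX value is assumed to have been read from CPUID leaf 0x80000001 by the caller, and the decoder recognizes nothing beyond these two three-byte opcodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ecx is the ECX output of CPUID leaf 0x80000001 (obtained elsewhere).
// Bit 29 indicates the presence of MONITORX and MWAITX.
constexpr bool hasMonitorX(std::uint32_t Ecx) {
  return ((Ecx >> 29) & 1u) != 0;
}

enum class Insn { Unknown, MonitorX, MwaitX };

// Toy decoder for the two encodings from the commit message:
//   0F 01 FA -> MONITORX
//   0F 01 FB -> MWAITX
Insn decodeMonitorX(const std::uint8_t *Bytes, std::size_t Len) {
  if (Len < 3 || Bytes[0] != 0x0F || Bytes[1] != 0x01)
    return Insn::Unknown;
  switch (Bytes[2]) {
  case 0xFA:
    return Insn::MonitorX;
  case 0xFB:
    return Insn::MwaitX;
  default:
    return Insn::Unknown;
  }
}
```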
|
| |
|
|
|
|
| |
immediate inputs.
llvm-svn: 269886
|
| |
|
|
|
|
| |
bytes in the DAG isel table by removing type checks for the condition operand which is always a vector or scalar of i1 matching the number of elements in the other operands.
llvm-svn: 269885
|
| |
|
|
|
|
| |
Seems to have broken the Windows ASan bot. Reverting while investigating.
llvm-svn: 269833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the expansion of WIN_ALLOCA pseudo-instructions
into a separate pass that walks the CFG and lowers the instructions
based on a conservative estimate of the offset between the stack
pointer and the lowest accessed stack address.
The goal is to reduce binary size and run-time costs by removing
calls to _chkstk. While it doesn't fix all the code quality problems
with inalloca calls, it's an incremental improvement for PR27076.
Differential Revision: http://reviews.llvm.org/D20263
llvm-svn: 269828
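The core decision can be modelled roughly as below. This is a sketch under stated assumptions, not the pass itself: the 4096-byte probe size, the `StackState` bookkeeping, and the two-way `Sub`/`Probe` choice are simplifications invented for the example, and a negative amount is used to model an allocation size that is not a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

enum class Lowering { Sub, Probe };

// Assumed guard-page granularity for this sketch.
constexpr std::int64_t kProbeSize = 4096;

// Conservative estimate of the distance between the stack pointer and
// the lowest stack address known to have been accessed.
struct StackState {
  std::int64_t Untouched = 0;
};

// Decide how to lower one dynamic allocation of Amount bytes.
// Amount < 0 models a size that is unknown at compile time.
Lowering lowerAlloca(StackState &S, std::int64_t Amount) {
  if (Amount < 0 || S.Untouched + Amount > kProbeSize) {
    S.Untouched = 0; // the probe call touches the newly allocated pages
    return Lowering::Probe;
  }
  S.Untouched += Amount; // a plain stack-pointer adjustment touches nothing
  return Lowering::Sub;
}

// A store at the current stack pointer resets the estimate.
void recordAccessAtSP(StackState &S) { S.Untouched = 0; }
```

In this model, small allocations that stay within one probe-size window of the last touched address skip the call to _chkstk, which is where the size and run-time savings would come from.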
|
| |
|
|
|
|
|
|
|
| |
Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
|
| |
|
|
|
|
|
|
| |
target block are the same in getFallThroughMBB.
Differential Revision: http://reviews.llvm.org/D20288
llvm-svn: 269760
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new X86 shuffle lowering can do just fine without transforming vselects
into vector_shuffles. It looks like the only thing this code does right now
is cause trouble - in particular, it can lead to combine/legalization infinite
loops.
Note that it's not completely NFC, since some of the shuffle masks get inverted,
which may cause slight differences further down the line. We may want to find
a way to invert those masks, but that's orthogonal to this commit.
This fixes the hang in PR27689.
llvm-svn: 269676
|
| |
|
|
| |
llvm-svn: 269650
|
| |
|
|
|
|
|
|
|
|
| |
This patch uses PSHUFB to lower vector CTLZ and avoid (slower) scalarizations.
The leading zero count of each 4-bit nibble of the vector is determined by using a PSHUFB lookup. Pairs of results are then repeatedly combined up to the original element width.
Differential Revision: http://reviews.llvm.org/D20016
llvm-svn: 269646
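A scalar model of the technique may help: the 16-entry table plays the role of the PSHUFB lookup on each 4-bit nibble, and the wider functions combine pairs of results up to the element width. The function names are hypothetical, and the real lowering works on whole vectors rather than one element at a time.

```cpp
#include <cassert>
#include <cstdint>

// Leading-zero count of each possible 4-bit nibble (4 for a zero nibble).
constexpr std::uint8_t kNibbleCtlz[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                          0, 0, 0, 0, 0, 0, 0, 0};

// Combine the two nibble results of a byte: if the high nibble is zero,
// the count is 4 plus the low nibble's count, else the high nibble's count.
constexpr unsigned ctlz8(std::uint8_t X) {
  return (X >> 4) == 0 ? 4u + kNibbleCtlz[X & 0xF] : kNibbleCtlz[X >> 4];
}

// Same combine step, one level up: two byte results form a 16-bit result.
constexpr unsigned ctlz16(std::uint16_t X) {
  return (X >> 8) == 0 ? 8u + ctlz8(static_cast<std::uint8_t>(X & 0xFF))
                       : ctlz8(static_cast<std::uint8_t>(X >> 8));
}

// And once more for 32-bit elements.
constexpr unsigned ctlz32(std::uint32_t X) {
  return (X >> 16) == 0 ? 16u + ctlz16(static_cast<std::uint16_t>(X & 0xFFFF))
                        : ctlz16(static_cast<std::uint16_t>(X >> 16));
}
```

Each combine step only needs a compare against zero, a select, and an add, all of which have cheap vector equivalents.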
|
| |
|
|
| |
llvm-svn: 269615
|
| |
|
|
|
|
| |
Removed duplicate getOperand / getSimpleValueType calls.
llvm-svn: 269614
|
| |
|
|
|
|
| |
software spec.
llvm-svn: 269579
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D19261
llvm-svn: 269569
|
| |
|
|
|
|
|
|
| |
argument and the mask is the 4th argument. Also move the 128/256 tests to the right test file.
Prior to this the immediate was a strange 16-bits and the 512-bit intrinsic couldn't receive the full 16 mask bits it needs.
llvm-svn: 269526
|
| |
|
|
|
|
| |
H.J. Lu pointed out that I missed this in r269236. Thanks!
llvm-svn: 269516
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D18725
llvm-svn: 269413
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
32-bit AllRegs:
SSE: xmm0-xmm7
AVX: ymm0-ymm7
AVX512: zmm0-zmm7 + k0-k7
64-bit AllRegs:
SSE: xmm0-xmm15
AVX: ymm0-ymm15
AVX512: zmm0-zmm31 + k0-k7
Differential Revision: http://reviews.llvm.org/D20142
llvm-svn: 269337
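The register lists above follow a simple pattern, which the sketch below spells out. It only mirrors the vector and mask registers listed in the message; `VecISA` and `allRegsVectorSet` are hypothetical names, and this is not how the backend defines its register lists.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class VecISA { SSE, AVX, AVX512 };

// Build the vector (and, for AVX512, mask) register names for one row
// of the table in the commit message.
std::vector<std::string> allRegsVectorSet(bool Is64Bit, VecISA ISA) {
  const char *Prefix = "xmm";
  if (ISA == VecISA::AVX)
    Prefix = "ymm";
  if (ISA == VecISA::AVX512)
    Prefix = "zmm";

  // 32-bit mode: 8 registers. 64-bit mode: 16, or 32 with AVX512.
  unsigned Count = 8;
  if (Is64Bit)
    Count = (ISA == VecISA::AVX512) ? 32 : 16;

  std::vector<std::string> Regs;
  for (unsigned I = 0; I < Count; ++I)
    Regs.push_back(Prefix + std::to_string(I));

  // AVX512 adds the mask registers k0-k7 in both modes.
  if (ISA == VecISA::AVX512)
    for (unsigned I = 0; I < 8; ++I)
      Regs.push_back("k" + std::to_string(I));
  return Regs;
}
```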
|