| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.
llvm-svn: 276347
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when constraint "w" is used on a 32-bit operand.
This enables compiling the following code, which used to error out in
the backend:
void foo1(int a) {
asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}
Fixes PR28633.
llvm-svn: 276344
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D22538
llvm-svn: 276298
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D21646
llvm-svn: 276294
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
limits.
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22582
llvm-svn: 276293
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.
But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.
This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41).
Fix for PR27265.
Differential Revision: https://reviews.llvm.org/D22509
llvm-svn: 276289
|
| |
|
|
|
|
| |
As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same.
llvm-svn: 276285
|
| |
|
|
|
|
|
|
|
|
|
|
| |
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276281
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: tstellarAMD, vpykhtin
Subscribers: arsenm, kzhuravl
Differential Revision: https://reviews.llvm.org/D22620
llvm-svn: 276274
|
| |
|
|
| |
llvm-svn: 276257
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
classifyLEAReg() deals with switching operands from 32bit to 64bit in
order to use a LEA64_32 instruction (for three address code goodness).
It currently performs a liveness analysis to determine the kill/undef
flag for the newly added operand. This should not be necessary:
- If the previous operand had a kill flag, then the 32bit part of the
register gets killed, this will kill the super register as well.
- If the previous operand had an undef flag then we didn't care what
value we read, just use the same flag on the new operand.
(No matter what an operand with an undef flag won't affect liveness)
This makes the code independent of the presence of kill flags because it
avoids a call to MachineBasicBlock::computeRegisterLiveness().
Differential Revision: http://reviews.llvm.org/D22283
llvm-svn: 276222
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: jholewinski, arsenm, asbirlea
Differential Revision: https://reviews.llvm.org/D22592
llvm-svn: 276196
|
| |
|
|
|
|
| |
-run-pass. NFCI.
llvm-svn: 276193
|
| |
|
|
|
|
|
|
| |
After r276153 the pass applies to both kernels and regular functions.
Differential Revision: https://reviews.llvm.org/D22583
llvm-svn: 276189
|
| |
|
|
|
|
|
|
|
|
|
| |
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.
extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.
llvm-svn: 276183
|
| |
|
|
| |
llvm-svn: 276182
|
| |
|
|
|
|
|
|
| |
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
|
| |
|
|
|
|
|
|
| |
Fixes a crash in llvm_unreachable when a function has array return type.
Differential Revision: https://reviews.llvm.org/D22524
llvm-svn: 276154
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.
Differential Review: https://reviews.llvm.org/D21421
llvm-svn: 276153
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D22526
llvm-svn: 276119
|
| |
|
|
|
|
|
|
| |
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
llvm-svn: 276104
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Retry r275776 (no changes, we suspect the issue was with another commit).
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.
The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.
Fixes PR26038
Differential Revision: https://reviews.llvm.org/D22103
llvm-svn: 276101
|
| |
|
|
|
|
| |
pattern.
llvm-svn: 276088
|
| |
|
|
|
|
| |
without relying on file ordering.
llvm-svn: 276087
|
| |
|
|
|
|
| |
VEXTRACT(F/I)128/VINSERT(I/F)128. NFC
llvm-svn: 276086
|
| |
|
|
|
|
| |
instructions with less repeated code. NFC
llvm-svn: 276085
|
| |
|
|
|
|
|
|
|
| |
Inference of the 'returned' attribute was fixed in r276008, lets try
turning the backend support back on.
This reverts commit r275677.
llvm-svn: 276081
|
| |
|
|
|
|
|
|
|
|
|
| |
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
llvm-svn: 276051
|
| |
|
|
|
|
|
|
| |
Add check for legal data types when expanding into a Newton series.
Differential Revision: https://reviews.llvm.org/D22267
llvm-svn: 276041
|
| |
|
|
| |
llvm-svn: 276030
|
| |
|
|
|
|
| |
LGTM'd by Matt Arsenault.
llvm-svn: 276029
|
| |
|
|
|
|
|
|
|
|
|
| |
There's not much functional change, but it really is an architectural feature
(on v6T2, v7A, v7R and v7EM) rather than something each CPU implements
individually.
The main functional change is the default behaviour you get when specifying
only "-triple".
llvm-svn: 276013
|
| |
|
|
|
|
|
|
|
|
|
| |
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.
llvm-svn: 275988
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
|
| |
|
|
|
|
|
|
|
|
|
| |
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.
Differential Revision: https://reviews.llvm.org/D21929
llvm-svn: 275979
|
| |
|
|
|
|
|
|
| |
Adding PredictableSelectIsExpensive for Vulcan
Differential Revision: https://reviews.llvm.org/D22448
llvm-svn: 275978
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The standard local dynamic model for TLS on ARM systems needs two
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
TLS for ARM is defined in Addenda to, and Errata in, the ABI for the
ARM Architecture
Differential Revision: https://reviews.llvm.org/D22461
llvm-svn: 275977
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22458
llvm-svn: 275968
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).
This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D22412
llvm-svn: 275967
|
| |
|
|
|
|
|
|
|
|
|
| |
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).
I added the new sequence (truncate (a >>n) to i1) to the BT pattern.
Differential Revision: https://reviews.llvm.org/D22354
llvm-svn: 275950
|
| |
|
|
| |
llvm-svn: 275942
|
| |
|
|
|
|
| |
passed the same value.
llvm-svn: 275941
|
| |
|
|
| |
llvm-svn: 275939
|
| |
|
|
|
|
|
|
|
| |
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.
Patch by Axel Davy
llvm-svn: 275935
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
|
| |
|
|
|
|
|
| |
.. including calls from kernel functions that were
ignored by mistake before.
llvm-svn: 275920
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.
The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.
Differential Revision: https://reviews.llvm.org/D22428
llvm-svn: 275893
|
| |
|
|
|
|
|
|
| |
Breaks asan, see https://reviews.llvm.org/D22103
This reverts commit r275776.
llvm-svn: 275890
|
| |
|
|
|
|
| |
This is already casted above so non-null
llvm-svn: 275881
|
| |
|
|
| |
llvm-svn: 275873
|