Commit message
Currently this is mainly to prevent scalarization of integer division by constants.
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264511
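As background for the scalarization point, a standalone sketch (not the LLVM code): division by a constant can be rewritten as a widening multiply by a precomputed reciprocal plus a shift, which maps naturally onto vector instructions instead of per-element divides. The magic constant below is specific to the divisor 3.
#include <cassert>
#include <cstdint>

// Unsigned division by the constant 3, rewritten as a multiply by a
// fixed-point reciprocal plus a shift -- the kind of expansion that keeps
// vector code from having to scalarize each division.
static uint32_t DivideBy3(uint32_t N) {
  return (uint32_t)(((uint64_t)N * 0xAAAAAAABULL) >> 33);
}

int main() {
  for (uint32_t N : {0u, 1u, 2u, 3u, 100u, 0xFFFFFFFFu})
    assert(DivideBy3(N) == N / 3);
  return 0;
}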
multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.
llvm-svn: 264509
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith, we simplify by using that here instead.
llvm-svn: 264506
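A per-lane sketch (standalone, not the LLVM code) of why byte multiplies go through v16i16:
#include <cassert>
#include <cstdint>

// x86 has no byte-wide vector multiply, so each i8 lane is sign-extended to
// i16, multiplied there, and the result truncated back to i8 (only the low
// 8 bits of the product are kept, as modular arithmetic requires).
static uint8_t MulI8ViaI16(int8_t A, int8_t B) {
  int16_t Wide = (int16_t)A * (int16_t)B;   // sign-extend, multiply as i16
  return (uint8_t)Wide;                     // truncate back to the i8 lane
}

int main() {
  assert(MulI8ViaI16(25, 10) == 250);
  assert(MulI8ViaI16(-1, -1) == 1);     // 0xFF * 0xFF keeps low byte 0x01
  assert(MulI8ViaI16(-128, 2) == 0);    // wrap-around, same as modular i8 mul
  return 0;
}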
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI (prologue/epilogue insertion).
This fixes PR27071.
llvm-svn: 264465
64-bit, 32-bit, and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with an 8-bit immediate is only 3 bytes.
Since the and/or instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
Differential Revision: http://reviews.llvm.org/D18374
llvm-svn: 264440
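An illustration of the equivalence being exploited, assuming the stores in question are of the constants 0 and -1 (an assumption; the commit text does not name the constants): the read-modify-write forms leave the same value in memory as a plain store, but encode with an 8-bit immediate.
#include <cassert>
#include <cstdint>

int main() {
  uint32_t Mem = 0xDEADBEEF;
  Mem &= 0;                    // same final value as Mem = 0
  assert(Mem == 0);
  Mem = 0xDEADBEEF;
  Mem |= ~0u;                  // same final value as Mem = 0xFFFFFFFF
  assert(Mem == 0xFFFFFFFFu);
  return 0;
}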
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.
llvm-svn: 264403
llvm-svn: 264395
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
Suggested by Sanjay in D18189, as the multiple folding options in XOP instructions can be tricky.
llvm-svn: 264305
The KTEST instruction may be used instead of TEST in this case:
  %int_sel3 = bitcast <8 x i1> %sel3 to i8
  %res = icmp eq i8 %int_sel3, zeroinitializer
  br i1 %res, label %L2, label %L1
Differential Revision: http://reviews.llvm.org/D18444
llvm-svn: 264298
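A standalone model (not LLVM code) of the pattern above: a <8 x i1> mask bitcast to i8 is just its lanes packed into one byte, so "is the whole mask false?" is a single compare of that byte against zero, which is what KTEST/TEST performs.
#include <cassert>
#include <cstdint>

static bool AllLanesFalse(const bool (&Mask)[8]) {
  uint8_t Bits = 0;
  for (int I = 0; I < 8; ++I)
    Bits |= (uint8_t)Mask[I] << I;   // the bitcast: one bit per i1 lane
  return Bits == 0;                  // the icmp eq of the i8 against zero
}

int main() {
  bool AllFalse[8] = {};
  bool OneTrue[8] = {false, false, true, false, false, false, false, false};
  assert(AllLanesFalse(AllFalse));
  assert(!AllLanesFalse(OneTrue));
  return 0;
}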
llvm-svn: 264294
This patch begins adding support for lowering to the XOP VPPERM instruction by adding the X86ISD::VPPERM opcode.
Differential Revision: http://reviews.llvm.org/D18189
llvm-svn: 264260
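A simplified reference model of VPPERM's byte selection (standalone sketch; the assumed operand layout is that selector values 0-15 index the first source and 16-31 the second, and the selector's top bits, which request per-byte operations such as inversion or zeroing, are taken to be zero):
#include <cassert>
#include <cstdint>

// Each selector byte picks one byte out of the 32-byte concatenation of the
// two source vectors.
static void VPPermBytes(const uint8_t Src1[16], const uint8_t Src2[16],
                        const uint8_t Sel[16], uint8_t Dst[16]) {
  for (int I = 0; I < 16; ++I) {
    unsigned Idx = Sel[I] & 0x1F;
    Dst[I] = Idx < 16 ? Src1[Idx] : Src2[Idx - 16];
  }
}

int main() {
  uint8_t A[16], B[16], Sel[16], Out[16];
  for (int I = 0; I < 16; ++I) {
    A[I] = (uint8_t)I;            // 0..15
    B[I] = (uint8_t)(100 + I);    // 100..115
    Sel[I] = (uint8_t)(31 - I);   // read the concatenation back to front
  }
  VPPermBytes(A, B, Sel, Out);
  assert(Out[0] == 115 && Out[15] == 100);
  return 0;
}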
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.
Differential Revision: http://reviews.llvm.org/D18414
llvm-svn: 264224
Currently, AnalyzeBranch() fails on non-equality comparisons between floating-point
values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because the
function can modify the branch by reversing the conditional jump and removing the
unconditional jump if there is a proper fall-through. However, in the case of a
non-equality comparison between floating-point values, this can turn the branch
"unanalyzable". Consider the following case:
  jne .BB1
  jp  .BB1
  jmp .BB2
.BB1:
  ...
.BB2:
  ...
AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:
  jne .BB1
  jnp .BB2
.BB1:
  ...
.BB2:
  ...
However, AnalyzeBranch() cannot analyze this branch anymore, as there are now two
conditional jumps with different targets. This may disable some optimizations
like block placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.
This optimization is also done in the block-placement pass, which means we
can remove it from AnalyzeBranch(). However, X86::COND_NE_OR_P and
X86::COND_NP_OR_E are currently not reversible: there are no defined
negation conditions for them.
In order to reverse them, this patch defines two new CondCodes, X86::COND_E_AND_NP
and X86::COND_P_AND_NE, and defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.
Differential Revision: http://reviews.llvm.org/D11393
llvm-svn: 264199
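Why two conditional jumps are needed in the first place (a standalone illustration, not LLVM code): a floating-point "not equal" must also be true when either operand is NaN, i.e. when ucomiss/ucomisd report "unordered" via PF, so the branch is "not equal OR parity".
#include <cassert>
#include <cmath>

int main() {
  float NaN = std::nanf("");
  assert(!(NaN == NaN));       // unordered: equality is false
  assert(NaN != NaN);          // ...so inequality must be true (the PF case)
  assert(1.0f != 2.0f);        // the ordinary "not equal" (ZF clear) case
  return 0;
}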
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.
It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.
llvm-svn: 264186
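A minimal sketch of the kind of helper described, assuming it is built inside LLVM (the name is illustrative, not necessarily the one the commit adds): looping, rather than checking once, looks through whole chains of bitcasts.
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Looking through *all* bitcasts requires a loop; the replaced 'if'-based
// code stopped after one, missing e.g. v4i32 -> v8i16 -> v16i8 chains.
static SDValue lookThroughBitcasts(SDValue V) {
  while (V.getOpcode() == ISD::BITCAST)
    V = V.getOperand(0);
  return V;
}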
Add FeatureX87 to the X86 backend to be able to define CPUs which don't have x87.
Differential Revision: http://reviews.llvm.org/D13979
llvm-svn: 264148
llvm-svn: 264110
Improve extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG, but we can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only properly helps with lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
Reapplied with a fix for PR26953 (missing vector widening legalization).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 264062
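A reference model of the VECTOR_INREG idea for one fixed case (standalone sketch, not the LLVM code): the low elements of a register are extended in place to wider lanes, so the result occupies the same register width as the source.
#include <cassert>
#include <cstdint>

// The low 8 bytes of a 16-byte register are sign-extended to eight i16
// lanes; the upper 8 source bytes are simply not used.
static void SignExtendInReg_v16i8_to_v8i16(const int8_t In[16], int16_t Out[8]) {
  for (int I = 0; I < 8; ++I)
    Out[I] = In[I];                  // implicit sign extension i8 -> i16
}

int main() {
  int8_t In[16] = {-1, 2, -3, 4, -5, 6, -7, 8,  9, 9, 9, 9, 9, 9, 9, 9};
  int16_t Out[8];
  SignExtendInReg_v16i8_to_v8i16(In, Out);
  assert(Out[0] == -1 && Out[6] == -7);
  return 0;
}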
computeZeroableShuffleElements
Based on feedback for D14261
llvm-svn: 263911
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes whose element size differs from that of the shuffle mask.
Differential Revision: http://reviews.llvm.org/D14261
llvm-svn: 263906
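An illustrative sketch (not the LLVM code) of the size-ratio bookkeeping this involves, shown for the wide-to-narrow direction only: one zero/undef wide element marks Ratio consecutive narrow shuffle elements as zeroable.
#include <cassert>
#include <vector>

static std::vector<bool> expandZeroable(const std::vector<bool> &WideZero,
                                        unsigned Ratio) {
  std::vector<bool> NarrowZero;
  for (bool Z : WideZero)
    NarrowZero.insert(NarrowZero.end(), Ratio, Z);  // repeat each flag Ratio times
  return NarrowZero;
}

int main() {
  // A v2i64 build vector <0, x> viewed as v4i32: elements 0 and 1 are zeroable.
  std::vector<bool> Wide = {true, false};
  std::vector<bool> Narrow = expandZeroable(Wide, 2);
  assert(Narrow.size() == 4 && Narrow[0] && Narrow[1] && !Narrow[2]);
  return 0;
}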
Differential Revision: http://reviews.llvm.org/D18211
llvm-svn: 263898
llvm-svn: 263889
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.
llvm-svn: 263856
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines.
This patch stops the combine if we already have a blend in place (meaning we miss some domain corrections).
llvm-svn: 263717
llvm-svn: 263646
512-bit vector, because PCMPGT is supported only for 128/256-bit.
Differential Revision: http://reviews.llvm.org/D18204
llvm-svn: 263624
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in Halide. PR forthcoming.
This reverts commit r263303.
llvm-svn: 263512
llvm-svn: 263453
llvm-svn: 263448
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
Differential Revision: http://reviews.llvm.org/D18094
llvm-svn: 263446
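A conceptual model of the trade-off raised on llvm-dev (standalone sketch, not the LLVM code): a masked load touches only the selected lanes, while the replacement does one full vector load and then blends with the pass-through value; the lane results are identical, but the masked-off bytes do get read from memory.
#include <cassert>
#include <cstdint>

static void LoadThenBlend(const int32_t *Mem, const bool Mask[4],
                          const int32_t PassThru[4], int32_t Out[4]) {
  int32_t Full[4];
  for (int I = 0; I < 4; ++I)
    Full[I] = Mem[I];                          // full vector load (reads every lane)
  for (int I = 0; I < 4; ++I)
    Out[I] = Mask[I] ? Full[I] : PassThru[I];  // blend, like a vselect on the mask
}

int main() {
  int32_t Mem[4] = {10, 20, 30, 40};
  bool Mask[4] = {true, false, true, false};
  int32_t PassThru[4] = {-1, -2, -3, -4};
  int32_t Out[4];
  LoadThenBlend(Mem, Mask, PassThru, Out);
  assert(Out[0] == 10 && Out[1] == -2 && Out[2] == 30 && Out[3] == -4);
  return 0;
}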
instruction on SKX.
Implemented by delena.
Differential Revision: http://reviews.llvm.org/D18054
llvm-svn: 263417
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if the shift amount is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
llvm-svn: 263383
These operands are the registers and immediates that specify the memory address, not the memory itself; thus they are inputs.
llvm-svn: 263354
cmpxchg[8|16]b uses RBX as one of its implicit arguments. In other words, using
this instruction clobbers RBX, as it is defined to hold one of the inputs. When
the backend uses a dynamically allocated stack, RBX is used as a reserved
register for the base pointer.
Reserved registers have special semantics that only the target understands and
enforces. Because of that, the register allocator doesn't use them, but it also
doesn't try to make sure they are used properly (remember, it does not know how
they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. It is the target's responsibility to do the right thing with its
reserved registers.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the save of the value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
RBX).
  cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
  rbx = copy input_for_rbx_reg
  cmpxchg <regular cmpxchg args>
  rbx = save_of_rbx_as_bp
Note: The actual modeling of the pseudo is a bit more complicated, to make sure
the interferences that appear after the pseudo gets expanded are properly
modeled before that expansion.
This fixes PR26883.
llvm-svn: 263325
Improve extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG, but we can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only properly helps with lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 263303
llvm-svn: 263266
combine to a single (illegal) PSHUFB instruction.
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
llvm-svn: 263239
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.
llvm-svn: 263167
ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. However, it's not enough to check that the register
itself isn't defined; we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139
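A sketch of the strengthened check, assuming current LLVM headers and APIs and using an illustrative helper name; it is simplified to explicit register defs on the call instruction.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// A candidate register is only safe to pop into if the call defines neither
// it nor any overlapping register (e.g. a def of EAX rules out popping AX).
static bool isSafeToPopInto(const MachineInstr &Call, Register Candidate,
                            const TargetRegisterInfo &TRI) {
  for (const MachineOperand &MO : Call.operands())
    if (MO.isReg() && MO.isDef() && TRI.regsOverlap(MO.getReg(), Candidate))
      return false;
  return true;
}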
runs successfully on routines containing IRETs. This fixes PR26410.
Differential Revision: http://reviews.llvm.org/D17643
llvm-svn: 263120
(failed in the instruction selection phase)
Differential Revision: http://reviews.llvm.org/D17924
llvm-svn: 263111
The BLEND+zero combine was failing to combine equivalent BLEND masks.
Follow-up to D17483 and D17858.
llvm-svn: 263105
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.
This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.
Differential Revision: http://reviews.llvm.org/D17858
llvm-svn: 263102
The SCALAR_TO_VECTOR operation for v64i8 and v32i16 should be lowered if the BW feature is on.
Differential Revision: http://reviews.llvm.org/D17994
llvm-svn: 263097
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.
Differential Revision: http://reviews.llvm.org/D18000
llvm-svn: 263069
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
Differential Revision: http://reviews.llvm.org/D17899
llvm-svn: 263067
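A standalone illustration using the corresponding intrinsics (requires SSE4.1, e.g. compile with -msse4.1): for a compile-time mask, the immediate blend produces the same lanes as a variable blend driven by an equivalent constant mask vector, and the immediate form is the cheaper instruction.
#include <smmintrin.h>   // SSE4.1 intrinsics
#include <cassert>

int main() {
  __m128 A = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
  __m128 B = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);

  // Take lanes 1 and 3 from B, lanes 0 and 2 from A.
  __m128 Imm = _mm_blend_ps(A, B, 0b1010);                       // blendps (immediate)
  __m128 Mask = _mm_castsi128_ps(_mm_setr_epi32(0, -1, 0, -1));
  __m128 Var = _mm_blendv_ps(A, B, Mask);                        // blendvps (variable)

  float ImmOut[4], VarOut[4];
  _mm_storeu_ps(ImmOut, Imm);
  _mm_storeu_ps(VarOut, Var);
  for (int I = 0; I < 4; ++I)
    assert(ImmOut[I] == VarOut[I]);
  return 0;
}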
http://reviews.llvm.org/D17967
llvm-svn: 263021
The fix consisting of using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed, the library call may
not exist on all platforms. In other words, we need a better fix!
llvm-svn: 262943
ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.
llvm-svn: 262935