Commit messages

and change all the code that used to create intrinsic nodes to create the new nodes instead.
llvm-svn: 148664

llvm-svn: 148466

llvm-svn: 147394

llvm-svn: 146833

hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller had already done the cast.
llvm-svn: 146344

llvm-svn: 145926

llvm-svn: 145485

type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128.
llvm-svn: 145483

VPERMILPS/VPERMILPD detection since they are pretty similar.
llvm-svn: 145238

Simplify some shuffle lowering code since V1 can never be UNDEF due to the canonicalization that occurs when shuffle nodes are created.
llvm-svn: 145153

not be type-specific. Now we just have integer high and low and floating-point high and low. Pattern matching will choose the correct instruction based on the vector type.
llvm-svn: 145148

128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
llvm-svn: 145126

the 128-bit versions and let the vector type distinguish.
llvm-svn: 145125

llvm-svn: 145028

AVX2 is enabled.
llvm-svn: 145026

add/sub of appropriate shuffle vectors.
llvm-svn: 144989

llvm-svn: 144988

llvm-svn: 144987

llvm-svn: 143529

floating-point add/sub of appropriate shuffle vectors. Does not
synthesize the 256-bit AVX versions because they work differently.
llvm-svn: 140332

stricter about the alignment checking. This was found by inspection
and I don't have any test cases so far, although the LLVM test suite runs
without any problem.
llvm-svn: 139625

llvm-svn: 139491

implementation to have tablegen match the instruction by the node type.
llvm-svn: 139400

in Nadav's r139285 and r139287 commits.
1) Rename vsel.ll to a more descriptive name.
2) Change the order of BLEND operands to "Op1, Op2, Cond"; this is
necessary because PBLENDVB is already used in different places with
this order, and it was being emitted in the wrong way for vselect.
3) Add AVX patterns and tests for the same SSE41 instructions.
llvm-svn: 139305

llvm-svn: 139285

match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working, but because of PR8156 there is no way to match loads, since
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version of the
check for foldable loads, as if the bug were already fixed. This
should work out of the box once PR8156 gets fixed, since MayFoldLoad will
work as expected.
llvm-svn: 137810

vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519

Also make PALIGNR masks not match 256 bits, which isn't supported.
It's also a step toward solving PR10489.
llvm-svn: 136448

usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
as the one used by the low part, while in the latter both lanes have
independent masks. Handle this properly and add support for vpermilpd.
llvm-svn: 136200
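
To make the mask behavior described in the entry above concrete, here is a small
illustrative bit of AVX assembly (mine, not from the commit; the immediates are
arbitrary examples): vpermilps encodes one 2-bit selector per element and applies
the same four selectors to both 128-bit lanes, while vpermilpd encodes one selector
bit per element, so its low-lane and high-lane bits are independent.
# Illustrative only (not from the commit); immediates chosen for the example.
vpermilps $27, %ymm0, %ymm1   # 0b00011011: reverse the four floats in each 128-bit lane; one mask serves both lanes
vpermilpd $5, %ymm0, %ymm1    # 0b0101: swap the two doubles in both lanes (bits 0-1 control the low lane, bits 2-3 the high lane)
vpermilpd $1, %ymm0, %ymm1    # 0b0001: swap only the low-lane pair; the independent high-lane bits leave that lane unchanged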

llvm-svn: 136199

different from the previous 128-bit because they work in lanes.
Update a few comments and add test cases.
llvm-svn: 136157

27 insertions(+), 62 deletions(-)
llvm-svn: 136047

shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.
llvm-svn: 136002

instruction introduced in AVX, which can operate on 128- and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32- or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:
Instead of:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vextractf128 $1, %ymm0, %xmm1
shufps $1, %xmm1, %xmm1
movss %xmm1, 28(%rsp)
movss %xmm1, 24(%rsp)
movss %xmm1, 20(%rsp)
movss %xmm1, 16(%rsp)
vextractf128 $0, %ymm0, %xmm0
shufps $1, %xmm0, %xmm0
movss %xmm0, 12(%rsp)
movss %xmm0, 8(%rsp)
movss %xmm0, 4(%rsp)
movss %xmm0, (%rsp)
vmovaps (%rsp), %ymm0
We get:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vpermilps $85, %ymm0, %ymm0
llvm-svn: 135662

llvm-svn: 135198

general version of X86ISD::ANDNP also opened room for a little bit
of refactoring.
llvm-svn: 135088

it's later selected to an ANDNPD/ANDNPS instruction instead of the PANDN
instruction. Rename it.
llvm-svn: 135087

vxorps, vxorpd
llvm-svn: 135023

rdar://problem/5993888
llvm-svn: 132606

llvm-svn: 132479

llvm-svn: 132424

llvm-svn: 132419

floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs. Only profitable when the user wants a materialized 0
or 1 at runtime. rdar://problem/5993888
llvm-svn: 132404

patch to TargetLowering.cpp. rdar://problem/5660695
llvm-svn: 132388

missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
llvm-svn: 126845

infrastructure. This makes lowering 256-bit vectors to 128-bit
vectors simple when 256-bit vector support is not available.
llvm-svn: 124868

matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines
to examine and translate index values. VINSERTF128 comes next. With
these two in place we can begin supporting more AVX operations, as
INSERT/EXTRACT can be used as a fallback when 256-bit support is not
available.
llvm-svn: 124797

addition to being an intrinsic, and convert
lowering to use it. Hopefully the pattern fragment is doing the right thing
with XMM0; it looks correct in testing.
llvm-svn: 122277

Remove unnecessary pandn patterns; the 'vnot' patfrag looks through bitcasts.
llvm-svn: 122098

The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result cast back to x86_mmx, provided the result feeds into a
pre-existing x86_mmx operation.
The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
llvm-svn: 115243