| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D13945
llvm-svn: 251018
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D13769
llvm-svn: 250650
|
|
|
|
|
|
| |
Added X86ISD opcodes for VPROT vector rotate by variable and by immediate.
llvm-svn: 250620
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D13768
llvm-svn: 250396
|
|
|
|
| |
llvm-svn: 250174
|
|
|
|
|
|
|
|
| |
comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).
llvm-svn: 249976
|
|
|
|
|
|
|
|
|
|
|
|
| |
shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.
Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.
Differential Revision: http://reviews.llvm.org/D8690
llvm-svn: 248878
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12524
llvm-svn: 248147
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12593
llvm-svn: 248121
|
|
|
|
|
|
|
|
|
|
| |
add scalar FP to Int conversion with truncation intrinsics
add scalar conversion FP32 from/to FP64 intrinsics
add rounding mode and SAE mode encoding for these intrinsics
Differential Revision: http://reviews.llvm.org/D12665
llvm-svn: 248117
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12102
llvm-svn: 248116
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12931
llvm-svn: 248115
|
|
|
|
|
|
| |
To be used by other targets.
llvm-svn: 247225
|
|
|
|
|
|
|
|
|
|
| |
vpconflictq, vpconflictd
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11931
llvm-svn: 246750
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per SDM. Found by inspection.
Instead of hardcoding 16 in the patfrag, check for natural alignment.
Also fix the autoupgrade and the various tests.
Also, use explicit -mattr instead of -mcpu: I stared at the output
several minutes wondering why I get 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered corei7-avx implies +slow-unaligned-mem-32.
llvm-svn: 246733
|
|
|
|
|
|
|
|
| |
We can chain other fragments to avoid repeating conditions.
This also fixes a potential bug (that realistically can't happen),
where we would match indexed nontemporal stores for i32/i64.
llvm-svn: 246719
|
|
|
|
|
|
|
|
|
|
| |
instructions
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11593
llvm-svn: 246642
|
|
|
|
|
|
|
|
| |
Added tests for encoding.
Differential Revision: http://reviews.llvm.org/D11979
llvm-svn: 246439
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12491
llvm-svn: 246436
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11528
llvm-svn: 243390
|
|
|
|
|
|
|
|
|
|
| |
Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Differential Revision: http://reviews.llvm.org/D11218
llvm-svn: 243122
|
|
|
|
|
|
|
|
|
|
| |
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
llvm-svn: 242992
|
|
|
|
|
|
|
|
|
|
| |
Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Differential Revision: http://reviews.llvm.org/D11218
llvm-svn: 242990
|
|
|
|
|
|
|
|
| |
include encoding and intrinsics
Differential Revision: http://reviews.llvm.org/D11222
llvm-svn: 242896
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11351
llvm-svn: 242761
|
|
|
|
|
|
|
|
|
|
|
|
| |
types.
In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch.
I temporary removed the old intrinsics test (just to split this patch).
Half types are not covered here.
Differential Revision: http://reviews.llvm.org/D11134
llvm-svn: 242023
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.
As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.
Differential Revision: http://reviews.llvm.org/D10146
llvm-svn: 241508
|
|
|
|
|
|
|
|
|
|
|
|
| |
implementation
With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations.
This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code.
Differential Revision: http://reviews.llvm.org/D10947
llvm-svn: 241506
|
|
|
|
|
|
|
|
|
| |
pmulhrsw
review:
http://reviews.llvm.org/D10948
llvm-svn: 241443
|
|
|
|
|
|
| |
encoding, intrinsics and tests.
llvm-svn: 240936
|
|
|
|
|
|
|
|
|
|
|
| |
Add vscalef support
include encoding and intrinsics
review:
http://reviews.llvm.org/D10730
llvm-svn: 240906
|
|
|
|
|
|
|
| |
Added intrinsics.
Added encoding and tests.
llvm-svn: 240905
|
|
|
|
|
|
| |
Added all intrinsics, tests for encoding, tests for intrinsics.
llvm-svn: 240386
|
|
|
|
|
|
| |
encoding tests.
llvm-svn: 240272
|
|
|
|
|
|
|
|
|
|
| |
add instructions: VPAVGB and VPAVGW
review
http://reviews.llvm.org/D10504
llvm-svn: 240012
|
|
|
|
|
|
|
|
| |
This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.
Differential Revision: http://reviews.llvm.org/D10433
llvm-svn: 239855
|
|
|
|
|
|
|
|
|
|
|
| |
for KNL.
Added intrinsics for cvtsi2ss/d instructions.
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D10430
llvm-svn: 239694
|
|
|
|
|
|
|
|
|
|
| |
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991
llvm-svn: 238923
|
|
|
|
|
|
|
|
|
|
| |
for SKX and KNL.
Added tests for encoding.
By Igor Breger (igor.breger@intel.com)
llvm-svn: 238917
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....).
Since the refactoring of the shuffle lowering code this no longer has any use.
Differential Revision: http://reviews.llvm.org/D10169
llvm-svn: 238906
|
|
|
|
| |
llvm-svn: 238810
|
|
|
|
|
|
|
| |
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
llvm-svn: 238809
|
|
|
|
|
|
|
|
|
| |
Implemented DAG lowering for all these forms.
Added tests for encoding.
By Igor Breger (igor.breger@intel.com)
llvm-svn: 238738
|
|
|
|
|
|
|
|
|
| |
Implemented DAG lowering for all these forms.
Added tests for encoding.
by Igor Breger (igor.breger@intel.com)
llvm-svn: 238728
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in-register LUT technique.
Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html
The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.
On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.
The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.
I have extended Bruno's patch in the following ways:
0) I committed the new tests with baseline sequences so this shows
a diff, and regenerated the tests using the update scripts.
1) Bruno had noticed and mentioned in IRC a redundant mask that
I removed.
2) I introduced a particular optimization for the i32 vector cases where
we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
+ PSADBW to compute doubled high i32 popcounts. This takes advantage
of the fact that to line up the high i32 popcounts we have to shift
them anyways, and we can shift them by one fewer bit to effectively
divide the count by two. While the PSHUFD based horizontal add is no
faster, it doesn't require registers or load traffic the way a mask
would, and provides more ILP as it happens on different ports with
high throughput.
3) I did some code cleanups throughout to simplify the implementation
logic.
4) I refactored it to continue to use the parallel bitmath lowering when
SSSE3 is not available to preserve the performance of that version on
SSE2 targets where it is still much better than scalarizing as we'll
still do a bitmath implementation of popcount even in scalar code
there.
With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.
With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.
Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.
Differential Revision: http://reviews.llvm.org/D10084
llvm-svn: 238636
|
|
|
|
|
|
|
|
| |
instructions from this set
Added encoding tests.
llvm-svn: 237557
|
|
|
|
|
|
|
|
| |
{add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae
By Asaf Badouh (asaf.badouh@intel.com)
llvm-svn: 236971
|
|
|
|
|
|
|
|
| |
Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec.
By Igor Breger (igor.breger@intel.com)
llvm-svn: 236714
|
|
|
|
|
|
|
| |
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.
llvm-svn: 236420
|
|
|
|
|
|
|
|
| |
with intrinsics and tests
by Asaf Badouh (asaf.badouh@intel.com)
llvm-svn: 236418
|