[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4): - bcm5719-llvm


author	Michael Zuckerman <Michael.zuckerman@intel.com>	2017-09-25 14:50:38 +0000
committer	Michael Zuckerman <Michael.zuckerman@intel.com>	2017-09-25 14:50:38 +0000
commit	4a97df01c43170eed51d2cca0dc779d9d2ca6419 (patch)
tree	de4d3f69196e6feb059f4eef0c09f0ee4c211a0e /lldb/packages/Python/lldbsuite/test/expression_command/macros/TestMacros.py
parent	31fef4d3f08a15361c0ef9c0fabba5ba083f85ae (diff)
download	bcm5719-llvm-4a97df01c43170eed51d2cca0dc779d9d2ca6419.tar.gz bcm5719-llvm-4a97df01c43170eed51d2cca0dc779d9d2ca6419.zip

[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4):

This patch extends lowerInterleavedStore to handle 8 x i8 vectors with stride 4. LLVM currently generates suboptimal shuffle code for this case on AVX2. The patch is a targeted fix for one pattern (Stride=4, VF=8); we plan to add more patterns in the future.

The goal is to optimize the following sequence. At the end of the computation, xmm2, xmm0, xmm12 and xmm3 each hold 8 chars:

    c0, c1, ..., c7
    m0, m1, ..., m7
    y0, y1, ..., y7
    k0, k1, ..., k7

These need to be transposed/interleaved and stored like so:

    c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Reviewers: DavidKreitzer, Farhana, zvi, igorb, guyblank, RKSimon, Ayal

Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
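For reference, the memory layout the commit message describes can be written as a scalar loop. This is only a sketch of the intended result, not the code in X86InterleavedAccess: the function name `interleaveStride4` and the use of `std::array` are assumptions made for illustration, and the actual patch produces the same layout with vector shuffles rather than a loop.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (not part of LLVM): interleave four
// 8-element byte vectors with stride 4, giving
// c0 m0 y0 k0 c1 m1 y1 k1 ... c7 m7 y7 k7.
std::array<std::uint8_t, 32> interleaveStride4(
    const std::array<std::uint8_t, 8> &c,
    const std::array<std::uint8_t, 8> &m,
    const std::array<std::uint8_t, 8> &y,
    const std::array<std::uint8_t, 8> &k) {
  std::array<std::uint8_t, 32> out{};
  for (std::size_t i = 0; i < 8; ++i) {
    // Element i of each input lands in the i-th group of four bytes.
    out[4 * i + 0] = c[i];
    out[4 * i + 1] = m[i];
    out[4 * i + 2] = y[i];
    out[4 * i + 3] = k[i];
  }
  return out;
}
```

A loop like this is what a vectorizer sees as an interleaved store group with factor 4; the lowering in this patch targets that group when each member has 8 byte-sized elements.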

Diffstat (limited to 'lldb/packages/Python/lldbsuite/test/expression_command/macros/TestMacros.py')

0 files changed, 0 insertions, 0 deletions

