summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/avx512bw-intrinsics-fast-isel.ll
Commit message (Collapse)AuthorAgeFilesLines
* [X86][AVX512] Cleanup intrinsics testsSimon Pilgrim2018-06-031-331/+267
| | | | | | | | Ensure we test on 32-bit and 64-bit targets, and strip -mcpu usage. Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible. llvm-svn: 333843
* [X86] Add unmasked vermi2var intrinsics so we can use explicit select ↵Craig Topper2018-05-291-0/+92
| | | | | | | | instructions for masking in clang. This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch. llvm-svn: 333386
* [X86] Add fast isel tests for some of the avx512 truncate intrinsics to ↵Craig Topper2018-05-151-0/+60
| | | | | | match current clang codegen. llvm-svn: 332326
* Correct dwarf unwind information in function epiloguePetar Jovanovic2018-04-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging) * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Added CFIInstrInserter pass: * analyzes each basic block to determine cfa offset and register are valid at its entry and exit * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D42848 llvm-svn: 330706
* [X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> ↵Craig Topper2018-02-041-16/+12
| | | | | | | | | | (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations. This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions. There's still some room for improvement to remove more transitions, but this is a good start. llvm-svn: 324184
* [X86] Legalize (v64i1 (bitcast (i64 X))) on 32-bit targets by extracting ↵Craig Topper2018-02-021-2116/+22
| | | | | | | | 32-bit halves from i32, bitcasting each to v32i1, and concatenating. This prevents the scalarization that would otherwise occur. llvm-svn: 324057
* [X86] Legalize (i64 (bitcast (v64i1 X))) on 32-bit targets by extracting to ↵Craig Topper2018-02-021-61/+30
| | | | | | | | v32i1 and bitcasting to i32. This saves a trip through memory and seems to open up other combining opportunities. llvm-svn: 324056
* [MachineCopyPropagation] Extend pass to do COPY source forwardingGeoff Berry2018-02-011-16/+16
| | | | | | | | | | | | | | | | | | | | | | Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991
* [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and ↵Craig Topper2018-01-231-248/+248
| | | | | | | | | | inserting into a vXi1 vector. The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code. This patch is a little larger than it should be due to differences between the DQI handling between the two today. llvm-svn: 323212
* [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a ↵Craig Topper2018-01-231-744/+724
| | | | | | | | | | | | | | | | | | | movzx Summary: If we can match as a zero extend there's no need to flip the order to get an encoding benefit. As movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and its on AL/AX/EAX which have a 2 byte encoding. This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42313 llvm-svn: 323175
* [X86] Autoupgrade kunpck intrinsics using vector operations instead of ↵Craig Topper2018-01-141-40/+28
| | | | | | | | | | | | | | | | scalar operations Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free. Reviewers: spatel, RKSimon, zvi, jina.nahias Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42018 llvm-svn: 322462
* [X86] Handle alls version of vXi1 insert_vector_elt with a constant index ↵Craig Topper2017-12-081-2510/+1805
| | | | | | | | | | | | without falling back to shuffles. We previously only supported inserting to the LSB or MSB where it was easy to zero to perform an OR to insert. This change effectively extracts the old value and the new value, xors them together and then xors that single bit with the correct location in the original vector. This will cancel out the old value in the first xor leaving the new value in the position. The way I've implemented this uses 3 shifts and two xors and uses an additional register. We can avoid the additional register at the cost of another shift. llvm-svn: 320120
* [x86][AVX512] Lowering kunpack intrinsics to LLVM IRJina Nahias2017-12-051-8/+119
| | | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D39719), implements the lowering of X86 kunpack intrinsics to IR. Differential Revision: https://reviews.llvm.org/D39720 Change-Id: I4088d9428478f9457f6afddc90bd3d66b3daf0a1 llvm-svn: 319778
* [CodeGen] Unify MBB reference format in both MIR and debug outputFrancis Visoiu Mistrih2017-12-041-64/+64
| | | | | | | | | | | | | | | | As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
* [X86] test/testn intrinsics lowering to IR. llvm part.Uriel Korach2017-11-131-0/+1650
| | | | | | | | | Remove builtins from llvm and add AutoUpgrade support. Also add fast-isel tests for the TEST and TESTN instructions. Differential Revision: https://reviews.llvm.org/D38736 llvm-svn: 318036
* Revert "Correct dwarf unwind information in function epilogue for X86"Reid Kleckner2017-11-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int *b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j * k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. llvm-svn: 317726
* Reland "Correct dwarf unwind information in function epilogue for X86"Petar Jovanovic2017-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reland r317100 with minor fix regarding ComputeCommonTailLength function in BranchFolding.cpp. Skipping top CFI instructions block needs to executed on several more return points in ComputeCommonTailLength(). Original r317100 message: "Correct dwarf unwind information in function epilogue for X86" This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. llvm-svn: 317579
* Revert "Correct dwarf unwind information in function epilogue for X86"Petar Jovanovic2017-11-011-2/+0
| | | | | | | This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136
* Correct dwarf unwind information in function epilogue for X86Petar Jovanovic2017-11-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D35844 llvm-svn: 317100
* [X86] Add a pass to convert instruction chains between domains.Guy Blank2017-10-221-16/+8
| | | | | | | | | | | | | | | | | The pass scans the function to find instruction chains that define registers in the same domain (closures). It then calculates the cost of converting the closure to another domain. If found profitable, the instructions are converted to instructions in the other domain and the register classes are changed accordingly. This commit adds the pass infrastructure and a simple conversion from the GPR domain to the Mask domain. Differential Revision: https://reviews.llvm.org/D37251 Change-Id: Ic2cf1d76598110401168326d411128ae2580a604 llvm-svn: 316288
* [MC] Suppress .Lcfi labels when emitting textual assemblyReid Kleckner2017-10-101-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - | llvm-mc | llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259
* [x86] swap order of srl (and X, C1), C2 when it saves sizeSanjay Patel2017-09-221-12/+12
| | | | | | | | | | | | | | The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. There are also better improvements based on known-bits that allow us to eliminate the mask entirely. As noted, this could be extended. There are potentially other wins from always shifting first, but doing that reveals a tangle of problems in other pattern matching. We do this transform generically in instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this in the backend. Differential Revision: https://reviews.llvm.org/D38181 llvm-svn: 314023
* [x86] Lowering Mask Set1 intrinsics to LLVM IRJina Nahias2017-09-191-0/+1507
| | | | | | | | This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625
* [X86][AVX512BW] Added BROADCAST intrinsics fast-isel generic IR testsSimon Pilgrim2016-07-051-0/+120
| | | | llvm-svn: 274545
* [X86][AVX512] Add support for UNPCK masked shuffle commentsSimon Pilgrim2016-07-031-16/+16
| | | | llvm-svn: 274464
* [X86][AVX512] Add fast-isel shuffle testsSimon Pilgrim2016-07-021-0/+293
Its not worth trying to write out tests for all the avx512f builtins yet, just adding tests for lowering of generic IR as we transition to it (shuffles mainly right now). llvm-svn: 274434
OpenPOWER on IntegriCloud