path: root/llvm/test/CodeGen/X86/vector-shuffle-v1.ll
* Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"Daniel Jasper2017-06-291-2/+0
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio

If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ).

llvm-svn: 306676
* [X86] Correct dwarf unwind information in function epilogue (Petar Jovanovic, 2017-06-28; 1 file, -0/+2)
CFI instructions that set the appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering.

The majority of the changes in this patch:

1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, or duplicated.

These changes are target independent and described below.

Changed CFI instructions so that they:

1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal

Add information to each MachineBasicBlock about the cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, or created.

Add a verification pass (CFIInfoVerifier) that checks that the outgoing cfa offset and register of predecessor blocks match the incoming values of their successors.

Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects the CFA calculation rule for a basic block if needed. That means additional CFI instructions get inserted at the beginning of a basic block to correct the rule for calculating the CFA.

Having CFI instructions in the function epilogue can cause an incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. A sketch of that case follows below.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D18046

llvm-svn: 306529
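[A minimal, hypothetical IR sketch of the multiple-epilogue case the message describes; the function and labels are illustrative, not taken from the patch or the test file:

    define i32 @two_epilogues(i32 %x) {
    entry:
      %c = icmp eq i32 %x, 0
      br i1 %c, label %early, label %work
    early:                    ; first epilogue block
      ret i32 0
    work:                     ; second epilogue block; after reordering, the CFA
      %y = mul i32 %x, %x     ; rule left in effect by the epilogue block above may
      ret i32 %y              ; be wrong here, and CFIInstrInserter patches it at
    }                         ; the start of this block
]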
* [X86][SSE] Dropped -mcpu from vector shuffle tests (Simon Pilgrim, 2017-06-21; 1 file, -4/+2)
Use triple and attribute only, for consistency.

llvm-svn: 305908
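[For illustration, the kind of change involved — a hypothetical RUN line, not the exact one from the test:

    ; before: RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=knl | FileCheck %s
    ; after:  RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s
]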
* [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers (Craig Topper, 2017-03-28; 1 file, -18/+32)
We've had several bugs (PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests demonstrate, it's possible for the GR8 register to be a high register, and we end up doing an accidental extra extract from or insert into bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repetition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
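[A minimal sketch of the kind of IR that exercises this path; the function is hypothetical, not one of the actual regression tests:

    define i8 @mask_bit_to_gr8(<16 x i32> %a, <16 x i32> %b) {
      ; the compare produces a k-register mask; extracting a bit and
      ; zero-extending it forces a k-register -> GR8 copy, which used to go
      ; through a 32-bit super register (e.g. kmovw %k0, %eax); if the
      ; allocator picked a high byte such as %ah, bits 15:8 were touched
      ; by accident
      %m = icmp eq <16 x i32> %a, %b
      %bit = extractelement <16 x i1> %m, i32 0
      %z = zext i1 %bit to i8
      ret i8 %z
    }
]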
* [X86] Generate VZEROUPPER for Skylake-avx512. (Amjad Aboud, 2017-03-03; 1 file, -0/+22)
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
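[A sketch of the situation; the functions are hypothetical and the expected assembly is illustrative:

    define void @caller(<8 x float> %v, <8 x float>* %p) {
      ; the store touches a ymm register, dirtying the upper AVX state;
      ; on Skylake-avx512 a vzeroupper is expected before the call below,
      ; while on KNL it must not be emitted
      store <8 x float> %v, <8 x float>* %p
      call void @callee()
      ret void
    }
    declare void @callee()
]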
* [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. (Craig Topper, 2017-01-09; 1 file, -42/+32)

Previously we emitted a VPTERNLOG and a separate masked move.

llvm-svn: 291415
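[A sketch of the pattern being matched; the function is hypothetical and the expected instruction is noted in a comment:

    define <8 x i64> @select_ones_zeros(<8 x i1> %m) {
      ; select(mask, all-ones, all-zeros) can now be a single zero-masked
      ; vpternlogq (imm 0xff produces all-ones) instead of a vpternlog
      ; followed by a separate masked move
      %r = select <8 x i1> %m, <8 x i64> <i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1>, <8 x i64> zeroinitializer
      ret <8 x i64> %r
    }
]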
* MCStreamer: Use "cfi" for CFI related temp labels. (Matthias Braun, 2016-11-30; 1 file, -3/+3)
Choosing a "cfi" name makes the intent a bit clearer in an assembly dump. More importantly, the assembly dumps are slightly more stable, as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As these are temp labels, the name doesn't influence the generated object code.

Differential Revision: https://reviews.llvm.org/D27244

llvm-svn: 288290
* [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). (Craig Topper, 2016-11-22; 1 file, -18/+18)

Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory, so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted, we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
* [AVX512] Fix insertelement i1 lowering. (Igor Breger, 2016-08-14; 1 file, -28/+13)
1. Use a shuffle to insert element i1 into the vector. The previous implementation was incorrect (dest_bit OR src_bit; it doesn't clear the bit if src_bit = 0).
2. Improve shuffling of i1 vectors: use CVT2MASK if supported instead of TRUNCATE.

Differential Revision: http://reviews.llvm.org/D23347

llvm-svn: 278623
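[A minimal IR sketch of the failure mode; the function is hypothetical. Inserting a false bit into a lane that was true must clear that lane, which a plain OR of dest_bit and src_bit cannot do:

    define <8 x i1> @insert_false(<8 x i1> %v) {
      ; if lane 2 of %v is true, the OR-based lowering left it true;
      ; the shuffle-based lowering clears it as required
      %r = insertelement <8 x i1> %v, i1 false, i32 2
      ret <8 x i1> %r
    }
]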
* AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values (Elena Demikhovsky, 2016-08-07; 1 file, -14/+0)

Optimized lowering of the BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV. This allows suppressing two opposite BITCAST operations and avoiding redundant "mov"s.

Differential Revision: https://reviews.llvm.org/D23247

llvm-svn: 277958
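[The kind of IR affected, as a sketch; the function is hypothetical and the expected lowering is noted in a comment:

    define i16 @mask_to_i16(<16 x i1> %m) {
      ; with the new lowering this is a plain register copy out of the
      ; k-register (e.g. kmovw %k0, %eax) with no redundant mov chain
      %i = bitcast <16 x i1> %m to i16
      ret i16 %i
    }
]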
* [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. (Craig Topper, 2016-07-11; 1 file, -41/+47)

llvm-svn: 275045
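[Why 0xff works, sketched in IR with a hypothetical function: the immediate is the truth table of the three-input ternary logic operation, and 0xff returns 1 for every input combination:

    define <16 x i32> @all_ones() {
      ; expected to materialize via something like
      ;   vpternlogd $255, %zmm0, %zmm0, %zmm0
      ; which sets every bit regardless of the (undefined) input values
      ret <16 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
    }
]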
* VirtRegMap: Replace some identity copies with KILL instructions. (Matthias Braun, 2016-07-09; 1 file, -1/+48)
An identity COPY like this:

    %AL = COPY %AL, %EAX<imp-def>

has no semantic effect, but encodes liveness information: further users of %EAX only depend on this instruction even though it does not define the full register.

Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588, but this time adds a comment explaining why a KILL instruction is useful.)

llvm-svn: 274952
* [X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. (Craig Topper, 2016-06-09; 1 file, -1/+1)

Then add shuffle decode printing for the EVEX forms, which is made easier by having the naming structure more similar to other instructions.

llvm-svn: 272249
* AVX-512: Load and Extended Load for i1 vectors (Elena Demikhovsky, 2016-04-03; 1 file, -23/+7)
Implemented load + {sign|zero}_extend for i1 vectors. Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, and v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
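[A sketch of the covered pattern; the function is hypothetical, written in the typed-pointer IR of that era:

    define <16 x i32> @load_zext(<16 x i1>* %p) {
      ; a load of an i1 vector followed by a zero-extension, now handled
      ; as a combined load+extend for KNL and SKX
      %m = load <16 x i1>, <16 x i1>* %p
      %e = zext <16 x i1> %m to <16 x i32>
      ret <16 x i32> %e
    }
]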
* [SelectionDAG] change getConstant() to use the input SDLoc when building splat vectors (Sanjay Patel, 2016-02-11; 1 file, -2/+2)

The code change is simple enough: instead of attaching an anonymous SDLoc to splatted vector constants, use the scalar constant's existing SDLoc, since that is what is passed into getConstant() as a param. But this changes instruction scheduling, so I'll explain why that happens.

The motivation for this patch starts near: http://reviews.llvm.org/rL258833

...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'. But when I made that change locally, several x86 codegen tests wiggled. It turns out that the lack of SDLoc consistency in getConstant() changes the way ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG scheduler algorithms use IROrder for tie-breaking.

Differential Revision: http://reviews.llvm.org/D16972

llvm-svn: 260582
* AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation. (Igor Breger, 2016-02-07; 1 file, -27/+27)
Differential Revision: http://reviews.llvm.org/D16813

llvm-svn: 260024
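[The GPR-source broadcast expressed as an IR sketch; the function is hypothetical and the intrinsic form itself is not spelled out here:

    define <16 x i32> @splat_from_gpr(i32 %x) {
      ; a splat of a scalar living in a GPR; with these patterns it can
      ; select vpbroadcastd with a GPR source (e.g. vpbroadcastd %edi, %zmm0)
      %v0 = insertelement <16 x i32> undef, i32 %x, i32 0
      %v = shufflevector <16 x i32> %v0, <16 x i32> undef, <16 x i32> zeroinitializer
      ret <16 x i32> %v
    }
]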
* AVX512: Fix truncate v32i8 to v32i1 lowering implementation. (Igor Breger, 2016-01-28; 1 file, -16/+16)
Enable truncating 128/256-bit packed byte/word vectors with AVX512BW but without AVX512VL by using 512-bit instructions.

Differential Revision: http://reviews.llvm.org/D16531

llvm-svn: 259044
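[A sketch of the affected case, as a hypothetical function:

    define <32 x i1> @trunc_v32i8(<32 x i8> %x) {
      ; the source is only 256 bits wide; with AVX512BW but no AVX512VL
      ; the operation is widened and performed with 512-bit instructions
      %t = trunc <32 x i8> %x to <32 x i1>
      ret <32 x i1> %t
    }
]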
* AVX512: Masked move intrinsic implementation. (Igor Breger, 2016-01-21; 1 file, -9/+9)
Implemented intrinsics for the following instructions (register move): VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.

Differential Revision: http://reviews.llvm.org/D16316

llvm-svn: 258398
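[The semantics of a masked register move, sketched as a select in IR; the function is hypothetical and is not the intrinsic form itself:

    define <64 x i8> @masked_move(<64 x i8> %a, <64 x i8> %b, <64 x i1> %m) {
      ; lanes with a set mask bit come from %a, the rest from %b; this is
      ; what a merge-masked move such as vmovdqu8 %zmm0, %zmm1 {%k1} computes
      %r = select <64 x i1> %m, <64 x i8> %a, <64 x i8> %b
      ret <64 x i8> %r
    }
]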
* [X86][AVX2] Broadcast subvectors (Simon Pilgrim, 2016-01-18; 1 file, -4/+4)
AVX2 can only broadcast from the zeroth element of a vector, but if the broadcastable element is the zeroth element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that, and avoid loading the shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is v4f64 or v4i64, which can use the immediate shuffle VPERMPD/VPERMQ directly.

Differential Revision: http://reviews.llvm.org/D16050

llvm-svn: 258081
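[A sketch of the case; the function is hypothetical and the expected assembly is illustrative:

    define <8 x float> @splat_elt4(<8 x float> %v) {
      ; element 4 is element 0 of the high 128-bit half, so this can lower to
      ;   vextractf128 $1, %ymm0, %xmm0
      ;   vbroadcastss %xmm0, %ymm0
      ; avoiding a vpermps and its loaded index vector
      %b = shufflevector <8 x float> %v, <8 x float> undef, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
      ret <8 x float> %b
    }
]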
* AVX512: Change v8i1 bitconvert GR8 pattern, remove unnecessary movzbl instruction. (Igor Breger, 2016-01-18; 1 file, -13/+6)

Code example, previous implementation:

    movzbl %dil, %eax
    kmovw %eax, %k0

New code:

    kmovw %edi, %k0

Differential Revision: http://reviews.llvm.org/D16287

llvm-svn: 258045
* [X86][AVX512] Regenerate v1 shuffle tests (Simon Pilgrim, 2016-01-17; 1 file, -2/+2)
llvm-svn: 258013
* AVX512: Change VPMOVB2M DAG lowering, use CVT2MASK node instead of TRUNCATE. (Igor Breger, 2015-12-27; 1 file, -18/+58)
Fix TRUNCATE lowering from vector to vector i1: use the LSB and not the MSB. Implement the VPMOVB/W/D/Q2M intrinsics.

Differential Revision: http://reviews.llvm.org/D15675

llvm-svn: 256470
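[Why LSB vs MSB matters, as a sketch; the function is hypothetical, and the shift trick in the comment is one common lowering, not necessarily the exact code:

    define <16 x i1> @trunc_lsb(<16 x i8> %x) {
      ; IR trunc keeps the LOW bit of each byte, but vpmovb2m reads the
      ; SIGN bit; a lowering via vpmovb2m must first move the LSB up,
      ; e.g. vpsllw $7 followed by vpmovb2m
      %t = trunc <16 x i8> %x to <16 x i1>
      ret <16 x i1> %t
    }
]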
* AVX512: Implemented DAG lowering for vshuff64x2/vshufi64x2 instructions (shuffle packed values at 128-bit granularity) (Igor Breger, 2015-10-15; 1 file, -4/+2)

Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
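[A 128-bit-granularity shuffle sketched in IR; the function is hypothetical and the immediate encoding is not spelled out:

    define <8 x double> @shuf_128bit_blocks(<8 x double> %a, <8 x double> %b) {
      ; elements move in whole 128-bit blocks (pairs of doubles), which is
      ; exactly what vshuff64x2 expresses with its immediate
      %s = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 8, i32 9, i32 12, i32 13>
      ret <8 x double> %s
    }
]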
* AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1> (Elena Demikhovsky, 2015-09-17; 1 file, -0/+401)
AVX-512 does not provide an instruction that shuffles a mask register, so I do it the following way: mask-2-simd, shuffle simd, simd-2-mask.

Differential Revision: http://reviews.llvm.org/D12727

llvm-svn: 247876
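[A sketch of such a shuffle; the function is hypothetical and the instruction names in the comment illustrate the SKX path:

    define <8 x i1> @reverse_mask(<8 x i1> %a) {
      ; no k-register shuffle exists, so this goes mask -> SIMD
      ; (e.g. vpmovm2d), SIMD shuffle, then SIMD -> mask (e.g. vpmovd2m)
      %s = shufflevector <8 x i1> %a, <8 x i1> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
      ret <8 x i1> %s
    }
]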