summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/avx512-insert-extract.ll
Commit message (Collapse)AuthorAgeFilesLines
...
* [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from ↵Craig Topper2017-08-301-10/+10
| | | | | | | | | | a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100
* [X86][Haswell] Updating HSW instruction scheduling informationGadi Haber2017-08-281-19/+28
| | | | | | | | | | | | | | | This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target. We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling. The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792. Information includes latency, number of micro-Ops and used ports by each HSW instruction. Please expect some performance fluctuations due to code alignment effects. Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud Differential Revision: https://reviews.llvm.org/D36663 llvm-svn: 311879
* [AVX512] Don't switch unmasked subvector insert/extract instructions when ↵Craig Topper2017-08-171-76/+36
| | | | | | | | | | | | | | AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enabling this splitting. Differential Revision: https://reviews.llvm.org/D36661 llvm-svn: 311091
* X86: Do not use llc -march in tests.Matthias Braun2017-08-021-3/+3
| | | | | | | | | | | | | | | `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309774
* [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores ↵Craig Topper2017-08-011-38/+21
| | | | | | | | | | | | | | even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though. This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need. This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch. Differential Revision: https://reviews.llvm.org/D35978 llvm-svn: 309693
* [AVX512] Add a common prefix to avx512-insert-extract.ll so we can reduce ↵Craig Topper2017-07-311-361/+169
| | | | | | | | the number of check lines on some test cases. This was pointed out during the review for D313804. llvm-svn: 309629
* [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer ↵Craig Topper2017-07-311-1/+1
| | | | | | | | | | | | | | vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-07-271-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298
* [X86][AVX512] regenerate avx512-insert-extract.llGuy Blank2017-07-111-3/+3
| | | | llvm-svn: 307654
* [X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC.Gadi Haber2017-07-021-0/+20
| | | | | | | | | | This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch. Minor differences due to added "End of Function" lines. Reviewers: zvi Differential Revision: https://reviews.llvm.org/D34933 llvm-svn: 306973
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-281-93/+73
| | | | llvm-svn: 306532
* Updated and extended the information about each instruction in HSW and SNB ↵Gadi Haber2017-06-271-73/+93
| | | | | | | | | | | | | | | | | | | to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414
* [X86][AVX512] Make i1 illegal in the CodeGenGuy Blank2017-05-191-120/+100
| | | | | | | | | | This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421
* [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where ↵Craig Topper2017-03-291-13/+13
| | | | | | | | we can just use COPY_TO_REGCLASS instead. This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together. llvm-svn: 298984
* [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registersCraig Topper2017-03-281-8/+18
| | | | | | | | | | | | | | | | We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928
* [x86] don't blindly transform SETB into SBBSanjay Patel2017-03-121-73/+73
| | | | | | | | | | | | | | | | | | | | I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586
* [DAGCombine] Simplify ISD::AND in GetDemandedBits.Eli Friedman2017-03-081-7/+7
| | | | | | | | | This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249
* [x86] don't require a zext when forming ADC/SBBSanjay Patel2017-03-041-12/+12
| | | | | | | | | | | | | The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978
* [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)Sanjay Patel2017-03-041-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-031-0/+36
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [DAGCombiner] fold binops with constant into select-of-constantsSanjay Patel2017-03-011-10/+4
| | | | | | | | | | | | | | | | | | This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699
* [x86] add alternate IR tests for select of constants; NFCSanjay Patel2017-02-281-0/+68
| | | | llvm-svn: 296496
* [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.Igor Breger2017-02-211-0/+223
| | | | | | Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718
* [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.Igor Breger2017-02-201-94/+432
| | | | | | | | | | | | Its more profitable to go through memory (1 cycles throughput) than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660
* [DAGCombiner] Remove the half vector width check for the combine of ↵Craig Topper2017-02-121-40/+40
| | | | | | | | EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors. llvm-svn: 294930
* [X86] Update test case I missed in r294876.Craig Topper2017-02-111-40/+39
| | | | llvm-svn: 294878
* Add new tests for EXTRACT_VECTOR_ELT (vector of packed i8/16/i32/i64/ps/pd data)Igor Breger2017-02-091-1/+427
| | | | llvm-svn: 294565
* [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for ↵Craig Topper2017-01-091-30/+26
| | | | | | | | vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415
* [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT ↵Craig Topper2017-01-031-16/+16
| | | | | | instructions. llvm-svn: 290869
* DAG: Fold out out of bounds insert_vector_eltMatt Arsenault2016-12-031-3/+3
| | | | | | | getNode already prevents formation of out of bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603
* MCStreamer: Use "cfi" for CFI related temp labels.Matthias Braun2016-11-301-3/+3
| | | | | | | | | | | | | Choosing a "cfi" name makes the intend a bit clearer in an assembly dump and more importantly the assembly dumps are slightly more stable as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As they are temp labels the name doesn't influence the generated object code. Differential Revision: https://reviews.llvm.org/D27244 llvm-svn: 288290
* [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from ↵Craig Topper2016-11-221-18/+18
| | | | | | | | | | | | | | | | | VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621
* revert r279960. Igor Breger2016-09-041-8/+24
| | | | | | https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625
* [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + ↵Igor Breger2016-08-291-24/+8
| | | | | | | | TEST sequence. Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960
* Revert r274613 because it breaks the test suite with AVX512Michael Kuperstein2016-08-251-0/+243
| | | | | | | | | | | | This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. (Recommit of r279782 which was broken due to a bad merge.) This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279788
* Revert r279782 due to debug buildbot breakage.Michael Kuperstein2016-08-251-243/+0
| | | | llvm-svn: 279785
* Revert r274613 because it breaks the test suite with AVX512Michael Kuperstein2016-08-251-0/+243
| | | | | | | | | | This reverts most of r274613 and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279782
* [AVX512] Fix insertelement i1 lowering.Igor Breger2016-08-141-12/+98
| | | | | | | | | 1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 ) 2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE. Differential Revision: http://reviews.llvm.org/D23347 llvm-svn: 278623
* [AVX512] Fix extractelement i1 lowering.Igor Breger2016-08-111-0/+110
| | | | | | | | | The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing. Make extractelement i1 lowering custom for all vector i1. Differential Revision: http://reviews.llvm.org/D23246 llvm-svn: 278328
* AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 ↵Elena Demikhovsky2016-08-071-6/+0
| | | | | | | | | | | integer values Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV. It allows to suppress two opposite BITCAST operations and avoid redundant "movs". Differential Revision: https://reviews.llvm.org/D23247 llvm-svn: 277958
* [AVX512] Add initial support for the Execution Domain fixing pass to change ↵Craig Topper2016-07-221-2/+2
| | | | | | some EVEX instructions. llvm-svn: 276393
* VirtRegMap: Replace some identity copies with KILL instructions.Matthias Braun2016-07-091-91/+407
| | | | | | | | | | | | | | An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952
* Re-commit of 274613.Elena Demikhovsky2016-07-061-1/+0
| | | | | | | The prev commit failed on compilation. A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure. llvm-svn: 274626
* Reverted 274613 due to compilation failue. Elena Demikhovsky2016-07-061-0/+1
| | | | llvm-svn: 274615
* AVX-512: Optimization for patterns with i1 scalar typeElena Demikhovsky2016-07-061-1/+0
| | | | | | | | | | | | | | The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc". I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction. I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions. This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173. Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails). Differential revision: http://reviews.llvm.org/D21956 llvm-svn: 274613
* [AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks ↵Craig Topper2016-06-141-3/+3
| | | | | | when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626
* [AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable ↵Craig Topper2016-05-211-10/+10
| | | | | | AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318
* AVX-512: Load and Extended Load for i1 vectorsElena Demikhovsky2016-04-031-2/+2
| | | | | | | | | | Implemented load+{sign|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259
* Optimized loading (zextload) of i1 value from memory.Elena Demikhovsky2016-02-251-2/+0
| | | | | | | | | | | This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793. Extra "and" causes performance degradation. We assume that i1 is stored in zero-extended form. And store operation is responsible for zeroing upper bits. Differential Revision: http://reviews.llvm.org/D17541 llvm-svn: 261828
* AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.Igor Breger2015-10-081-0/+374
| | | | | | | | | This instructions doesn't have intrincis. Added tests for lowering and encoding. Differential Revision: http://reviews.llvm.org/D12317 llvm-svn: 249688
OpenPOWER on IntegriCloud