path: root/llvm/test/CodeGen/X86/avg.ll
Commit history (most recent first); each entry shows the commit message, author, date, files changed, and lines removed/added.
* [DAGCombiner] reduce extract subvector of concat (Sanjay Patel, 2020-01-09, 1 file changed, -7/+5)

  If we are extracting a chunk of a vector that's a fraction of an operand of the concatenated vector operand, we can extract directly from one of those original operands.

  This is another suggestion from PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024#c2
  But I'm not sure yet if it will make any difference on those patterns. It seems to help a few existing AVX512 tests though.

  Differential Revision: https://reviews.llvm.org/D72361
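As a rough IR-level analogue of the DAG pattern this combine targets (function and value names are illustrative, not from the commit), the extract reads a chunk that lies entirely inside one operand of the concatenation:

    define <2 x i32> @extract_chunk_of_concat(<4 x i32> %lo, <4 x i32> %hi) {
      ; Concatenate %lo and %hi into an 8-element vector, then pull out a
      ; 2-element chunk that lies entirely inside %hi. With the combine, the
      ; extract can be taken directly from %hi without forming the concat.
      %cat   = shufflevector <4 x i32> %lo, <4 x i32> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
      %chunk = shufflevector <8 x i32> %cat, <8 x i32> undef, <2 x i32> <i32 4, i32 5>
      ret <2 x i32> %chunk
    }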
* [X86] Pass v32i16/v64i8 in zmm registers on KNL target. (Craig Topper, 2019-08-30, 1 file changed, -2/+5)

  gcc and icc pass these types in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations.

  Fixes PR42957

  Differential Revision: https://reviews.llvm.org/D66708

  llvm-svn: 370495
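A hedged sketch of the kind of signature affected (the function name and llc invocation are illustrative assumptions, not from the commit): on an AVX-512F target without AVX512BW, v32i16 is not a legal type, but the calling convention should still hand it over in zmm registers to stay ABI-compatible with gcc/icc.

    ; llc -mtriple=x86_64-- -mattr=+avx512f   (KNL-like: no avx512bw/avx512vl)
    define <32 x i16> @pass_v32i16(<32 x i16> %x, <32 x i16> %y) {
      ; %x and %y should arrive in zmm0/zmm1 per the gcc/icc convention.
      %r = add <32 x i16> %x, %y
      ret <32 x i16> %r
    }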
* [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle. (Craig Topper, 2019-08-19, 1 file changed, -4/+4)

  The motivating cases are the changes in vector-reduce-add.ll, where we were doing extra work in the scalar domain instead of shuffling. There may be a one-use check that needs to be looked into there, but this patch sidesteps the issue by avoiding broadcasts that aren't really broadcasting.

  Differential Revision: https://reviews.llvm.org/D66071

  llvm-svn: 369287
* Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default." (Craig Topper, 2019-08-07, 1 file changed, -486/+993)

  The assert that caused this to be reverted should be fixed now.

  Original commit message:

  This patch changes our default legalization behavior for 16-, 32-, and 64-bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.

  This has the potential to cause regressions, so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.

  Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code.

  llvm-svn: 368183
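A minimal sketch of the legalization change described above (the function is illustrative; the flag name is the one given in the commit message): with widening, the illegal <8 x i8> type keeps its element width and is padded out to <16 x i8> instead of being promoted to <8 x i16>.

    ; Previously: this v8i8 add was promoted to v8i16 (wider elements).
    ; With -x86-experimental-vector-widening-legalization (now the default):
    ; it is widened to v16i8, the extra lanes being undef padding.
    define <8 x i8> @add_v8i8(<8 x i8> %a, <8 x i8> %b) {
      %r = add <8 x i8> %a, %b
      ret <8 x i8> %r
    }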
* Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."Mitch Phillips2019-08-061-993/+486
| | | | | | | | | This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107
* [X86] Enable -x86-experimental-vector-widening-legalization by default. (Craig Topper, 2019-08-05, 1 file changed, -486/+993)

  This patch changes our default legalization behavior for 16-, 32-, and 64-bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.

  This has the potential to cause regressions, so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.

  Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code.

  llvm-svn: 367901
* [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it. (Craig Topper, 2019-07-22, 1 file changed, -1/+1)

  The build_vector will become a constant pool load. By using the desired type initially, we ensure we don't generate a bitcast of the constant pool load that would need to be folded with the load.

  While experimenting with another patch, I noticed that when the load type and the constant pool type don't match, SimplifyDemandedBits can't handle it. While we should probably fix that, this was a simple way to fix the issue I saw.

  llvm-svn: 366732
* [X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support (Simon Pilgrim, 2019-06-26, 1 file changed, -2/+2)

  Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR, but there are some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all.

  llvm-svn: 364407
* [x86] split 256-bit store of concatenated vectors (Sanjay Patel, 2019-06-04, 1 file changed, -210/+192)

  This shows up as a side issue to the main problem for the AVX target example from PR37428:
  https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

  But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway.

  We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite-loop fighting the merge code if we try this earlier.

  Differential Revision: https://reviews.llvm.org/D62498

  llvm-svn: 362524
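A sketch of the store shape this lowering targets (names are illustrative): the 256-bit stored value is just two 128-bit halves concatenated, so it can be written with two xmm stores instead of first assembling a ymm value.

    define void @store_concat(<4 x float> %lo, <4 x float> %hi, <8 x float>* %p) {
      ; concat_vectors(%lo, %hi) stored as one 256-bit value; with this
      ; change it should lower to two 128-bit stores on AVX targets.
      %cat = shufflevector <4 x float> %lo, <4 x float> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
      store <8 x float> %cat, <8 x float>* %p, align 32
      ret void
    }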
* Revert "[x86] split 256-bit store of concatenated vectors"Sanjay Patel2019-05-281-192/+210
| | | | | | | | | This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850
* [x86] split 256-bit store of concatenated vectors (Sanjay Patel, 2019-05-28, 1 file changed, -210/+192)

  This shows up as a side issue to the main problem for the AVX target example from PR37428:
  https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

  But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway.

  We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier.

  Differential Revision: https://reviews.llvm.org/D62498

  llvm-svn: 361822
* [X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773) (Simon Pilgrim, 2019-05-07, 1 file changed, -56/+52)

  Basic "revectorization" combine. We can probably do more opcodes here, but it can be a tricky cost-benefit call depending on where the subvectors came from - this case helps shuffle combining.

  llvm-svn: 360134
* [X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel. (Craig Topper, 2019-04-04, 1 file changed, -86/+88)

  SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are zero. But that isn't the case here. We've done no analysis of the inputs.

  llvm-svn: 357673
* [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316) (Simon Pilgrim, 2019-03-30, 1 file changed, -59/+7)

  Fixes PR41316, where the expanded PAVG intrinsic had one of its ADDs turned into an OR due to its operands having no conflicting bits.

  llvm-svn: 357351
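For context, the canonical pattern detectAVGPattern matches (and which avg.ll exercises) is a rounding average computed in a wider type; a minimal sketch, with the PR41316 twist noted in the comments:

    ; avg(a, b) = trunc((zext(a) + zext(b) + 1) >> 1), which X86 lowers to
    ; pavgb. In PR41316 one of the adds had been rewritten into an 'or' by an
    ; earlier combine (its operands shared no set bits), and this patch
    ; teaches the matcher to treat such an 'or' as "add like".
    define <16 x i8> @avg_v16i8(<16 x i8> %a, <16 x i8> %b) {
      %za  = zext <16 x i8> %a to <16 x i16>
      %zb  = zext <16 x i8> %b to <16 x i16>
      %sum = add <16 x i16> %za, %zb
      %rnd = add <16 x i16> %sum, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
      %shr = lshr <16 x i16> %rnd, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
      %res = trunc <16 x i16> %shr to <16 x i8>
      ret <16 x i8> %res
    }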
* [X86][SSE] Add PAVG test case from PR41316 (Simon Pilgrim, 2019-03-30, 1 file changed, -0/+80)

  llvm-svn: 357346
* [X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) (Simon Pilgrim, 2019-03-24, 1 file changed, -54/+56)

  Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets).

  Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers.

  llvm-svn: 356858
* [X86] Add SimplifyDemandedBitsForTargetNode support for PINSRB/PINSRW (Simon Pilgrim, 2019-03-15, 1 file changed, -172/+172)

  llvm-svn: 356270
* [x86] narrow a shuffle that doesn't use or set any high elements (Sanjay Patel, 2019-01-25, 1 file changed, -102/+103)

  This isn't the final fix for our reduction/horizontal codegen, but it takes care of a lot of the problems. After we narrow the shuffle, existing combines for insert/extract and binops kick in, and we end up with cheaper 128-bit ops.

  The avg and mul reduction tests show an existing shuffle lowering hole for AVX2/AVX512. I think in its most minimal form this is: https://bugs.llvm.org/show_bug.cgi?id=40434 ...but we might need multiple fixes to get it right.

  Differential Revision: https://reviews.llvm.org/D57156

  llvm-svn: 352209
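The reductions referred to above have roughly this shape (an illustrative sketch, not a test taken from avg.ll): once the upper half has been folded in, the remaining shuffles neither read nor write the upper lanes, so they and the adds that consume them can be narrowed to 128-bit ops.

    define i32 @reduce_add_v8i32(<8 x i32> %v) {
      %hi  = shufflevector <8 x i32> %v,  <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
      %s1  = add <8 x i32> %v, %hi
      %sh2 = shufflevector <8 x i32> %s1, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
      %s2  = add <8 x i32> %s1, %sh2
      %sh3 = shufflevector <8 x i32> %s2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
      %s3  = add <8 x i32> %s2, %sh3
      %r   = extractelement <8 x i32> %s3, i32 0
      ret i32 %r
    }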
* [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type. (Craig Topper, 2018-11-23, 1 file changed, -725/+382)

  This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split, and enlarges the elements in the result type before doing the split. Then it inserts a follow-up truncate or fp_round after concatenating the two halves back together.

  But if the input type of the original op is being split on its way to ultimately being scalarized, we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. It seems silly to enlarge the result element type of the operation only to end up with scalar code, and then to build a vector with large elements only to make the elements smaller again in the vector register. It seems better to just try to produce smaller result types in the scalarized code.

  The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.

  llvm-svn: 347482
* [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would give a legal type. (Craig Topper, 2018-11-22, 1 file changed, -22/+7)

  SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still give a legal type, we don't need to do that.

  The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.

  llvm-svn: 347479
* [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. (Craig Topper, 2018-11-18, 1 file changed, -28/+28)

  Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D54671

  llvm-svn: 347171
* [x86] allow vector load narrowing with multi-use values (Sanjay Patel, 2018-11-10, 1 file changed, -390/+312)

  This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins.

  I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs.

  The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided.

  Differential Revision: https://reviews.llvm.org/D54073

  llvm-svn: 346595
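A sketch of the multi-use case in question (names illustrative): one wide load whose two halves feed independent ops. The trade-off described above is whether to keep the single 256-bit load plus subvector extracts, or to emit two 128-bit loads that can fold into the adds.

    define <4 x i32> @narrow_multiuse_load(<8 x i32>* %p, <4 x i32> %a, <4 x i32> %b) {
      %w  = load <8 x i32>, <8 x i32>* %p, align 32
      %lo = shufflevector <8 x i32> %w, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
      %hi = shufflevector <8 x i32> %w, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
      %s0 = add <4 x i32> %lo, %a
      %s1 = add <4 x i32> %hi, %b
      %r  = add <4 x i32> %s0, %s1
      ret <4 x i32> %r
    }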
* [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 (Craig Topper, 2018-11-02, 1 file changed, -172/+160)

  We already have custom lowering for the AVX case in LegalizeVectorOps, so it's better to keep the regular extend op around as long as possible.

  I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests, which is why it's included here.

  I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

  For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64 bits to the lower 64 bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.

  Differential Revision: https://reviews.llvm.org/D54024

  llvm-svn: 346043
* Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction" (Craig Topper, 2018-10-31, 1 file changed, -223/+232)

  Google is reporting regressions on some benchmarks.

  llvm-svn: 345785
* [X86] Bring back the MOV64r0 pseudo instruction (Craig Topper, 2018-10-24, 1 file changed, -232/+223)

  This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register, similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

  My main motivation is to enable the spill optimization in foldMemoryOperandImpl, as we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here; I'll see if I can try to reduce one from the code we're looking at.

  There are definitely some other machine CSE (and maybe other passes) behavior changes exposed by this patch, so it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

  Differential Revision: https://reviews.llvm.org/D52757

  llvm-svn: 345165
* [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. (Craig Topper, 2018-10-15, 1 file changed, -2/+4)

  We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

  llvm-svn: 344486
* [X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors (Simon Pilgrim, 2018-10-09, 1 file changed, -6/+6)

  As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling.

  Differential Revision: https://reviews.llvm.org/D52980

  llvm-svn: 344019
* [X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors (Simon Pilgrim, 2018-10-08, 1 file changed, -55/+46)

  Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964.

  Differential Revision: https://reviews.llvm.org/D52970

  llvm-svn: 343991
* [X86] Handle COPYs of physregs better (regalloc hints) (Simon Pilgrim, 2018-09-19, 1 file changed, -5/+5)

  Enable enableMultipleCopyHints() on X86.

  Original patch by @jonpa:

  While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling.

  Instead of improving just the SystemZ backend, the improvement has been implemented in common code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.

  Differential Revision: https://reviews.llvm.org/D38128

  llvm-svn: 342578
* [X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors. (Craig Topper, 2018-09-07, 1 file changed, -15/+4)

  The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.

  llvm-svn: 341705
* [X86] Don't create X86ISD::AVG nodes from v1iX vectors. (Craig Topper, 2018-09-07, 1 file changed, -0/+40)

  The type legalizer will try to scalarize this and fail.

  It looks like there are some other v1iX oddities out there too, since we still generated some vector instructions.

  llvm-svn: 341704
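The degenerate form being guarded against is the same rounding-average pattern as in the sketch further up, but with the single-element <1 x i8> type (illustrative, not necessarily the exact test that was added): the generic legalizer scalarizes <1 x i8>, so an X86ISD::AVG node formed for it would have no legal lowering.

    define <1 x i8> @avg_v1i8(<1 x i8> %a, <1 x i8> %b) {
      %za  = zext <1 x i8> %a to <1 x i16>
      %zb  = zext <1 x i8> %b to <1 x i16>
      %sum = add <1 x i16> %za, %zb
      %rnd = add <1 x i16> %sum, <i16 1>
      %shr = lshr <1 x i16> %rnd, <i16 1>
      %res = trunc <1 x i16> %shr to <1 x i8>
      ret <1 x i8> %res
    }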
* [X86][SSE] Consistently prefer lowering to PACKUS over PACKSS (Simon Pilgrim, 2018-06-08, 1 file changed, -12/+12)

  We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.

  PACKUS is much easier to combine with if we know the upper bits are zero, as ComputeKnownBits can easily see through BITCASTs etc., especially now that rL333995 and rL334007 have landed. It also effectively works at the byte level, which further simplifies shuffle combines.

  The only (minor) annoyances are that ComputeKnownBits can sometimes take longer because it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests), and that PACKUSDW only became available with SSE41, so we have more codegen diffs between targets.

  llvm-svn: 334276
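A small example of why known-zero upper bits matter (illustrative): when the upper byte of each i16 lane is already zero, a v16i16-to-v16i8 truncation can be done with PACKUSWB, which saturates to unsigned bytes and is therefore an exact pack here.

    define <16 x i8> @trunc_knownzero_v16i16(<16 x i16> %x) {
      ; The 'and' makes the upper 8 bits of every lane known zero, so the
      ; truncate can lower to an unsigned pack (packuswb) instead of needing
      ; signed-saturation handling.
      %m = and <16 x i16> %x, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
      %t = trunc <16 x i16> %m to <16 x i8>
      ret <16 x i8> %t
    }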
* [DAGCombiner] Change the SDLoc on split extloads (2/N) (Vedant Kumar, 2018-05-01, 1 file changed, -92/+91)

  In DAGCombiner, we try to simplify this pattern:

    ([s|z]ext (load ...))

  Conceptually, a new extload which is created while splitting the load should have the same debug location as the load.

  Making this change affects the IROrder of the new load, causing some test case churn. In practice, the new location is never different from the location of the [s|z]ext, at least not during check-llvm or a stage2 build.

  Part of: llvm.org/PR37262

  Differential Revision: https://reviews.llvm.org/D46156

  llvm-svn: 331301
* [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N) (Vedant Kumar, 2018-05-01, 1 file changed, -117/+116)

  Setting the right SDLoc on a newly-created zextload fixes a line table bug which resulted in non-linear stepping behavior.

  Several backend tests contained CHECK lines which relied on the IROrder inherited from the wrong SDLoc. This patch breaks that dependence where feasible and regenerates test cases where not.

  In some cases, changing a node's IROrder may alter register allocation and spill behavior. This can affect performance. I have chosen not to prevent this by applying a "known good" IROrder to SDLocs, as this may hide a more general bug in the scheduler, or cause regressions on other test inputs.

  rdar://33755881, Part of: llvm.org/PR37262

  Differential Revision: https://reviews.llvm.org/D45995

  llvm-svn: 331300
* [test] Update llc checks for CodeGen/X86/avg.ll (Vedant Kumar, 2018-04-24, 1 file changed, -170/+170)

  The output of update_llc_test_checks.py on this test file has changed, so the test file should be updated to minimize source changes in future patches. The test updates for this file appear to be limited to relaxations of the form:

    -; SSE2-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
    +; SSE2-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill

  This was suggested in https://reviews.llvm.org/D45995.

  llvm-svn: 330758
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" (Nirav Dave, 2018-03-19, 1 file changed, -57/+57)

  Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo.

  llvm-svn: 327898
* Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""Nirav Dave2018-03-171-57/+57
| | | | | | as it times out building test-suite on PPC. llvm-svn: 327778
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" (Nirav Dave, 2018-03-17, 1 file changed, -57/+57)

  Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal.

  llvm-svn: 327777
* [X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits. (Craig Topper, 2018-03-16, 1 file changed, -1/+0)

  We previously avoided inserting these moves during isel in a few cases, which was implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist, especially with AVX512F without AVX512VL using 512-bit vectors to implement some 128/256-bit operations. Since isel is done bottom-up, we'd have to check the VT, opcode, and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations.

  So instead of doing that, this patch adds a post-processing step that detects when the moves are unnecessary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG, so we just need to ensure the input to the move isn't one.

  Differential Revision: https://reviews.llvm.org/D44289

  llvm-svn: 327724
* [LegalizeTypes] In SplitVecOp_TruncateHelper, use GetSplitVector on the input instead of creating new extract_subvectors. (Craig Topper, 2018-03-13, 1 file changed, -328/+165)

  llvm-svn: 327355
* Revert: r327172 "Correct load-op-store cycle detection analysis" (Nirav Dave, 2018-03-10, 1 file changed, -57/+57)

  Also reverts:
    r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
    r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

  Reverting the patches as the NodeId invariant change is causing pathological increases in compile time on PPC.

  llvm-svn: 327197
* Improve Dependency analysis when doing multi-node Instruction Selection (Nirav Dave, 2018-03-09, 1 file changed, -57/+57)

  Relanding after fixing the NodeId invariant.

  Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles/dependencies, pruning the search when the topological property of NodeId allows. As part of this, propagate the NodeId-based cutoffs to narrow hasPredecessorHelper searches.

  Reviewers: craig.topper, bogner
  Subscribers: llvm-commits, hiraditya

  Differential Revision: https://reviews.llvm.org/D41293

  llvm-svn: 327171
* [TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts (Simon Pilgrim, 2018-03-06, 1 file changed, -147/+137)

  Notably helps cleanup after legalization of vector types.

  Differential Revision: https://reviews.llvm.org/D43674

  llvm-svn: 326838
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2018-02-271-1/+1
| | | | | | | | Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208
* [X86] Don't use getZExtValue when we have no idea how large the input elements are. (Craig Topper, 2018-02-26, 1 file changed, -0/+1049)

  llvm-svn: 326066
* [X86] Remove VT.isSimple() check from detectAVGPattern. (Craig Topper, 2018-02-26, 1 file changed, -0/+372)

  Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size.

  llvm-svn: 326063
* [DAG, X86] Revert r324797, r324491, and r324359. (Chandler Carruth, 2018-02-17, 1 file changed, -57/+57)

  Sadly, r324359 caused at least PR36312. There is a patch out for review but it seems to be taking a bit and we've already had these crashers in tree for too long. We're hitting this PR in real code now and are blocked on shipping new compilers as a consequence so I'm reverting us back to green.

  Sorry for the churn due to the stacked changes that I had to revert. =/

  llvm-svn: 325420
* [DAG, X86] Improve Dependency analysis when doing multi-node Instruction Selection (Nirav Dave, 2018-02-06, 1 file changed, -57/+57)

  Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles/dependencies, pruning the search when the topological property of NodeId allows. As part of this, propagate the NodeId-based cutoffs to narrow hasPredecessorHelper searches.

  Reviewers: craig.topper, bogner
  Subscribers: llvm-commits, hiraditya

  Differential Revision: https://reviews.llvm.org/D41293

  llvm-svn: 324359
* [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. (Craig Topper, 2018-01-18, 1 file changed, -6/+6)

  Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything, just like we do for loads.

  llvm-svn: 322820
* [X86][SSE] Split large PAVGB/PAVGW vectors to legal widths (Simon Pilgrim, 2017-12-21, 1 file changed, -2288/+283)

  Patch to allow detectAVGPattern to handle vectors larger than the legal size (128-bit with SSE2, 256-bit with AVX2, 512-bit with AVX512BW), splitting the vectors accordingly.

  Differential Revision: https://reviews.llvm.org/D41440

  llvm-svn: 321288