path: root/llvm/test/CodeGen/X86
...
* [DAG] x | x --> x (Sanjay Patel, 2016-10-30, 1 file, -2/+0)
  llvm-svn: 285522
* [DAG] x & x --> x (Sanjay Patel, 2016-10-30, 1 file, -2/+0)
  llvm-svn: 285521
* [x86] add tests for basic logic op folds (Sanjay Patel, 2016-10-30, 2 files, -0/+37)
  llvm-svn: 285520
* [ValueTracking] recognize more variants of smin/smax (Sanjay Patel, 2016-10-29, 1 file, -14/+5)
  Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202.
  There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
  smax pattern and commuted variants. The codegen tests demonstrate the effect of ValueTracking on the
  backend via SelectionDAGBuilder. We can't expose these differences minimally in IR because we don't
  have smin/smax intrinsics for IR.
  Differential Revision: https://reviews.llvm.org/D26091
  llvm-svn: 285499
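  Illustration (not from the commit): two source-level ways to write a signed max; function names are
  hypothetical. The point of the change is that ValueTracking's matchSelectPattern should see through
  such commuted/rewritten select forms and classify both as the same smax pattern.
  ```cpp
  // Hypothetical examples only: a canonical signed max and a commuted variant.
  // Both should be recognized as the same min/max select pattern.
  int smax_canonical(int a, int b) { return a > b ? a : b; }
  int smax_commuted(int a, int b) { return a < b ? b : a; }
  ```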
* [x86] add tests for smin/smax matchSelPattern (D26091) (Sanjay Patel, 2016-10-29, 2 files, -59/+127)
  llvm-svn: 285498
* [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits (Simon Pilgrim, 2016-10-29, 2 files, -43/+13)
  Currently computeKnownBits returns the common known zero/one bits for all elements of vector data,
  when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument
  that allows us to specify the elements we actually care about. The original computeKnownBits
  implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar
  types set this to 1.
  The approach was found to be easier than trying to add a per-element known bits solution, for a
  similar usefulness given the combines where computeKnownBits is typically used.
  I've only added support for a few opcodes so far (the ones that have proven straightforward to test);
  all others will default to demanding all elements but can be updated in due course. DemandedElts
  support could similarly be added to computeKnownBitsForTargetNode in a future commit.
  This looked like it had caused compile time regressions on some buildbots (and was reverted in
  rL285381), but appears to have just been a harmless bystander!
  Differential Revision: https://reviews.llvm.org/D25691
  llvm-svn: 285494
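  A minimal standalone sketch of the idea (types and signature simplified; this is not the LLVM API):
  known zero/one bits are accumulated only over the vector lanes the caller actually demands.
  ```cpp
  #include <cstdint>
  #include <vector>

  // Simplified model: each lane is a 64-bit constant; DemandedElts has one bit
  // per lane. Known.One keeps bits set in every demanded lane, Known.Zero keeps
  // bits clear in every demanded lane.
  struct Known { uint64_t Zero = ~0ull, One = ~0ull; };

  Known computeKnownBits(const std::vector<uint64_t> &Lanes,
                         uint64_t DemandedElts) {
    Known K;
    for (size_t i = 0; i < Lanes.size(); ++i) {
      if (!(DemandedElts & (1ull << i)))
        continue;                  // skip lanes nobody cares about
      K.One  &= Lanes[i];          // "known one" only if set in all demanded lanes
      K.Zero &= ~Lanes[i];         // "known zero" only if clear in all demanded lanes
    }
    return K;
  }
  ```
  For example, demanding only lane 0 of {0x0F, 0xFF} reports bits 4 and up as known zero, which
  demanding both lanes would not.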
* Fixed FMA + FNEG combine. (Elena Demikhovsky, 2016-10-29, 1 file, -0/+106)
  Masked form of FMA should be omitted in this optimization.
  Differential Revision: https://reviews.llvm.org/D25984
  llvm-svn: 285492
* [DAGCombiner] Fix a crash visiting `AND` nodes. (Davide Italiano, 2016-10-28, 1 file, -0/+20)
  Instead of asserting that the shift count is != 0 we just bail out, as it's not profitable trying to
  optimize a node which will be removed anyway.
  Differential Revision: https://reviews.llvm.org/D26098
  llvm-svn: 285480
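  A tiny standalone model of the shape of that fix (assumed names, not the actual DAGCombiner code):
  the helper declines the fold instead of asserting when the shift amount is zero.
  ```cpp
  #include <cstdint>
  #include <optional>

  // Hypothetical model: where the old code asserted ShiftAmt != 0, the combine
  // now returns "no fold" and lets normal DAG cleanup delete the node.
  std::optional<uint64_t> foldMaskThroughShift(uint64_t Mask, unsigned ShiftAmt) {
    if (ShiftAmt == 0)
      return std::nullopt;   // bail out: not profitable, and avoids the assert
    return Mask >> ShiftAmt; // the rewrite we would otherwise perform
  }
  ```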
* [x86] add tests for missed umin/umax (Sanjay Patel, 2016-10-28, 1 file, -0/+59)
  This is actually a deficiency in ValueTracking's matchSelectPattern(), but a codegen test is the
  simplest way to expose the bug.
  llvm-svn: 285429
* More swift calling convention tests (Arnold Schwaighofer, 2016-10-28, 2 files, -0/+318)
  llvm-svn: 285417
* Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits" (Juergen Ributzka, 2016-10-28, 2 files, -13/+43)
  This seems to have increased LTO compile time beyond 2x of previous builds.
  See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/
  llvm-svn: 285381
* [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 (Simon Pilgrim, 2016-10-27, 1 file, -18/+8)
  With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).
  Updated cost table accordingly.
  Differential Revision: https://reviews.llvm.org/D26011
  llvm-svn: 285304
* [DAGCombiner] Add vector demanded elements support to computeKnownBits (Simon Pilgrim, 2016-10-27, 2 files, -43/+13)
  Currently computeKnownBits returns the common known zero/one bits for all elements of vector data,
  when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument
  that allows us to specify the elements we actually care about. The original computeKnownBits
  implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar
  types set this to 1.
  The approach was found to be easier than trying to add a per-element known bits solution, for a
  similar usefulness given the combines where computeKnownBits is typically used.
  I've only added support for a few opcodes so far (the ones that have proven straightforward to test);
  all others will default to demanding all elements but can be updated in due course. DemandedElts
  support could similarly be added to computeKnownBitsForTargetNode in a future commit.
  Differential Revision: https://reviews.llvm.org/D25691
  llvm-svn: 285296
* [X86] AVX512 fallback for floating-point scalar selects (Zvi Rackover, 2016-10-26, 1 file, -12/+8)
  Summary: In the case of 'select i1, f32, f32' or 'select i1, f64, f64', prefer lowering to
  masked-moves over branches. Fixes pr30561.
  Reviewers: igorb, aymanmus, delena
  Differential Revision: https://reviews.llvm.org/D25310
  llvm-svn: 285196
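  Illustration (not from the commit): a source-level form of such a scalar select; on AVX-512 the
  preferred lowering is a compare into a mask register plus a masked move rather than a branch.
  ```cpp
  // Illustrative only: a scalar floating-point select ('select i1, f32, f32'
  // in IR) that can be lowered branchlessly with AVX-512 masked moves.
  float fsel(bool c, float a, float b) { return c ? a : b; }
  ```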
* [AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics (Craig Topper, 2016-10-26, 1 file, -0/+112)
  Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in
  order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we
  can use instead to avoid the negate. This is consistent with the packed instructions.
  Reviewers: igorb, delena
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25933
  llvm-svn: 285173
* [DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors (Simon Pilgrim, 2016-10-25, 1 file, -45/+8)
  llvm-svn: 285129
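  The scalar identity behind that combine, shown as a small standalone check (example values are
  arbitrary): for unsigned x and a power-of-two divisor shifted left by y, the remainder is just a
  bit mask.
  ```cpp
  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 12345, pow2 = 4, y = 3;            // divisor: 4 << 3 = 32
    // x % (pow2 << y) equals x & ((pow2 << y) - 1) for power-of-two divisors.
    assert(x % (pow2 << y) == (x & ((pow2 << y) - 1)));
    return 0;
  }
  ```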
* [X86][SSE] Regenerated known-bits test with srem->urem fix (Simon Pilgrim, 2016-10-25, 1 file, -18/+18)
  llvm-svn: 285124
* [DAGCombiner] Enable srem(x.y) -> urem(x,y) combine for vectors (Simon Pilgrim, 2016-10-25, 1 file, -45/+9)
  SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927.
  llvm-svn: 285123
* [X86][SSE] Added vector srem combine tests (Simon Pilgrim, 2016-10-25, 1 file, -0/+123)
  llvm-svn: 285121
* [X86][SSE] Added vector urem combine tests (Simon Pilgrim, 2016-10-25, 1 file, -0/+208)
  llvm-svn: 285119
* [DAGCombiner] Enable sdiv(x.y) -> udiv(x,y) combine for vectors (Simon Pilgrim, 2016-10-25, 1 file, -36/+12)
  SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927.
  llvm-svn: 285118
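  The reasoning behind both the srem->urem and sdiv->udiv combines, as a small standalone check
  (values arbitrary): when the sign bit of both operands is known zero, signed and unsigned division
  and remainder produce the same result.
  ```cpp
  #include <cassert>
  #include <cstdint>

  int main() {
    int32_t x = 1000, y = 7;   // both non-negative, i.e. sign bits known zero
    assert(x / y == (int32_t)((uint32_t)x / (uint32_t)y));
    assert(x % y == (int32_t)((uint32_t)x % (uint32_t)y));
    return 0;
  }
  ```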
* [X86][SSE] Added vector sdiv combine tests (Simon Pilgrim, 2016-10-25, 1 file, -0/+212)
  llvm-svn: 285112
* [X86][SSE] Add support for (V)PMOVSX* constant folding (Simon Pilgrim, 2016-10-25, 8 files, -36/+27)
  We already have (V)PMOVZX* combining support; this is the beginning of handling (V)PMOVSX* similarly -
  other combines in combineVSZext can be generalized in future patches.
  This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets -
  it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX.
  I've created a more general form of the existing getConstVector to handle these cases - ideally this
  should be handled in non-target-specific code but I couldn't find an equivalent.
  Differential Revision: https://reviews.llvm.org/D25874
  llvm-svn: 285072
* [DAGCombine] Preserve shuffles when one of the vector operands is constant (Zvi Rackover, 2016-10-25, 1 file, -54/+26)
  Summary: Do *not* perform combines such as:
    vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1, C2), scalar_to_vector(X)) -> build_vector(X, C0, C1, C2)
  Keeping the shuffle allows lowering the constant build_vector to a materialized constant vector (such
  as a vector-load from the constant-pool or some other idiom).
  Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25524
  llvm-svn: 285063
* [AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq. (Craig Topper, 2016-10-25, 3 files, -27/+8)
  Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input
  type which isn't legal without BWI support, though the vpmovsxbq and vpmovzxbq instructions themselves
  don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with
  v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend
  vs zero extend added.
  Reviewers: delena, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25594
  llvm-svn: 285053
* [SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats (Simon Pilgrim, 2016-10-24, 1 file, -8/+2)
  Use isConstOrConstSplat helper. Also use APInt instead of getZExtValue directly to avoid out of range
  issues.
  llvm-svn: 285033
* [x86] add tests for {-1,0,1} select of constants (Sanjay Patel, 2016-10-24, 1 file, -0/+93)
  llvm-svn: 285005
* [llvm] Remove redundant --check-prefix=CHECK from tests (Mandeep Singh Grang, 2016-10-24, 5 files, -5/+5)
  Reviewers: MatzeB, mcrosier, rengolin
  Differential Revision: https://reviews.llvm.org/D25894
  llvm-svn: 285003
* [x86] regenerate checks (Sanjay Patel, 2016-10-24, 1 file, -10/+11)
  llvm-svn: 284982
* [AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR. (Craig Topper, 2016-10-24, 8 files, -871/+836)
  Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow.
  llvm-svn: 284955
* [DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant (Sanjay Patel, 2016-10-23, 2 files, -21/+6)
  llvm-svn: 284953
* [X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3 (Simon Pilgrim, 2016-10-22, 2 files, -55/+92)
  llvm-svn: 284922
* [X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3 (Simon Pilgrim, 2016-10-22, 2 files, -0/+72)
  llvm-svn: 284921
* [X86] Apply the Update LLC Test Checks tool on the mmx-bitcast test (Zvi Rackover, 2016-10-22, 1 file, -10/+15)
  llvm-svn: 284916
* [X86] Add support for printing shuffle comments for VALIGN instructions. (Craig Topper, 2016-10-22, 2 files, -3/+12)
  llvm-svn: 284915
* [x86] add test for missing vector SRA combine via computeKnownBits (Sanjay Patel, 2016-10-21, 1 file, -0/+18)
  llvm-svn: 284896
* [DAG] enhance computeKnownBits to handle SHL with vector splat constant (Sanjay Patel, 2016-10-21, 2 files, -10/+3)
  Also, use APInt to avoid crashing on types larger than vNi64.
  llvm-svn: 284874
* X86: Improve BT instruction selection for 64-bit values. (Peter Collingbourne, 2016-10-21, 1 file, -0/+12)
  If a 64-bit value is tested against a bit which is known to be in the range [0..31) (modulo 64), we
  can use the 32-bit BT instruction, which has a slightly shorter encoding.
  Differential Revision: https://reviews.llvm.org/D25862
  llvm-svn: 284864
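  An illustrative source pattern (assumed, not taken from the commit's test) where the bit index is
  provably small, so the shorter 32-bit BT encoding suffices even though the tested value is 64 bits
  wide.
  ```cpp
  #include <cstdint>

  // The '& 31' keeps the bit index below 32, so only the low 32 bits of Val
  // can ever be selected and a 32-bit bit-test is enough.
  bool testLowBit(uint64_t Val, unsigned Idx) {
    return (Val >> (Idx & 31)) & 1;
  }
  ```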
* [X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw (Simon Pilgrim, 2016-10-21, 1 file, -286/+318)
  llvm-svn: 284863
* [X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw (Simon Pilgrim, 2016-10-21, 2 files, -7/+63)
  llvm-svn: 284860
* [X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw (Simon Pilgrim, 2016-10-21, 1 file, -7/+92)
  llvm-svn: 284858
* [DAG] fold negation of sign-bit (Sanjay Patel, 2016-10-21, 1 file, -13/+4)
  0 - X --> 0, if the sub is NUW
  0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
  0 - X --> X, if X is 0 or the minimum signed value
  This is the DAG equivalent of https://reviews.llvm.org/rL284649 plus the fold for the NUW case which
  already existed in InstSimplify. Note that we miss a vector fold because of a deficiency in the DAG
  version of computeKnownBits().
  llvm-svn: 284844
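  These folds lean on a two's-complement fact that a small standalone snippet (illustrative only) can
  check: negating 0 or the minimum signed value yields the value itself, so when X is known to be one
  of those, 0 - X is just X.
  ```cpp
  #include <cassert>
  #include <cstdint>
  #include <limits>

  int main() {
    int32_t IntMin = std::numeric_limits<int32_t>::min();
    // Negate via unsigned arithmetic to sidestep signed-overflow UB in C++;
    // the bit pattern of -INT_MIN wraps back around to INT_MIN.
    assert((int32_t)(0u - (uint32_t)IntMin) == IntMin);
    assert(0 - 0 == 0);
    return 0;
  }
  ```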
* [x86] add tests for potential negation folds (Sanjay Patel, 2016-10-21, 1 file, -0/+80)
  These are the backend equivalents for the tests added in r284627. The patterns may emerge late, so we
  should have folds for these in the DAG too.
  llvm-svn: 284842
* [X86][SSE] Regenerated sext/zext constant folding tests and added i686 tests (Simon Pilgrim, 2016-10-21, 1 file, -101/+214)
  llvm-svn: 284837
* [X86][SSE] Regenerated chained pmovsx store tests and added i686 tests (Simon Pilgrim, 2016-10-21, 1 file, -76/+401)
  llvm-svn: 284833
* [DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds (Sanjay Patel, 2016-10-21, 1 file, -25/+4)
  As discussed in D24815, let's start the process of killing off the broken fast-math global state housed
  in TargetOptions and eliminate the need for function-level fast-math attributes. Here we enable two
  similar folds that are possible when we don't care about signed-zero:
    fadd nsz x, 0 --> x
    fsub nsz 0, x --> -x
  Note that although the test cases include a 'sin' function call, I'm side-stepping the FMF-on-calls
  question (and lack of support in the DAG) for now. It's not needed for these tests -
  isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.
  Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't actually do
  anything today because Flags are silently dropped for any node that is not a binary operator.
  Differential Revision: https://reviews.llvm.org/D25297
  llvm-svn: 284824
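  A small standalone check (not part of the commit) of why the 'nsz' flag is needed for the fadd fold:
  with x = -0.0, x + 0.0 produces +0.0, so dropping the add is only sound when signed zeros don't
  matter.
  ```cpp
  #include <cassert>
  #include <cmath>

  int main() {
    double x = -0.0;
    assert(std::signbit(x));         // x is -0.0
    assert(!std::signbit(x + 0.0));  // but x + 0.0 is +0.0 (default rounding)
    return 0;
  }
  ```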
* [X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments (Simon Pilgrim, 2016-10-21, 2 files, -8/+8)
  llvm-svn: 284821
* [X86][AVX] Add 32-bit target tests for vector lzcnt/tzcnt to demonstrate missed folding opportunities (Simon Pilgrim, 2016-10-21, 2 files, -32/+524)
  llvm-svn: 284816
* [AVX-512] Add tests to show opportunities for commuting vpermi2/vpermt2 instructions. (Craig Topper, 2016-10-21, 1 file, -0/+367)
  Commuting will be added in a future commit.
  llvm-svn: 284808
* Using branch probability to guide critical edge splitting. (Dehao Chen, 2016-10-20, 6 files, -99/+146)
  Summary: The original heuristic to break critical edges during machine sink is relatively conservative:
  when there is only one instruction sinkable to the critical edge, it is likely that the machine sink
  pass will not break the critical edge. This leads to many speculative instructions executed at runtime.
  However, with profile info, we could model the splitting benefits: if the critical edge has a 50% taken
  rate, it would always be beneficial to split the critical edge to avoid the speculated runtime
  instructions. This patch uses profile to guide critical edge splitting in the machine sink pass.
  The performance impact on speccpu2006 on Intel sandybridge machines:
    spec/2006/fp/C++/444.namd        25.3   +0.26%
    spec/2006/fp/C++/447.dealII      45.96  -0.10%
    spec/2006/fp/C++/450.soplex      41.97  +1.49%
    spec/2006/fp/C++/453.povray      36.83  -0.96%
    spec/2006/fp/C/433.milc          23.81  +0.32%
    spec/2006/fp/C/470.lbm           41.17  +0.34%
    spec/2006/fp/C/482.sphinx3       48.13  +0.69%
    spec/2006/int/C++/471.omnetpp    22.45  +3.25%
    spec/2006/int/C++/473.astar      21.35  -2.06%
    spec/2006/int/C++/483.xalancbmk  36.02  -2.39%
    spec/2006/int/C/400.perlbench    33.7   -0.17%
    spec/2006/int/C/401.bzip2        22.9   +0.52%
    spec/2006/int/C/403.gcc          32.42  -0.54%
    spec/2006/int/C/429.mcf          39.59  +0.19%
    spec/2006/int/C/445.gobmk        26.98  -0.00%
    spec/2006/int/C/456.hmmer        24.52  -0.18%
    spec/2006/int/C/458.sjeng        28.26  +0.02%
    spec/2006/int/C/462.libquantum   55.44  +3.74%
    spec/2006/int/C/464.h264ref      46.67  -0.39%
    geometric mean                          +0.20%
  Manually checked 473 and 471 to verify the diff is in the noise range.
  Reviewers: rengolin, davidxl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D24818
  llvm-svn: 284757