summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably ↵Benjamin Kramer2014-04-261-0/+19
| | | | | | | | cheap now. Turn vectorization back on. llvm-svn: 207320
* [C++] Use 'nullptr'. Target edition.Craig Topper2014-04-251-1/+1
| | | | llvm-svn: 207197
* [Modules] Fix potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-1/+2
| | | | | | | definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842
* Add comments and test case for [X86TTI] Make constant base pointers for ↵Juergen Ributzka2014-04-021-0/+3
| | | | | | GetElementPtr opaque (r204739). llvm-svn: 205468
* Implement X86TTI::getUnrollingPreferencesHal Finkel2014-04-011-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | This provides an initial implementation of getUnrollingPreferences for x86. getUnrollingPreferences is used by the generic (concatenation) unroller, which is distinct from the unrolling done by the loop vectorizer. Many modern x86 cores have some kind of uop cache and loop-stream detector (LSD) used to efficiently dispatch small loops, and taking full advantage of this requires unrolling small loops (small here means 10s of uops). These caches also have limits on the number of taken branches in the loop, and so we also cap the loop unrolling factor based on the maximum "depth" of the loop. This is currently calculated with a partial DFS traversal (partial because it will stop early if the path length grows too much). This is still an approximation, and one that is both conservative (because it does not account for branches eliminated via block placement) and optimistic (because it is only recording the maximum depth over minimum paths). Nevertheless, because the loops that fit in these uop caches are so small, it is not clear how much the details matter. The original set of patches posted for review produced the following test-suite performance results (from the TSVC benchmark) at that time: ControlLoops-dbl - 13% speedup ControlLoops-flt - 15% speedup Reductions-dbl - 7.5% speedup llvm-svn: 205348
* [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as wellAdam Nemet2014-03-311-0/+1
| | | | | | | | | Pretty obvious follow-on to r205159 to also handle conversion from double besides float. Fixes <rdar://problem/16373208> llvm-svn: 205253
* [X86] Adjust cost of FP_TO_UINT v8f32->v8i32Adam Nemet2014-03-301-0/+6
| | | | | | | | | | | | | | | There is no direct AVX instruction to convert to unsigned. I have some ideas how we may be able to do this with three vector instructions but the current backend just bails on this to get it scalarized. See the comment why we need to adjust the cost returned by BasicTTI. The test is a bit roundabout (and checks assembly rather than bit code) because I'd like it to work even if at some point we could vectorize this conversion. Fixes <rdar://problem/16371920> llvm-svn: 205159
* [X86][Vector Cost Model] Add a comment to explain the workaroundQuentin Colombet2014-03-271-0/+5
| | | | | | | | in my previous commit (r204884). <rdar://problem/16381225> llvm-svn: 204972
* [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64Quentin Colombet2014-03-271-0/+2
| | | | | | | | | | and v4i64->v4f64. The new costs match what we did for SSE2 and reflect the reality of our codegen. <rdar://problem/16381225> llvm-svn: 204884
* X86: Correct vectorization cost model for v8f32->v8i8.Jim Grosbach2014-03-271-1/+1
| | | | | | | | Fix the cost model to reflect the reality of our codegen. rdar://16370633 llvm-svn: 204880
* [X86TTI] Make constant base pointers for getElementPtr opaque.Juergen Ributzka2014-03-251-2/+3
| | | | | | | | | If getElementPtr uses a constant as base pointer, then make the constant opaque. This prevents constant folding it with the offset. The offset can usually be encoded in the load/store instruction itself and the base address doesn't have to be rematerialized several times. llvm-svn: 204739
* [Stackmaps][X86TTI] Fix think-o in getIntImmCost calculation.Juergen Ributzka2014-03-251-7/+6
| | | | | | | | The cost for the first four stackmap operands was always TCC_Free. This is only true for the first two operands. All other operands are TCC_Free if they are within 64bit. llvm-svn: 204738
* [Constant Hoisting] Make the constant materialization cost operand dependentJuergen Ributzka2014-03-211-15/+32
| | | | | | | | | Extend the target hook to take also the operand index into account when calculating the cost of the constant materialization. Related to <rdar://problem/16381500> llvm-svn: 204435
* Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."Juergen Ributzka2014-03-201-32/+15
| | | | | | I will break this up into smaller pieces for review and recommit. llvm-svn: 204393
* [Constant Hoisting] Extend coverage of the constant hoisting pass.Juergen Ributzka2014-03-201-15/+32
| | | | | | | | | This commit extends the coverage of the constant hoisting pass, adds additonal debug output and updates the function names according to the style guide. Related to <rdar://problem/16381500> llvm-svn: 204389
* [C++11] Remove 'virtual' keyword from methods marked with 'override' keyword.Craig Topper2014-03-101-34/+32
| | | | llvm-svn: 203444
* [TTI] There is actually no realistic way to pop TTI implementations offChandler Carruth2014-03-101-4/+0
| | | | | | | | | | | | | | the stack of the analysis group because they are all immutable passes. This is made clear by Craig's recent work to use override systematically -- we weren't overriding anything for 'finalizePass' because there is no such thing. This is kind of a lame restriction on the API -- we can no longer push and pop things, we just set up the stack and run. However, I'm not invested in building some better solution on top of the existing (terrifying) immutable pass and legacy pass manager. llvm-svn: 203437
* Switch all uses of LLVM_OVERRIDE to just use 'override' directly.Craig Topper2014-03-021-21/+19
| | | | llvm-svn: 202621
* Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.Craig Topper2014-03-021-1/+1
| | | | llvm-svn: 202618
* [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo calledAndrea Di Biagio2014-02-121-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | 'OK_NonUniformConstValue' to identify operands which are constants but not constant splats. The cost model now allows returning 'OK_NonUniformConstValue' for non splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, On X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant. This patch applies the following changes: - The cost model computation now takes into account non-uniform constants; - The cost of vector shift instructions has been improved in X86TargetTransformInfo analysis pass; - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands. Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2. llvm-svn: 201272
* X86: add costs for 64-bit vector ext/trunc & rebalanceTim Northover2014-02-061-15/+58
| | | | | | | | | | | | | | | | | | | | The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 llvm-svn: 200928
* Revert "Revert "Add Constant Hoisting Pass" (r200034)"Juergen Ributzka2014-01-251-0/+95
| | | | | | | This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings. llvm-svn: 200062
* Revert "Add Constant Hoisting Pass" (r200034)Hans Wennborg2014-01-251-95/+0
| | | | | | | | | | | | | | | This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemnted in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. llvm-svn: 200058
* Add Constant Hoisting PassJuergen Ributzka2014-01-241-0/+95
| | | | | | | | Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses. llvm-svn: 200034
* Revert "Add Constant Hoisting Pass"Juergen Ributzka2014-01-241-95/+0
| | | | | | This reverts commit r200022 to unbreak the build bots. llvm-svn: 200024
* Add Constant Hoisting PassJuergen Ributzka2014-01-241-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass identifies expensive constants to hoist and coalesces them to better prepare it for SelectionDAG-based code generation. This works around the limitations of the basic-block-at-a-time approach. First it scans all instructions for integer constants and calculates its cost. If the constant can be folded into the instruction (the cost is TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't consider it expensive and leave it alone. This is the default behavior and the default implementation of getIntImmCost will always return TCC_Free. If the cost is more than TCC_BASIC, then the integer constant can't be folded into the instruction and it might be beneficial to hoist the constant. Similar constants are coalesced to reduce register pressure and materialization code. When a constant is hoisted, it is also hidden behind a bitcast to force it to be live-out of the basic block. Otherwise the constant would be just duplicated and each basic block would have its own copy in the SelectionDAG. The SelectionDAG recognizes such constants as opaque and doesn't perform certain transformations on them, which would create a new expensive constant. This optimization is only applied to integer constants in instructions and simple (this means not nested) constant cast experessions. For example: %0 = load i64* inttoptr (i64 big_constant to i64*) Reviewed by Eric llvm-svn: 200022
* Add final and owerride keywords to TargetTransformInfo's subclasses.Juergen Ributzka2014-01-241-18/+20
| | | | llvm-svn: 200021
* Re-sort all of the includes with ./utils/sort_includes.py so thatChandler Carruth2014-01-071-1/+1
| | | | | | | | | | subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685
* Correct word hyphenationsAlp Toker2013-12-051-2/+2
| | | | | | | This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
* X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.Benjamin Kramer2013-10-231-0/+3
| | | | | | Also update the cost model. llvm-svn: 193270
* X86 horizontal vector reduction cost modelYi Jiang2013-09-191-0/+84
| | | | llvm-svn: 191021
* Using popcount should check the popcount feature flag not the SSE41 feature ↵Craig Topper2013-09-081-2/+2
| | | | | | flag. llvm-svn: 190258
* Add a overload to CostTable which allows it to infer the size of the table.Benjamin Kramer2013-08-091-33/+25
| | | | | | | | Use it to avoid repeating ourselves too often. Also store MVT::SimpleValueType in the TTI tables so they can be statically initialized, MVT's constructors create bloated initialization code otherwise. llvm-svn: 188095
* X86 cost model: Add cost for vectorized gather/scatherArnold Schwaighofer2013-07-121-0/+15
| | | | | | radar://14351991 llvm-svn: 186189
* Get rid of the unused class member.Nadav Rotem2013-06-271-3/+2
| | | | llvm-svn: 185086
* CostModel: improve the cost model for load/store of non power-of-two types ↵Nadav Rotem2013-06-271-0/+43
| | | | | | such as <3 x float>, which are popular in graphics. llvm-svn: 185085
* X86 cost model: Vectorizing integer division is a bad ideaArnold Schwaighofer2013-06-251-0/+25
| | | | | | radar://14057959 llvm-svn: 184872
* Fix 80 col violation.Nadav Rotem2013-06-181-3/+6
| | | | llvm-svn: 184228
* X86 cost model: Exit before calling getSimpleVT on non-simple VTsArnold Schwaighofer2013-04-171-0/+4
| | | | | | | | getSimpleVT can only handle simple value types. radar://13676022 llvm-svn: 179714
* X86 cost model: Model cost for uitofp and sitofp on SSE2Arnold Schwaighofer2013-04-081-3/+34
| | | | | | | | | | | | | | | | | | | | The costs are overfitted so that I can still use the legalization factor. For example the following kernel has about half the throughput vectorized than unvectorized when compiled with SSE2. Before this patch we would vectorize it. unsigned short A[1024]; double B[1024]; void f() { int i; for (i = 0; i < 1024; ++i) { B[i] = (double) A[i]; } } radar://13599001 llvm-svn: 179033
* X86 cost model: Differentiate cost for vector shifts of constantsArnold Schwaighofer2013-04-041-0/+29
| | | | | | | | | | | | SSE2 has efficient support for shifts by a scalar. My previous change of making shifts expensive did not take this into account marking all shifts as expensive. This would prevent vectorization from happening where it is actually beneficial. With this change we differentiate between shifts of constants and other shifts. radar://13576547 llvm-svn: 178808
* CostModel: Add parameter to instruction cost to further classify operand valuesArnold Schwaighofer2013-04-041-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On certain architectures we can support efficient vectorized version of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86. We can efficiently support for (i = 0 ; i < ; i += 4) w[0:3] = v[0:3] << <2, 2, 2, 2> but not for (i = 0; i < ; i += 4) w[0:3] = v[0:3] << x[0:3] This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant. Targets can then choose to return a different cost for instructions with such operand values. A follow-up commit will test this feature on x86. radar://13576547 llvm-svn: 178807
* X86 cost model: Vector shifts are expensive in most casesArnold Schwaighofer2013-04-031-0/+42
| | | | | | | | | | | | | | The default logic does not correctly identify costs of casts because they are marked as custom on x86. For some cases, where the shift amount is a scalar we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 llvm-svn: 178703
* X86TTI: Add accurate costs for itofp operations, based on the actual ↵Benjamin Kramer2013-04-011-4/+27
| | | | | | instruction counts. llvm-svn: 178459
* Correct cost model for vector shift on AVX2Michael Liao2013-03-201-0/+23
| | | | | | | | | - After moving logic recognizing vector shift with scalar amount from DAG combining into DAG lowering, we declare to customize all vector shifts even vector shift on AVX is legal. As a result, the cost model needs special tuning to identify these legal cases. llvm-svn: 177586
* Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.Nadav Rotem2013-03-191-2/+2
| | | | | | Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com> llvm-svn: 177421
* X86 cost model: Adjust cost for custom lowered vector multipliesArnold Schwaighofer2013-03-021-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This matters for example in following matrix multiply: int **mmult(int rows, int cols, int **m1, int **m2, int **m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403
* I optimized the following patterns:Elena Demikhovsky2013-02-201-0/+3
| | | | | | | | | | | | | | | | sext <4 x i1> to <4 x i64> sext <4 x i8> to <4 x i64> sext <4 x i16> to <4 x i64> I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns: (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT))) The sext_in_reg (v4i32 x) may be lowered to shl+sar operations. The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution. I also added a cost of this operations to the AVX costs table. llvm-svn: 175619
* Moving Cost Tables up to share with other targetsRenato Golin2013-01-241-48/+11
| | | | llvm-svn: 173382
* Revert CostTable algorithm, will re-writeRenato Golin2013-01-201-66/+102
| | | | llvm-svn: 172992
OpenPOWER on IntegriCloud