summaryrefslogtreecommitdiffstats
path: root/llvm/test/Analysis
Commit message (Collapse)AuthorAgeFilesLines
* [CostModel][X86] Fix SSE1 FADD/FSUB costsSimon Pilgrim2019-01-041-2/+2
| | | | | | | | Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4 Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403
* Revert patches 348835 and 348571 because they'reRanjeet Singh2019-01-041-27/+0
| | | | | | causing code size performance regressions. llvm-svn: 350402
* [CostModel][X86] Add SSE1 fp cost testsSimon Pilgrim2019-01-041-40/+184
| | | | llvm-svn: 350401
* [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffsetFlorian Hahn2019-01-041-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GetPointerBaseWithConstantOffset include this code, where ByteOffset and GEPOffset are both of type llvm::APInt : ByteOffset += GEPOffset.getSExtValue(); The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t. The problem is that the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative. Changing it to ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth()); resolves the issue and explicitly performs the sign-extending or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case. See also https://reviews.llvm.org/D24729 https://reviews.llvm.org/D24772 This commit must be merged after D38662 in order for the test to pass. Patch by Michael Ferguson <mpfergu@gmail.com>. Reviewers: reames, sanjoy, hfinkel Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D38501 llvm-svn: 350395
* [CostModel][X86] Add truncate cost tests to cover all legal destination typesSimon Pilgrim2019-01-031-9/+137
| | | | | | We were only testing costs for legal source vector element counts llvm-svn: 350323
* [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)Simon Pilgrim2019-01-032-144/+524
| | | | | | Costs for real SSE2 instructions llvm-svn: 350295
* [X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)Simon Pilgrim2019-01-032-0/+510
| | | | llvm-svn: 350293
* [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)Hal Finkel2019-01-023-0/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy). As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220
* [ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFCNikita Popov2018-12-201-0/+187
| | | | | | | Move constant folding tests into ConstantFolding/bitcount.ll and drop various tests in other places. Add coverage for undefs. llvm-svn: 349806
* [ConstantFolding] Add tests for funnel shifts with undef operands; NFCNikita Popov2018-12-201-0/+167
| | | | llvm-svn: 349803
* [ConstantFolding] Add tests for sat add/sub with undefs; NFCNikita Popov2018-12-201-0/+218
| | | | llvm-svn: 349802
* [ConstantFolding] Split up saturating add/sub tests; NFCNikita Popov2018-12-201-97/+158
| | | | | | Split each test into a separate function. llvm-svn: 349801
* Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.Michael Kruse2018-12-202-0/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725
* [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise ↵Craig Topper2018-12-131-51/+31
| | | | | | | | | | | | arithmetic or min/max reduction This is split from D55452 with the correct patch this time. Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle. Differential Revision: https://reviews.llvm.org/D55615 llvm-svn: 349072
* Cleanup test case by removing unused attribute dso_localRanjeet Singh2018-12-111-4/+4
| | | | | | | | | Attribute 'dso_local' generated in bitcode from compiling original C file but isn't needed. Differential Revision: https://reviews.llvm.org/D55521 llvm-svn: 348835
* [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max ↵Craig Topper2018-12-109-668/+668
| | | | | | | | | | | | | | | | reduction. Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register? Reviewers: RKSimon, spatel, ABataev Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55480 llvm-svn: 348739
* [CostModel][X86] Fix overcounting arithmetic cost in illegal types in ↵Craig Topper2018-12-0719-1021/+965
| | | | | | | | | | | | | | getArithmeticReductionCost/getMinMaxReductionCost We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width. So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops. There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles. Differential Revision: https://reviews.llvm.org/D55397 llvm-svn: 348621
* Reapply "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-071-0/+136
| | | | | | | | | | | | | | | | | | | | | | DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602
* [IR] Don't assume all functions are 4 byte alignedRanjeet Singh2018-12-071-0/+27
| | | | | | | | | | | | | | | | | In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here https://reviews.llvm.org/D46110 This was approved previously https://reviews.llvm.org/D55115 (r348215) but when committed it caused failures on the sanitizer buildbots when building llvm with clang (containing this patch). This is now fixed because I've added a check to see if getting the parent module returns null if it does then set the alignment to 0. Differential Revision: https://reviews.llvm.org/D55115 llvm-svn: 348571
* Revert "[DemandedBits][BDCE] Support vectors of integers"Nikita Popov2018-12-071-136/+0
| | | | | | | This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558
* [DemandedBits][BDCE] Support vectors of integersNikita Popov2018-12-061-0/+136
| | | | | | | | | | | | | | | | | | | DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348549
* [X86] Remove -costmodel-reduxcost=true from the experimental vector ↵Craig Topper2018-12-0518-144/+144
| | | | | | | | reduction intrinsic tests as it appears to be unnecessary. NFC I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here. llvm-svn: 348340
* [X86] Add more cost model tests for vector reductions with narrow vector ↵Craig Topper2018-12-0518-0/+531
| | | | | | types. NFC llvm-svn: 348339
* Reverting r348215Ranjeet Singh2018-12-041-27/+0
| | | | | | Causing failures on ubsan buildbot boxes. llvm-svn: 348230
* [IR] Don't assume all functions are 4 byte alignedRanjeet Singh2018-12-041-0/+27
| | | | | | | | | | | In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here https://reviews.llvm.org/D46110 Differential Revision: https://reviews.llvm.org/D55115 llvm-svn: 348215
* [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.Jonas Paulsson2018-12-031-0/+25
| | | | | | | | | | | | | A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141
* [test] Fix ScalarEvolution test to allow __func__ with prototypeMichal Gorny2018-12-021-123/+123
| | | | | | | | | | | | | | | | | | | | Fix ScalarEvolution/solve-quadratic.ll test to account for __func__ output listing the complete function prototype rather than just its name, as it does on NetBSD. Example Linux output: GetQuadraticEquation: addrec coeff bw: 4 GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Example NetBSD output: llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): addrec coeff bw: 4 llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Differential Revision: https://reviews.llvm.org/D55162 llvm-svn: 348096
* [TTI] Reduction costs only need to include a single extract element cost ↵Simon Pilgrim2018-12-0120-2050/+2050
| | | | | | | | | | | | | | | | (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076
* [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in ↵Craig Topper2018-12-016-6/+6
| | | | | | interleave/strided load/store cost model tests. llvm-svn: 348056
* [DA] GPUDivergenceAnalysis for unstructured GPU kernelsNicolai Haehnle2018-11-3019-0/+1215
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D53493 llvm-svn: 348048
* [SystemZ::TTI] i8/i16 operands extension costs revisitedJonas Paulsson2018-11-304-44/+89
| | | | | | | | | | | | | | | | | | Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load or an immediate. * The operands extension costs for divides removed, because we now use a high cost already for the divide (20). * The costs for lhsr/ashr extra costs removed as this did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961
* [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where ↵Craig Topper2018-11-281-32/+15
| | | | | | | | | | | | AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786
* [X86] Add some cost model entries for sext/zext for avx512bwCraig Topper2018-11-282-22/+22
| | | | | | | | | | This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785
* [X86] Add a combine for back to back VSRAI instructionsCraig Topper2018-11-281-3/+3
| | | | | | | | Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784
* [SystemZ::TTI] Improve cost for compare of i64 with extended i32 loadJonas Paulsson2018-11-281-1/+17
| | | | | | | | | | | | CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735
* [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.Jonas Paulsson2018-11-281-0/+93
| | | | | | | | | | | | | AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734
* [SystemZ::TTI] Improved cost values for comparison against memory.Jonas Paulsson2018-11-281-0/+27
| | | | | | | | | | | | | Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733
* [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.Jonas Paulsson2018-11-281-0/+67
| | | | | | | | | | Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732
* [X86] Add test cases to show that we don't properly take ↵Craig Topper2018-11-281-0/+144
| | | | | | | | -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext. The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost. llvm-svn: 347722
* [X86] Add exhaustive cost model testing for sext/zext for all vector types ↵Craig Topper2018-11-272-0/+958
| | | | | | | | | | we reasonably support. Add cost model tests for truncating to vXi1. Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost. Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables. llvm-svn: 347719
* [X86] Add cost model tests for experimental.vector.reduce.* with ↵Craig Topper2018-11-279-0/+2852
| | | | | | -x86-experimental-vector-widening-legalization llvm-svn: 347697
* [X86] Add cost model test for masked load an store with ↵Craig Topper2018-11-271-0/+606
| | | | | | -x86-experimental-vector-widening-legalization llvm-svn: 347696
* [X86] Add cost model tests for fp_to_int/int_to_fp with ↵Craig Topper2018-11-275-0/+1765
| | | | | | -x86-experimental-vector-widening-legalization llvm-svn: 347695
* [X86] Add cost model tests for shifts with ↵Craig Topper2018-11-273-0/+1589
| | | | | | -x86-experimental-vector-widening-legalization. llvm-svn: 347694
* [stack-safety] Inter-Procedural Analysis implementationVitaly Buka2018-11-267-7/+790
| | | | | | | | | | | | | | | | Summary: IPA is implemented as module pass which produce map from Function or Alias to StackSafetyInfo for a single function. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D54543 llvm-svn: 347611
* [stack-safety] Empty local passes for Stack Safety Global AnalysisVitaly Buka2018-11-261-0/+4
| | | | | | | | | | Reviewers: eugenis, vlad.tsyrklevich Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54541 llvm-svn: 347610
* [stack-safety] Local analysis implementationVitaly Buka2018-11-262-3/+538
| | | | | | | | | | | | | | | | Summary: Analysis produces StackSafetyInfo which contains information with how allocas and parameters were used in functions. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54504 llvm-svn: 347603
* [stack-safety] Empty local passes for Stack Safety Local AnalysisVitaly Buka2018-11-261-0/+13
| | | | | | | | | | Reviewers: eugenis, vlad.tsyrklevich Subscribers: mgorny, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54502 llvm-svn: 347602
* [DemandedBits] Add support for funnel shiftsNikita Popov2018-11-261-0/+79
| | | | | | | | | | | | | | Add support for funnel shifts to the DemandedBits analysis. The demanded bits of the first two operands can be determined if the shift amount is constant. The demanded bits of the third operand (shift amount) can be determined if the bitwidth is a power of two. This is basically the same functionality as implemented in D54869 and D54478, but for DemandedBits rather than InstCombine. Differential Revision: https://reviews.llvm.org/D54876 llvm-svn: 347561
* Revert "[TTI] Reduction costs only need to include a single extract element ↵Fedor Sergeev2018-11-2611-1049/+1049
| | | | | | | | | | cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541
OpenPOWER on IntegriCloud