summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Analysis/TargetTransformInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [IR] move shuffle mask queries from TTI to ShuffleVectorInstSanjay Patel2018-06-191-168/+18
| | | | | | | | | | | | | | | | The optimizer is getting smarter (eg, D47986) about differentiating shuffles based on its mask values, so we should make queries on the mask constant operand generally available to avoid code duplication. We'll probably use this soon in the vectorizers and instcombine (D48023 and https://bugs.llvm.org/show_bug.cgi?id=37806). We might clean up TTI a bit more once all of its current 'SK_*' options are covered. Differential Revision: https://reviews.llvm.org/D48236 llvm-svn: 335067
* Fix namespaces. No functionality change.Benjamin Kramer2018-06-161-1/+1
| | | | llvm-svn: 334890
* [CostModel] Cleanup isSingleSourceVectorMask to match other shuffle ↵Simon Pilgrim2018-06-141-10/+12
| | | | | | matchers. NFCI. llvm-svn: 334699
* [CostModel] Recognise REVERSE shuffle mask if the elements come from the ↵Simon Pilgrim2018-06-141-4/+11
| | | | | | second src llvm-svn: 334698
* [CostModel] Recognise BROADCAST shuffle mask if the elements come from the ↵Simon Pilgrim2018-06-131-4/+11
| | | | | | second src llvm-svn: 334620
* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select ↵Simon Pilgrim2018-06-121-19/+15
| | | | | | | | | | | | | | | | | | (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513
* Fix signed/unsigned warning. NFCI.Simon Pilgrim2018-06-121-2/+2
| | | | llvm-svn: 334509
* [CostModel] Treat Identity shuffle masks as zero costSimon Pilgrim2018-06-121-0/+20
| | | | | | | | | | As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506
* [TTI] Add uniform/non-uniform constant Pow2 detection to ↵Simon Pilgrim2018-05-221-13/+28
| | | | | | | | | | | | | | | | TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-1/+1
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [TTI, AArch64] Add transpose shuffle kindMatthew Simpson2018-04-261-10/+74
| | | | | | | | | | | | | | This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-131-0/+4
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header ↵David Blaikie2018-03-281-0/+2
| | | | | | | | dependency Thanks to echristo for the pointers on direction. llvm-svn: 328737
* [LV] Add TTI::shouldMaximizeVectorBandwidth to allow enabling it per targetKrzysztof Parzyszek2018-03-271-0/+4
| | | | | | | | The default implementation returns false and keeps the current behavior. Differential Revision: https://reviews.llvm.org/D44735 llvm-svn: 328632
* [LSR] Allow giving priority to post-incrementing addressing modesKrzysztof Parzyszek2018-03-261-0/+14
| | | | | | | | | | | Implement TTI interface for targets to indicate that the LSR should give priority to post-incrementing addressing modes. Combination of patches by Sebastian Pop and Brendon Cahoon. Differential Revision: https://reviews.llvm.org/D44758 llvm-svn: 328490
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused ↵Sanjay Patel2018-02-051-0/+4
| | | | | | | | | | | | | | | (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289
* Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass ↵Zaara Syeda2018-01-301-0/+4
| | | | | | | | | | | | | | | | | | | | | to mark candidates with coldcc attribute. This recommits r322721 reverted due to sanitizer memory leak build bot failures. Original commit message: This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 323778
* Revert [PowerPC] This reverts commit rL322721Zaara Syeda2018-01-171-4/+0
| | | | | | Failing build bots. Revert the commit now. llvm-svn: 322748
* [PowerPC] Add handling for ColdCC calling convention and a pass to markZaara Syeda2018-01-171-0/+4
| | | | | | | | | | | | | | | | candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721
* Revert r321377, it causes regression to https://reviews.llvm.org/P8055.Guozhi Wei2017-12-281-4/+0
| | | | llvm-svn: 321528
* [SimplifyCFG] Don't do if-conversion if there is a long dependence chainGuozhi Wei2017-12-221-0/+4
| | | | | | | | | | If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor. This patch checks for the long dependence chain, and give up if-conversion if find one. Differential Revision: https://reviews.llvm.org/D39352 llvm-svn: 321377
* [Memcpy Loop Lowering] Remove the fixed int8 lowering.Sean Fertile2017-12-181-9/+0
| | | | | | | | Switch over to the lowering that uses target supplied operand types. Differential Revision: https://reviews.llvm.org/D41201 llvm-svn: 320989
* [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend ↵Sanjay Patel2017-11-271-0/+4
| | | | | | | | | | | on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094
* [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).Clement Courbet2017-10-301-2/+3
| | | | | | | | | | | | - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905
* [NVPTX] allow address space inference for volatile loads/stores.Artem Belevich2017-10-241-0/+5
| | | | | | | | | | If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495
* Revert r314923: "Recommit : Use the basic cost if a GEP is not used as ↵Daniel Jasper2017-10-131-5/+0
| | | | | | | | | | | | | addressing mode" Significantly reduces performancei (~30%) of gipfeli (https://github.com/google/gipfeli) I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today. llvm-svn: 315680
* Recommit : Use the basic cost if a GEP is not used as addressing modeJun Bum Lim2017-10-041-0/+5
| | | | | | | | | | | | | | Recommitting r314517 with the fix for handling ConstantExpr. Original commit message: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. llvm-svn: 314923
* Revert "Use the basic cost if a GEP is not used as addressing mode"Alex Shlyapnikov2017-09-291-5/+0
| | | | | | | | | | | | | | | | | | | | | | | This reverts commit r314517. This commit crashes sanitizer bots, for example: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167 Stack snippet: ... /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0 ... llvm-svn: 314560
* Use the basic cost if a GEP is not used as addressing modeJun Bum Lim2017-09-291-0/+5
| | | | | | | | | | | | | | | | | | | Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng Reviewed By: hfinkel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38085 llvm-svn: 314517
* [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> ↵Clement Courbet2017-09-251-2/+2
| | | | | | | | | | | | | | | | TargetTransformInfo::enableMemCmpExpansion. Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo. Remove the unused Instruction* parameter. Differential Revision: https://reviews.llvm.org/D38165 llvm-svn: 314096
* [DivRempairs] add a pass to optimize div/rem pairs (PR31028)Sanjay Patel2017-09-091-0/+4
| | | | | | | | | | | | | | | | | | This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862
* [TargetTransformInfo] Add a new public interface getInstructionCostGuozhi Wei2017-09-081-0/+557
| | | | | | | | | | | | | | | | | | | | | | | | | | Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface: enum TargetCostKind { TCK_RecipThroughput, ///< Reciprocal throughput. TCK_Latency, ///< The latency of instruction. TCK_CodeSize ///< Instruction code size. }; int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const; All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model. This patch also provides a simple default implementation of getInstructionLatency. The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways: Add more detail into this function. Add getXXXLatency function and call it from here. Implement target specific getInstructionLatency function. Differential Revision: https://reviews.llvm.org/D37170 llvm-svn: 312832
* [SLP] Support for horizontal min/max reduction.Alexey Bataev2017-09-081-0/+9
| | | | | | | | | | | | | SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791
* Model cache size and associativity in TargetTransformInfoTobias Grosser2017-08-241-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We add the precise cache sizes and associativity for the following Intel architectures: - Penry - Nehalem - Westmere - Sandy Bridge - Ivy Bridge - Haswell - Broadwell - Skylake - Kabylake Polly uses since several months a performance model for BLAS computations that derives optimal cache and register tile sizes from cache and latency information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016). While bootstrapping this model, these target values have been kept in Polly. However, as our implementation is now rather mature, it seems time to teach LLVM itself about cache sizes. Interestingly, L1 and L2 cache sizes are pretty constant across micro-architectures, hence a set of architecture specific default values seems like a good start. They can be expanded to more target specific values, in case certain newer architectures require different values. For now a set of Intel architectures are provided. Just as a little teaser, for a simple gemm kernel this model allows us to improve performance from 1.2s to 0.27s. For gemm kernels with less optimal memory layouts even larger speedups can be reported. Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb Reviewed By: fhahn, asb Subscribers: lsaba, asb, pollydev, llvm-commits Differential Revision: https://reviews.llvm.org/D37051 llvm-svn: 311647
* [LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess()Jonas Paulsson2017-08-091-5/+0
| | | | | | | | | | | | | | | | | isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously only isFoldableMemAccess() could do. The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess(). The isFoldableMemAccess() hook has been removed everywhere. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D35933 llvm-svn: 310463
* [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.Alexey Bataev2017-07-311-3/+3
| | | | llvm-svn: 309563
* [TTI] fixing a bug in the isLegalMaskedScatter APIMohammed Agabaria2017-07-271-1/+1
| | | | | | | | | isLegalMaskedScatter called the Gather version which is a bug. use test case is provided within the patch of AVX2 gathers at: https://reviews.llvm.org/D35772 Differential Revision: https://reviews.llvm.org/D35786 llvm-svn: 309260
* [SystemZ, LoopStrengthReduce]Jonas Paulsson2017-07-211-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729
* [TTI] Refine the cost of EXT in getUserCost()Haicheng Wu2017-07-151-0/+5
| | | | | | | | | | | Now, getUserCost() only checks the src and dst types of EXT to decide it is free or not. This change first checks the types, then calls isExtFreeImpl(), and check if EXT can form ExtLoad at last. Currently, only AArch64 has customized implementation of isExtFreeImpl() to check if EXT can be folded into its use. Differential Revision: https://reviews.llvm.org/D34458 llvm-svn: 308076
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-071-0/+25
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* [TargetTransformInfo, API] Add a list of operands to TTI::getUserCostEvgeny Astigeevich2017-06-291-2/+3
| | | | | | | | | | | | | | | | | | | | The changes are a result of discussion of https://reviews.llvm.org/D33685. It solves the following problem: 1. We can inform getGEPCost about simplified indices to help it with calculating the cost. But getGEPCost does not take into account the context which GEPs are used in. 2. We have getUserCost which can take the context into account but we cannot inform about simplified indices. With the changes getUserCost will have access to additional information as getGEPCost has. The one parameter getUserCost is also provided. Differential Revision: https://reviews.llvm.org/D34057 llvm-svn: 306674
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-281-2/+2
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* DivergencyAnalysis patch for reviewAlexander Timofeev2017-06-151-0/+4
| | | | llvm-svn: 305494
* [Atomics][LoopIdiom] Recognize unordered atomic memcpyAnna Thomas2017-06-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames, anna Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304806
* Fix PR23384 (part 2 of 3) NFCEvgeny Stupachenko2017-06-051-0/+4
| | | | | | | | | | | | Summary: The patch moves LSR cost comparison to target part. Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30561 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304750
* [PPC] Inline expansion of memcmpZaara Syeda2017-05-311-0/+4
| | | | | | | | | | | | | | | This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313
* [LoopVectorizer] Let target prefer scalar addressing computations.Jonas Paulsson2017-05-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744
* [SLP] Enable 64-bit wide vectorization on AArch64Adam Nemet2017-05-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. *** Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. ** SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
* Add a late IR expansion pass for the experimental reduction intrinsics.Amara Emerson2017-05-101-0/+3
| | | | | | | | | This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shuffevector sequence. Differential Revision: https://reviews.llvm.org/D32245 llvm-svn: 302631
* Introduce experimental generic intrinsics for horizontal vector reductions.Amara Emerson2017-05-091-0/+6
| | | | | | | | | | | | | | - This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm. - The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle. - For CodeGen, basic legalization support is added. Differential Revision: https://reviews.llvm.org/D30086 llvm-svn: 302514
OpenPOWER on IntegriCloud