summaryrefslogtreecommitdiffstats
path: root/llvm/test/Analysis/CostModel/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC] Separate Features that are known to be Power9 specific from Future CPUStefan Pintilie2019-11-271-0/+16
| | | | | | | | The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates this out so that future CPU does not inherit them. Differential Revision: https://reviews.llvm.org/D70466
* Recommit [PowerPC] Update P9 vector costs for insert/extractRoland Froese2019-08-261-24/+24
| | | | | | | Now that the v1i128 smin regression has been fixed, recommit the P9 cost updates from D60160. llvm-svn: 369952
* Revert Recommit [PowerPC] Update P9 vector costs for insert/extract elementJordan Rupprecht2019-07-011-24/+24
| | | | | | | | This reverts r364557 (git commit 9f7f5858fe46b8e706e87a83e2fd0a2678be619e) This crashes as reported on the commit thread. Repro instructions TBD. llvm-svn: 364876
* Recommit [PowerPC] Update P9 vector costs for insert/extract elementRoland Froese2019-06-271-24/+24
| | | | | | Recommit patch D60160 after regression fix patch D63463. llvm-svn: 364557
* [lit] Delete empty lines at the end of lit.local.cfg NFCFangrui Song2019-06-171-1/+0
| | | | llvm-svn: 363538
* Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract ↵David L. Jones2019-05-011-24/+24
| | | | | | | | element" This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313. llvm-svn: 359648
* [PowerPC] Update P9 vector costs for insert/extract elementRoland Froese2019-04-261-24/+24
| | | | | | | | | | | The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present. Differential Revision: https://reviews.llvm.org/D60160 llvm-svn: 359313
* [PowerPC] Add some PPC vec cost tests to prep for D60160 NFCRoland Froese2019-04-182-16/+172
| | | | llvm-svn: 358699
* [PowerPC] Update Vector Costs for P9Nemanja Ivanovic2019-01-261-0/+68
| | | | | | | | | | | | | For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9. Patch by RolandF. Differential revision: https://reviews.llvm.org/D55461 llvm-svn: 352261
* [PPC] Give unaligned memory access lower cost on processor that supports itGuozhi Wei2017-02-172-1/+27
| | | | | | | | | | | | Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost. This patch fixes pr31492. Differential Revision: https://reviews.llvm.org/D28630 This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug. llvm-svn: 295506
* Revert "[PPC] Give unaligned memory access lower cost on processor that ↵Daniel Jasper2017-01-252-27/+1
| | | | | | | | | | supports it" This reverts commit r292680. It is causing significantly worse performance and test timeouts in our internal builds. I have already routed reproduction instructions your way. llvm-svn: 293092
* [PPC] Give unaligned memory access lower cost on processor that supports itGuozhi Wei2017-01-202-1/+27
| | | | | | | | | | Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost. This patch fixes pr31492. Differential Revision: https://reviews.llvm.org/D28630 llvm-svn: 292680
* [ppc] Correctly compute the cost of loading 32/64 bit memory into VSRGuozhi Wei2016-12-031-0/+19
| | | | | | | | VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560
* [TTI] The cost model should not assume vector casts get completely scalarizedMichael Kuperstein2016-07-061-1/+1
| | | | | | | | | | | | | | | | The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
* [PowerPC] - Legalize vector types by widening instead of integer promotionNemanja Ivanovic2016-07-051-1/+1
| | | | | | | | | | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D20443 It changes the legalization strategy for illegal vector types from integer promotion to widening. This only applies for vectors with elements of width that is a multiple of a byte since we have hardware support for vectors with 1, 2, 3, 8 and 16 byte elements. Integer promotion for vectors is quite expensive on PPC due to the sequence of breaking apart the vector, extending the elements and reconstituting the vector. Two of these operations are expensive. This patch causes between minor and major improvements in performance on most benchmarks. There are very few benchmarks whose performance regresses. These regressions can be handled in a subsequent patch with a DAG combine (similar to how this patch handles int -> fp conversions of illegal vector types). llvm-svn: 274535
* [TTI] Let the cost model estimate ctpop costs based on legalityBenjamin Kramer2016-03-311-0/+11
| | | | | | | | | PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). llvm-svn: 265005
* [PowerPC] Include the permutation cost for unaligned vector loadsHal Finkel2015-09-032-13/+13
| | | | | | | | | | Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX types), even when accounting for the combining that takes place for multiple consecutive such loads, there is at least one load instructions and one permutation for each load. Make sure the cost reported reflects the cost of the permutes as well. llvm-svn: 246807
* [PowerPC] Cleanup cost model for unaligned vector loads/storesHal Finkel2015-09-022-1/+405
| | | | | | | | | | I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. llvm-svn: 246712
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector typesHal Finkel2014-09-161-0/+14
| | | | | | | | | | | | | | | | | | | The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859
* Reduce verbiage of lit.local.cfg filesAlp Toker2014-06-091-2/+1
| | | | | | We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496
* Don't assert in BasicTTI::getMemoryOpCost for non-simple typesHal Finkel2014-04-141-0/+3
| | | | | | | | BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). llvm-svn: 206150
* [PowerPC] Adjust load/store costs in PPCTTIHal Finkel2014-04-043-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. llvm-svn: 205658
* Account for scalarization costs in BasicTTI::getMemoryOpCost for extending ↵Hal Finkel2014-04-031-0/+5
| | | | | | | | | | | | | | | | | | | | vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) llvm-svn: 205495
* Fix multi-register costs in BasicTTI::getCastInstrCostHal Finkel2014-04-021-0/+21
| | | | | | | | | | | | | For an cast (extension, etc.), the currently logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended by <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. llvm-svn: 205487
* [tests] Cleanup initialization of test suffixes.Daniel Dunbar2013-08-161-2/+0
| | | | | | | | | | | | | | | | | - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513
* Refine fix to bug 15041.Bill Schmidt2013-02-081-0/+16
| | | | | | | | | Thanks to help from Nadav and Hal, I have a more reasonable (and even correct!) approach. This specifically penalizes the insertelement and extractelement operations for the performance hit that will occur on PowerPC processors. llvm-svn: 174725
* Initial implementation of PPCTargetTransformInfoHal Finkel2013-01-252-0/+40
This provides a place to add customized operation cost information and control some other target-specific IR-level transformations. The only non-trivial logic in this checkin assigns a higher cost to unaligned loads and stores (covered by the included test case). llvm-svn: 173520
OpenPOWER on IntegriCloud