path: root/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU
Commit message (Author, Date; files changed, lines -/+)
* Handle casts changing pointer size in the vectorizer (Stanislav Mekhanoshin, 2019-08-02; 1 file, -0/+39)
Added code to truncate or extend offsets so that the base-pointer search can continue when the pointer size changes along the way.

Differential Revision: https://reviews.llvm.org/D65612
llvm-svn: 367646
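For illustration, a minimal IR sketch of the situation (hypothetical, not the testcase added by the commit; typed-pointer syntax of the era, with 32-bit LDS pointers and 64-bit flat pointers per the AMDGPU datalayout): the offset is collected while walking a 32-bit pointer, but the loads happen through 64-bit pointers, so the collected offset must change width for the search to continue.

    define amdgpu_kernel void @cast_changes_pointer_size(i32 addrspace(3)* %lds, i32 addrspace(1)* %out) {
      ; the +1 element offset is found at 32-bit pointer width...
      %gep = getelementptr inbounds i32, i32 addrspace(3)* %lds, i32 1
      ; ...but the loads go through 64-bit flat pointers, so the offset
      ; must be extended to the new width while searching for the base
      %f0 = addrspacecast i32 addrspace(3)* %lds to i32*
      %f1 = addrspacecast i32 addrspace(3)* %gep to i32*
      %v0 = load i32, i32* %f0, align 8
      %v1 = load i32, i32* %f1, align 4
      %sum = add i32 %v0, %v1
      store i32 %sum, i32 addrspace(1)* %out, align 4
      ret void
    }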
* Relax load store vectorizer pointer strip checks (Stanislav Mekhanoshin, 2019-08-01; 1 file, -6/+45)
The previous change, which fixed a crash in the vectorizer, introduced performance regressions: the condition for preserving the pointer address space during the search was too tight, when only the sizes need to match.

Differential Revision: https://reviews.llvm.org/D65600
llvm-svn: 367624
* [AMDGPU] Fix for vectorizer crash with pointers of different size (Stanislav Mekhanoshin, 2019-07-31; 1 file, -0/+18)
When the vectorizer strips pointers, it can eventually end up with pointers of two different sizes, on which SCEV will crash.

Differential Revision: https://reviews.llvm.org/D65480
llvm-svn: 367443
* [lit] Delete empty lines at the end of lit.local.cfg. NFC (Fangrui Song, 2019-06-17; 1 file, -1/+0)
llvm-svn: 363538
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-1720-0/+2538
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." (Eric Christopher, 2019-04-17; 20 files, -2538/+0)
Reverted as it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
* [PM] Port LoadStoreVectorizer to the new pass manager. (Markus Lavin, 2018-12-07; 7 files, -1/+9)
Differential Revision: https://reviews.llvm.org/D54848
llvm-svn: 348570
* AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses (Matt Arsenault, 2018-09-24; 1 file, -17/+9)
If the alignment is at least 4, this should report true. Something still seems off with how sub-4-byte types are handled here, though. Fixing this changes how some combines are reached, but somehow doesn't change the net result.

llvm-svn: 342879
* LSV: Fix adjust alloca alignment trick for AMDGPU (Matt Arsenault, 2018-09-18; 1 file, -5/+81)
This was checking the hardcoded address space 0 for the stack. Additionally, legality should be checked with the adjusted alignment, so defer the alignment check. Also try to split the chain if the unaligned access isn't allowed.

llvm-svn: 342442
* AMDGPU: Fix some outdated datalayouts in tests (Matt Arsenault, 2018-09-13; 19 files, -12/+28)
llvm-svn: 342131
* [LSV] Look through selects for consecutive addresses (Roman Tereshin, 2018-07-25; 1 file, -0/+95)
In some cases LSV sees (load/store _ (select _ <pointer expression> <pointer expression>)) patterns in input IR, often due to sinking and other forms of CFG simplification, sometimes interspersed with bitcasts and all-constant-indices GEPs. With this patch, the `areConsecutivePointers` method attempts to handle select instructions, which leads to an increased number of successful vectorizations.

Technically, select instructions could appear in index arithmetic as well; however, we don't see those in our test suites / benchmarks. Also, there is far more freedom in the shapes of IR that compute integral indices than in what's common in pointer computations, and it appears quite unreliable to do anything short of making select instructions first-class citizens of Scalar Evolution, which for the purposes of this patch is most definitely overkill.

Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49428
llvm-svn: 337965
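A minimal sketch of the select pattern described above (hypothetical IR, not the commit's testcase): both selects share a condition and their arms are pairwise consecutive, so the two loads are provably adjacent whichever way the select resolves.

    define amdgpu_kernel void @select_of_pointers(float addrspace(1)* %a, float addrspace(1)* %b,
                                                  i1 %cond, float addrspace(1)* %out) {
      %a1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
      %b1 = getelementptr inbounds float, float addrspace(1)* %b, i64 1
      ; same condition, arms consecutive pairwise => %p1 is always %p0 + 1
      %p0 = select i1 %cond, float addrspace(1)* %a, float addrspace(1)* %b
      %p1 = select i1 %cond, float addrspace(1)* %a1, float addrspace(1)* %b1
      %v0 = load float, float addrspace(1)* %p0, align 8
      %v1 = load float, float addrspace(1)* %p1, align 4
      %s = fadd float %v0, %v1
      store float %s, float addrspace(1)* %out, align 4
      ret void
    }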
* Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"Roman Tereshin2018-07-201-2/+51
| | | | | | | | This reapplies commit r337489 reverted by r337541 Additionally, this commit contains a speculative fix to the issue reported in r337541 (the report does not contain an actionable reproducer, just a stack trace) llvm-svn: 337606
* Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"Sam McCall2018-07-201-19/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r337489. It causes asserts to fire in some TensorFlow tests, e.g. tensorflow/compiler/tests/gather_test.py on GPU. Example stack trace: Start test case: GatherTest.testHigherRank assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits" @ 0x5559446ebe10 __assert_fail @ 0x55593ef32f5e llvm::APInt::trunc() @ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses() @ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers() @ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess() @ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions() @ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains() @ 0x55593d78b948 (anonymous namespace)::Vectorizer::run() @ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction() @ 0x55593edf4b17 llvm::FPPassManager::runOnFunction() @ 0x55593edf4e55 llvm::FPPassManager::runOnModule() @ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule() @ 0x55593edf5137 llvm::legacy::PassManagerImpl::run() @ 0x55593edf5b71 llvm::legacy::PassManager::run() @ 0x55593ced250d xla::gpu::IrDumpingPassManager::run() @ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX() @ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx() @ 0x55593ced33d0 xla::gpu::CompileToPtx() @ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend() @ 0x55593b21f973 xla::Service::BuildExecutable() @ 0x555938f44e64 xla::LocalService::CompileExecutable() @ 0x555938f30a85 xla::LocalClient::Compile() @ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable() @ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl() @ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile() @ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute() @ 0x555938c68151 tensorflow::XlaDevice::Compute() @ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process() @ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()() *** SIGABRT received by PID 7798 (TID 7837) from PID 7798; *** llvm-svn: 337541
* [LSV] Refactoring + supporting bitcasts to a type of different size (Roman Tereshin, 2018-07-19; 1 file, -2/+19)
This is mostly preparation work for adding limited support for select instructions. That proved difficult to do due to the size and irregularity of Vectorizer::isConsecutiveAccess, which I believe is fixed here. It also turned out that these changes make it simpler to finish one of the TODOs and fix a number of other small issues, namely:

1. Looking through bitcasts to a type of a different size (requires careful tracking of the original load/store size and some math converting sizes in bytes to expected differences in indices of GEPs).

2. Reusing the partial analysis of pointers done by the first attempt at proving them consecutive, instead of starting from scratch. This added limited support for nested GEPs co-existing with difficult sext/zext instructions. It also required careful handling of negative differences between the constant parts of offsets.

3. Handling the case where the first pointer index is not an add, but something else (a function parameter, for instance).

I observe an increased number of successful vectorizations on a large set of shader programs. Only a few shaders are affected, but those that are sport >5% fewer loads and stores than before the patch.

Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
* [LoadStoreVectorizer] Use getMinusSCEV() to compute the distance between two pointers (Farhana Aleen, 2018-07-19; 1 file, -0/+49)
Summary: Currently, isConsecutiveAccess() detects two pointers (PtrA and PtrB) as consecutive by comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or when neither is, but it fails if one of the pointers is factorized and the other is not. Here is an example:

    PtrA = 4 * (A + B)
    PtrB = 4 + 4A + 4B

This patch uses getMinusSCEV() to compute the distance between the two pointers; getMinusSCEV() allows combining the expressions and computing the simplified distance.

Author: FarhanaAleen
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49516
llvm-svn: 337471
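A sketch of the factorized-vs-expanded situation in IR (hypothetical example mirroring the PtrA/PtrB byte offsets above, not the commit's testcase): getMinusSCEV() reduces the difference of the two offset expressions to the constant 4, proving the loads consecutive.

    define amdgpu_kernel void @factored_offsets(i8 addrspace(1)* %base, i32 %A, i32 %B,
                                                float addrspace(1)* %out) {
      %sum   = add i32 %A, %B
      %offA  = shl i32 %sum, 2          ; 4 * (A + B)   (factorized)
      %fourA = shl i32 %A, 2
      %fourB = shl i32 %B, 2
      %partB = add i32 %fourA, %fourB
      %offB  = add i32 %partB, 4        ; 4 + 4A + 4B   (expanded)
      %gepA = getelementptr i8, i8 addrspace(1)* %base, i32 %offA
      %gepB = getelementptr i8, i8 addrspace(1)* %base, i32 %offB
      %pA = bitcast i8 addrspace(1)* %gepA to float addrspace(1)*
      %pB = bitcast i8 addrspace(1)* %gepB to float addrspace(1)*
      %vA = load float, float addrspace(1)* %pA, align 8
      %vB = load float, float addrspace(1)* %pB, align 4
      %s = fadd float %vA, %vB
      store float %s, float addrspace(1)* %out, align 4
      ret void
    }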
* [AMDGPU] Re-enabled 128-bit wide-vector generation for local addr space by default (Farhana Aleen, 2018-05-28; 2 files, -4/+2)
Summary: The bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=105464 was found to be resolved by some other fixes.

Author: FarhanaAleen
llvm-svn: 333380
* [LoadStoreVectorizer] Ignore interleaved invariant loads. (Benjamin Kramer, 2018-04-24; 1 file, -0/+28)
The memory location an invariant load uses can never be clobbered by any store, so it's safe to move the load ahead of the store.

Differential Revision: https://reviews.llvm.org/D46011
llvm-svn: 330725
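A minimal sketch of the interleaving (hypothetical IR, not the commit's testcase): the store sits between the two loads, but because the loads carry !invariant.load, no store can clobber them, so the pair can still be vectorized.

    define amdgpu_kernel void @interleaved_invariant(i32 addrspace(1)* noalias %in,
                                                     i32 addrspace(1)* noalias %out) {
      %in1 = getelementptr inbounds i32, i32 addrspace(1)* %in, i64 1
      %v0 = load i32, i32 addrspace(1)* %in, align 8, !invariant.load !0
      store i32 %v0, i32 addrspace(1)* %out, align 8   ; would otherwise split the chain
      %v1 = load i32, i32 addrspace(1)* %in1, align 4, !invariant.load !0
      %out1 = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 1
      store i32 %v1, i32 addrspace(1)* %out1, align 4
      ret void
    }

    !0 = !{}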
* AMDGPU: enable 128-bit for local addr space under an option (Marek Olsak, 2018-04-10; 2 files, -2/+4)
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have recently been enabled under the amdgpu-ds128 option because the performance benefit is unclear. However, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly).

v2: fix regressions in merge-stores.ll and multiple_tails.ll

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
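A sketch of how the option might be exercised (hypothetical RUN line and kernel; the -amdgpu-ds128 flag name comes from the commits above, while the target, CPU, and global are illustrative assumptions): a 16-byte-aligned <4 x i32> access to LDS may select ds_read_b128 when the option is on.

    ; RUN: llc -march=amdgcn -mcpu=tonga -amdgpu-ds128 < %s | FileCheck %s
    @lds = internal addrspace(3) global [4 x i32] undef, align 16

    define amdgpu_kernel void @read_b128(<4 x i32> addrspace(1)* %out) {
      %p = bitcast [4 x i32] addrspace(3)* @lds to <4 x i32> addrspace(3)*
      ; 16-byte aligned LDS load: eligible for ds_read_b128 under the option
      %v = load <4 x i32>, <4 x i32> addrspace(3)* %p, align 16
      store <4 x i32> %v, <4 x i32> addrspace(1)* %out, align 16
      ret void
    }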
* [AMDGPU] Supported ds_read_b128 generation; widened vector length for local address-space (Farhana Aleen, 2018-03-09; 2 files, -4/+2)
Summary: Starting from the 2nd generation of GCN, the ISA supports ds_read_b128 on top of ds_read_b64. This patch adds the ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widens the vector length so that the vectorizer generates 128-bit loads for the local address-space, which get translated to ds_read_b128. Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128.

Author: FarhanaAleen
Reviewed By: rampitec, arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44210
llvm-svn: 327153
* [AMDGPU] Switch to the new addr space mapping by default (Yaxun Liu, 2018-02-02; 2 files, -153/+151)
This requires a corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955
llvm-svn: 324101
* [LSV] Avoid adding vectors of pointers as candidates (Bjorn Pettersson, 2017-10-26; 1 file, -0/+20)
Summary: We no longer add vectors of pointers as candidates for load/store vectorization. It does not seem to work anyway, but without this patch we can end up in asserts when trying to create casts between an integer type and the vector-of-pointers type. The test case I've added used to assert like this when trying to cast between i64 and <2 x i16*>:

    opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
    #0 PrintStackTraceSignalHandler(void*)
    #1 SignalHandler(int)
    #2 __restore_rt
    #3 __GI_raise
    #4 __GI_abort
    #5 __GI___assert_fail
    #6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
    #7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
    #8 Vectorizer::vectorizeStoreChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)

Reviewers: arsenm
Reviewed By: arsenm
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D39296
llvm-svn: 316665
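A sketch of the kind of chain that is now skipped (hypothetical IR echoing the i64 / <2 x i16*> case from the assert above): the element type is itself a vector of pointers, so the bitcast LSV would need between an integer type and the pointer-vector type is invalid.

    define void @chain_of_pointer_vectors(<2 x i16*>* %ptr, <2 x i16*>* %out) {
      %ptr1 = getelementptr inbounds <2 x i16*>, <2 x i16*>* %ptr, i64 1
      ; LSV now leaves these adjacent loads alone instead of trying to build
      ; an invalid cast between i64 and <2 x i16*>
      %v0 = load <2 x i16*>, <2 x i16*>* %ptr, align 16
      %v1 = load <2 x i16*>, <2 x i16*>* %ptr1, align 16
      %out1 = getelementptr inbounds <2 x i16*>, <2 x i16*>* %out, i64 1
      store <2 x i16*> %v0, <2 x i16*>* %out, align 16
      store <2 x i16*> %v1, <2 x i16*>* %out1, align 16
      ret void
    }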
* Skip bitcasts while looking for GEP in LoadStoreVectorizer (Stanislav Mekhanoshin, 2017-04-25; 1 file, -0/+83)
Differential Revision: https://reviews.llvm.org/D32101
llvm-svn: 301343
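A sketch of the shape this handles (hypothetical IR, not the commit's testcase): the GEPs feeding the loads are hidden behind bitcasts, which the search now skips.

    define amdgpu_kernel void @gep_behind_bitcast(double addrspace(1)* %p, i64 addrspace(1)* %out) {
      %gep = getelementptr inbounds double, double addrspace(1)* %p, i64 1
      ; the consecutive GEPs are only visible after looking through the bitcasts
      %c0 = bitcast double addrspace(1)* %p   to i64 addrspace(1)*
      %c1 = bitcast double addrspace(1)* %gep to i64 addrspace(1)*
      %v0 = load i64, i64 addrspace(1)* %c0, align 8
      %v1 = load i64, i64 addrspace(1)* %c1, align 8
      %s = add i64 %v0, %v1
      store i64 %s, i64 addrspace(1)* %out, align 8
      ret void
    }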
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel (Matt Arsenault, 2017-03-21; 15 files, -116/+116)
Currently, functions with the default C calling convention are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undone in the one place that actually wanted a non-kernel).

llvm-svn: 298444
* LoadStoreVectorizer: Split even-sized illegal chains properly (Matt Arsenault, 2017-02-23; 2 files, -6/+185)
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private-address-space accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into

    store <2 x i32>
    store i32
    store i32

rather than

    store <2 x i32>
    store <2 x i32>

when legal.

llvm-svn: 295933
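A sketch of a chain that exercises the even split (hypothetical IR, written against the current address-space mapping where private is addrspace(5); at the time of this commit private was still the default address space): four 4-byte-aligned private stores cannot legally become one 16-byte access, so the chain should split into two <2 x i32> halves rather than <2 x i32> plus two scalars.

    define amdgpu_kernel void @even_chain_split(i32 %a, i32 %b, i32 %c, i32 %d) {
      %buf = alloca [4 x i32], align 4, addrspace(5)
      %p0 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %buf, i32 0, i32 0
      %p1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %buf, i32 0, i32 1
      %p2 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %buf, i32 0, i32 2
      %p3 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(5)* %buf, i32 0, i32 3
      ; should become two <2 x i32> stores, not <2 x i32> + i32 + i32
      store i32 %a, i32 addrspace(5)* %p0, align 4
      store i32 %b, i32 addrspace(5)* %p1, align 4
      store i32 %c, i32 addrspace(5)* %p2, align 4
      store i32 %d, i32 addrspace(5)* %p3, align 4
      ret void
    }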
* [LoadStoreVectorizer] Enable vectorization of stores in the presence of an aliasing load (Alina Sbirlea, 2016-11-23; 2 files, -5/+61)
Summary: The getVectorizablePrefix method would give up if it found an aliasing load for a store chain. In practice, the aliasing load can be treated as a memory barrier, and all stores that precede it are a valid vectorizable prefix. Issue found by volkan in D26962. The testcase is a pruned version of the one in the original patch.

Reviewers: jlebar, arsenm, tstellarAMD
Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits
Differential Revision: https://reviews.llvm.org/D27008
llvm-svn: 287781
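A sketch of the prefix behavior (hypothetical IR, not the commit's testcase): the load from %other may alias the store chain, but instead of abandoning everything, the two stores before it are still vectorized as a prefix.

    define amdgpu_kernel void @prefix_before_aliasing_load(i32 addrspace(1)* %p,
                                                           i32 addrspace(1)* %other) {
      %p1 = getelementptr inbounds i32, i32 addrspace(1)* %p, i64 1
      %p2 = getelementptr inbounds i32, i32 addrspace(1)* %p, i64 2
      store i32 0, i32 addrspace(1)* %p, align 8
      store i32 1, i32 addrspace(1)* %p1, align 4
      ; may-alias load: acts as a memory barrier, but the two stores above
      ; remain a valid vectorizable prefix
      %x = load i32, i32 addrspace(1)* %other, align 4
      store i32 %x, i32 addrspace(1)* %p2, align 4
      ret void
    }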
* AMDGPU/SI: Don't allow unaligned scratch access (Tom Stellard, 2016-10-14; 2 files, -11/+13)
Summary: The hardware doesn't support this.

Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25523
llvm-svn: 284257
* LSV: Fix incorrectly increasing alignment (Matt Arsenault, 2016-09-09; 1 file, -0/+129)
If the unaligned access has a dynamic offset, that offset may be odd, which would make the adjusted alignment incorrect to use.

llvm-svn: 281110
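A sketch of the problem (hypothetical IR, typed-pointer syntax of the era with private as the default address space): the chain begins at a dynamic offset %idx into the alloca, and since %idx may be odd, the merged access must not be given an alignment derived from the alloca.

    define amdgpu_kernel void @dynamic_offset_no_realign(i32 %idx) {
      %buf = alloca [64 x i8], align 4
      %g0 = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i32 0, i32 %idx
      %g1 = getelementptr inbounds i8, i8* %g0, i32 1
      ; a merged i16 store here must keep align 1; deriving align 2 from the
      ; alloca's alignment would be wrong whenever %idx is odd
      store i8 0, i8* %g0, align 1
      store i8 1, i8* %g1, align 1
      ret void
    }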
* [LSV] Use the original loads' names for the extractelement instructions. (Justin Lebar, 2016-09-07; 2 files, -35/+35)
Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability.

Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23748
llvm-svn: 280818
* [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions (Alina Sbirlea, 2016-08-30; 1 file, -0/+64)
Resolves PR29148.

Summary: LSV was using two vector sets (heads and tails) to track pairs of adjacent positions to vectorize. A recent optimization tries to obtain the longest chain to vectorize and assumes the positions in heads (H) and tails (T) match, which is not the case if there are multiple tails for the same head, e.g.:

    i1: store a[0]
    i2: store a[1]
    i3: store a[1]

leads to

    H: i1
    T: i2 i3

instead of

    H: i1 i1
    T: i2 i3

so the positions of instructions that follow i3 have different indexes in H and T. This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI returns a "not-fast" answer, the whole chain will be abandoned for vectorization, even though a smaller one would be beneficial. Added a testcase and a FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar
Subscribers: mzolotukhin, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24057
llvm-svn: 280179
* [LSV] Don't assume that loads/stores appear in address order in the BB. (Justin Lebar, 2016-07-20; 1 file, -0/+30)
Summary: getVectorizablePrefix previously didn't work properly in the face of aliasing loads/stores. It unwittingly assumed that the loads/stores appeared in the BB in address order. If they didn't, it would do the wrong thing.

Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22535
llvm-svn: 276072
* [LSV] Insert stores at the right point. (Justin Lebar, 2016-07-19; 1 file, -2/+29)
Summary: Previously, the insertion point for stores was the last instruction in Chain *before calling getVectorizablePrefixEndIdx*. Thus if getVectorizablePrefixEndIdx didn't return Chain.size(), we would still insert at the last instruction in Chain. This patch changes our internal API a bit in an attempt to make it less prone to this sort of error. As a result, we end up recalculating the Chain's boundary instructions, but I think worrying about the speed hit of this is a premature optimization right now.

Reviewers: asbirlea, tstellarAMD
Subscribers: mzolotukhin, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D22534
llvm-svn: 276056
* Correct ordering of loads/stores. (Alina Sbirlea, 2016-07-11; 1 file, -3/+3)
Summary: This aims to correct the ordering of loads/stores. The patch changes the insert point for loads to the position of the first load, and updates the ordering method for loads to insert before, rather than after. Before this patch the sequence

    load a[1], store a[1], store a[0], load a[2]

would incorrectly vectorize to

    store a[0,1], load a[1,2]

The correctness check assumed the insertion point for loads is at the position of the first load, when in practice it was at the last load. An alternative fix would have been to invert the correctness check; the current fix changes the insert position but also requires reordering of instructions before the vectorized load. Updated testcases to reflect the changes.

Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm
Subscribers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D22071
llvm-svn: 275117
* Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer (Alina Sbirlea, 2016-07-11; 1 file, -14/+5)
Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(), and check the condition when vectorizing load and store chains. Add additional parameters: AddressSpace, Alignment, Fast.

Reviewers: llvm-commits, jlebar
Subscribers: arsenm, mzolotukhin
Differential Revision: http://reviews.llvm.org/D21935
llvm-svn: 275100
* Address two correctness issues in LoadStoreVectorizer (Alina Sbirlea, 2016-07-01; 1 file, -3/+3)
Summary: GetBoundryInstruction now returns, as the last instruction, the instruction which follows it, or end(); otherwise the last instruction in the boundary set is not tested by isVectorizable(). This partially solves the reordering of instructions; a more extensive solution will follow.

Reviewers: tstellarAMD, llvm-commits, jlebar
Subscribers: escha, arsenm, mzolotukhin
Differential Revision: http://reviews.llvm.org/D21934
llvm-svn: 274389
* LoadStoreVectorizer: Don't increase alignment with no align set (Matt Arsenault, 2016-07-01; 3 files, -17/+60)
If no alignment was set on the loads/stores, they would be vectorized to the new type even though this increases the default alignment.

llvm-svn: 274323
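A sketch (hypothetical IR, generic address space assumed): with no explicit align, each i32 access is assumed to have the type's ABI alignment of 4; merging into <2 x i32> must not silently adopt the wider vector type's larger default alignment.

    define void @no_explicit_align(i32* %p, i32* %q) {
      %p1 = getelementptr inbounds i32, i32* %p, i64 1
      ; implicit ABI align 4 on each scalar access; a merged <2 x i32> load
      ; must carry an explicit align 4, not the vector type's default
      %v0 = load i32, i32* %p
      %v1 = load i32, i32* %p1
      %q1 = getelementptr inbounds i32, i32* %q, i64 1
      store i32 %v0, i32* %q
      store i32 %v1, i32* %q1
      ret void
    }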
* LoadStoreVectorizer: Check TTI for vec reg bit width (Matt Arsenault, 2016-07-01; 2 files, -1/+53)
llvm-svn: 274322
* LoadStoreVectorizer: Fix assert when merging pointer ops (Matt Arsenault, 2016-07-01; 1 file, -0/+311)
This needs to use inttoptr/ptrtoint when combining an int and a pointer load. If a pointer is involved, always do an integer load.

llvm-svn: 274321
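A sketch of a mixed chain (hypothetical IR, 64-bit generic pointers assumed): one i64 load plus one pointer load at the adjacent slot; per the rule above, the merged load uses integer elements and the pointer lane is recreated with inttoptr.

    define i64 @mixed_int_and_pointer(i64* %p) {
      %p1 = getelementptr inbounds i64, i64* %p, i64 1
      %pp = bitcast i64* %p1 to i8**
      ; merged as a <2 x i64> load; the pointer lane is rebuilt with inttoptr
      %v0 = load i64, i64* %p, align 16
      %v1 = load i8*, i8** %pp, align 8
      %asint = ptrtoint i8* %v1 to i64
      %sum = add i64 %v0, %asint
      ret i64 %sum
    }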
* LoadStoreVectorizer: Use AA metadata (Matt Arsenault, 2016-07-01; 1 file, -0/+32)
This was not passing the full instruction, with its metadata, to the alias query.

llvm-svn: 274318
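A sketch of why the metadata matters (hypothetical IR, not the commit's testcase): only when the !alias.scope / !noalias metadata is visible to the query can AA prove the intervening store harmless, letting the two loads form a chain.

    define amdgpu_kernel void @uses_aa_metadata(float addrspace(1)* %a, float addrspace(1)* %b,
                                                float addrspace(1)* %out) {
      %a1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
      %v0 = load float, float addrspace(1)* %a, align 8, !alias.scope !2
      ; the store is known not to touch memory in the loads' scope
      store float 0.0, float addrspace(1)* %b, align 4, !noalias !2
      %v1 = load float, float addrspace(1)* %a1, align 4, !alias.scope !2
      %s = fadd float %v0, %v1
      store float %s, float addrspace(1)* %out, align 4
      ret void
    }

    !0 = distinct !{!0, !"domain"}
    !1 = distinct !{!1, !0, !"scope"}
    !2 = !{!1}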
* LoadStoreVectorizer: if one element of a vector is integer, default to integer (Matt Arsenault, 2016-07-01; 1 file, -1/+1)
Fixes issues on some architectures where we use arithmetic ops to build vectors, which can cause bad things to happen for loads/stores of mixed types.

Patch by Fiona Glaser

llvm-svn: 274307
* LoadStoreVectorizer: Fix crashes on sub-byte types (Matt Arsenault, 2016-07-01; 1 file, -0/+199)
llvm-svn: 274306
* LoadStoreVectorizer: Check skipFunction first. (Matt Arsenault, 2016-06-30; 1 file, -0/+10)
Also add a test I forgot to add in r274296.

llvm-svn: 274299
* LoadStoreVectorizer: Skip optnone functions (Matt Arsenault, 2016-06-30; 1 file, -0/+12)
llvm-svn: 274296
* Add LoadStoreVectorizer pass (Matt Arsenault, 2016-06-30; 7 files, -0/+989)
This was contributed by Apple, and I've been working on minimal cleanups and generalizing it.

llvm-svn: 274293