path: root/llvm/test/CodeGen/AMDGPU/ds_read2.ll
Commit message (Author, Age, Files, Lines)
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel (Matt Arsenault, 2017-03-21, 1 file, -24/+24)
  Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
  llvm-svn: 298444
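  The conversion described above is a per-line textual substitution. As a rough illustration only, here is a Python sketch of what that perl one-liner does to a single file's contents; the function name `mark_kernels` is hypothetical and is not part of the commit.

```python
def mark_kernels(ir_text: str) -> str:
    """Mimic perl -pi -e 's/define void/define amdgpu_kernel void/'.

    Without the /g flag, perl rewrites only the first match on each line,
    so the replacement count is limited to 1 per line here as well.
    """
    lines = ir_text.split("\n")
    converted = [
        line.replace("define void", "define amdgpu_kernel void", 1)
        for line in lines
    ]
    return "\n".join(converted)


if __name__ == "__main__":
    sample = "define void @simple_read2_f32() {\n  ret void\n}\n"
    print(mark_kernels(sample), end="")
```

  Lines that already read `define amdgpu_kernel void` do not contain the literal text `define void`, so running the sketch twice leaves the input unchanged.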
* [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. (Alexander Timofeev, 2016-11-03, 1 file, -0/+40)
  This change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to this change, the algorithm stopped as soon as it encountered any memory access between the loads that were expected to be merged, although a read-after-read conflict cannot affect execution correctness. Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%. An improvement is also expected on any long sequence of reads from LDS.
  Differential Revision: https://reviews.llvm.org/D25944
  llvm-svn: 285919
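  The difference between the old and new scanning behaviour can be illustrated with a toy model. The sketch below is not the LLVM pass: the `Inst` record, the `find_partner` function, and the `skip_loads` flag are all hypothetical, and real merging also checks offsets, sizes, and register constraints that are omitted here.

```python
from typing import List, NamedTuple, Optional


class Inst(NamedTuple):
    kind: str    # "load" or "store" (toy model of LDS accesses)
    base: str    # hypothetical name of the base address
    offset: int  # byte offset from the base


def find_partner(insts: List[Inst], i: int, skip_loads: bool = True) -> Optional[int]:
    """Return the index of a later load that could be paired with insts[i].

    skip_loads=False models the old behaviour: any intervening memory
    access ends the search. skip_loads=True models the new behaviour:
    intervening loads are stepped over, because a read-after-read
    conflict cannot change the values that are read.
    """
    first = insts[i]
    for j in range(i + 1, len(insts)):
        cand = insts[j]
        if cand.kind == "load" and cand.base == first.base and cand.offset != first.offset:
            return j  # same base, different offset: a merge candidate
        if cand.kind == "load" and skip_loads:
            continue  # unrelated or same-location read: safe to step over
        return None   # a store (or, in the old scheme, any access) blocks merging
    return None


if __name__ == "__main__":
    seq = [Inst("load", "a", 0), Inst("load", "a", 0), Inst("load", "a", 4)]
    print(find_partner(seq, 0), find_partner(seq, 0, skip_loads=False))
```

  In this model a store between the two loads still stops the search in both modes, since reordering a read past a write could change the result.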
* AMDGPU: Remove superfluous string attributes from tests (Matt Arsenault, 2016-07-11, 1 file, -1/+1)
  Also fix v_mac.ll not testing the right thing for fneg.
  llvm-svn: 275129
* AMDGPU: Remove some old intrinsic uses from tests (Matt Arsenault, 2016-02-11, 1 file, -25/+25)
  llvm-svn: 260493
* AMDGPU: Switch barrier intrinsics to using convergent (Matt Arsenault, 2015-12-19, 1 file, -2/+2)
  noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll.
  llvm-svn: 256075
* DAGCombiner: Combine extract_vector_elt from build_vector (Matt Arsenault, 2015-10-12, 1 file, -4/+2)
  This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll), which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it.
  llvm-svn: 250129
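  The combine itself is a simple structural fold: extracting element i from a vector that was just built from scalars yields the i-th scalar directly. The sketch below shows the idea on a toy tuple-based expression tree; the node encoding and the name `fold_extract` are assumptions made for illustration and do not reflect the DAGCombiner's data structures or the target hook mentioned above.

```python
def fold_extract(node):
    """Fold extract_vector_elt(build_vector(e0, e1, ...), i) to e_i.

    Nodes are tuples whose first item is the opcode name. Anything that
    does not match the pattern (including a non-constant index) is
    returned unchanged.
    """
    if isinstance(node, tuple) and len(node) == 3 and node[0] == "extract_vector_elt":
        _, vec, idx = node
        if (
            isinstance(vec, tuple)
            and vec
            and vec[0] == "build_vector"
            and isinstance(idx, int)
            and 0 <= idx < len(vec) - 1
        ):
            return vec[1 + idx]  # operands start after the opcode
    return node


if __name__ == "__main__":
    vec = ("build_vector", "a", "b", "c", "d")
    print(fold_extract(("extract_vector_elt", vec, 2)))
```

  Without the fold, the vector would be materialized only to be taken apart again, which corresponds to the extra copies and subregister extracts the commit message describes.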
* AMDGPU/SI: Fix read2 merging into a super register. (Matt Arsenault, 2015-07-14, 1 file, -2/+2)
  If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move.
  Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it.
  llvm-svn: 242174
* R600 -> AMDGPU rename (Tom Stellard, 2015-06-13, 1 file, -0/+515)
  llvm-svn: 239657