path: root/llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
Commit message | Author | Age | Files | Lines
* AMDGPU: Switch backend default max workgroup size to 1024 | Matt Arsenault | 2019-11-13 | 1 | -2/+3
  Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum.
  I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* AMDGPU: Enable code object v3 for AMDHSA only | Konstantin Zhuravlyov | 2018-11-15 | 1 | -7/+7
  Differential Revision: https://reviews.llvm.org/D54186
  llvm-svn: 346923
* Revert r345542: AMDGPU: Enable code object v3 by default | Konstantin Zhuravlyov | 2018-10-30 | 1 | -7/+7
  It breaks mesa.
  llvm-svn: 345662
* AMDGPU: Enable code object v3 by default | Konstantin Zhuravlyov | 2018-10-29 | 1 | -7/+7
  Differential Revision: https://reviews.llvm.org/D53525
  llvm-svn: 345542
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed | Alexander Timofeev | 2018-09-11 | 1 | -1/+3
  Differential revision: https://reviews.llvm.org/D51734
  Reviewers: rampitec
  llvm-svn: 341928
* AMDGPU: Use more custom insert/extract_vector_elt lowering | Matt Arsenault | 2018-06-05 | 1 | -2/+2
  Apply to i8 vectors.
  llvm-svn: 334044
* AMDGPU: Fix broken dynamic vector indexing for packed types | Matt Arsenault | 2018-05-08 | 1 | -1/+1
  The intention of this was to multiply by 16, not shift by 16.
  llvm-svn: 331793
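The difference between multiplying an index by 16 and shifting it by 16 can be sketched as follows. This is an illustrative Python model, not the backend code: the function names are hypothetical, and it assumes 16-bit elements packed into a single integer with element 0 in the low bits.

```python
# Illustrative model only; names and packing layout are assumptions.

def intended_bit_offset(index: int) -> int:
    # Each packed element is 16 bits wide, so element i starts at bit i * 16.
    return index * 16

def mistaken_bit_offset(index: int) -> int:
    # Shifting the index by 16 yields i * 65536, which is not a valid
    # bit position inside a 32-bit packed value for any i > 0.
    return index << 16

def extract_packed_i16(packed: int, index: int) -> int:
    # Move the selected element down to bit 0 and mask off the rest.
    return (packed >> intended_bit_offset(index)) & 0xFFFF

if __name__ == "__main__":
    packed = 0xBEEF1234
    print(hex(extract_packed_i16(packed, 0)))
    print(hex(extract_packed_i16(packed, 1)))
    print(intended_bit_offset(1), mistaken_bit_offset(1))
```

Only index 0 gives the same offset under both computations, which would explain how such a slip can go unnoticed when tests index the first element.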
* AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements | Changpeng Fang | 2018-02-16 | 1 | -14/+15
  Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
  Reviewers: arsenm
  Differential Revision: https://reviews.llvm.org/D33559
  llvm-svn: 325372
* [AMDGPU] Change constant addr space to 4 | Yaxun Liu | 2018-02-13 | 1 | -6/+6
  Differential Revision: https://reviews.llvm.org/D43170
  llvm-svn: 325030
* [AMDGPU] Switch to the new addr space mapping by default | Yaxun Liu | 2018-02-02 | 1 | -155/+155
  This requires corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. | Mark Searles | 2017-12-19 | 1 | -1/+2
  Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
  Differential Revision: https://reviews.llvm.org/D41377
  llvm-svn: 321100
* [DAG] Do MergeConsecutiveStores again before Instruction Selection | Nirav Dave | 2017-11-27 | 1 | -2/+1
  Summary: Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well.
  Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy
  Subscribers: javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33675
  llvm-svn: 319036
* [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} | Dmitry Preobrazhensky | 2017-11-20 | 1 | -1/+1
  See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765
  Reviewers: tamazov, SamWot, arsenm, vpykhtin
  Differential Revision: https://reviews.llvm.org/D40088
  llvm-svn: 318675
* AMDGPU: Don't use MUBUF vaddr if address may overflow | Matt Arsenault | 2017-11-15 | 1 | -8/+9
  Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive.
  llvm-svn: 318240
* [AMDGPU] Generate range metadata for workitem id | Stanislav Mekhanoshin | 2017-04-12 | 1 | -10/+10
  If workgroup size is known inform llvm about range returned by local id and local size queries.
  Differential Revision: https://reviews.llvm.org/D31804
  llvm-svn: 300102
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel | Matt Arsenault | 2017-03-21 | 1 | -21/+21
  Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
  Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
  llvm-svn: 298444
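For readers unfamiliar with the perl one-liner quoted in this commit message, the same substitution can be sketched in Python. This is a minimal sketch, not the tool that was actually run: the function name `mark_as_kernel` is hypothetical, and it mirrors perl's behavior of replacing only the first match on each line.

```python
# Hypothetical Python equivalent of:
#   perl -pi -e 's/define void/define amdgpu_kernel void/'

def mark_as_kernel(ir_text: str) -> str:
    # Process line by line, as perl -p does, replacing at most the first
    # occurrence on each line (s/// without the /g flag).
    return "".join(
        line.replace("define void", "define amdgpu_kernel void", 1)
        for line in ir_text.splitlines(keepends=True)
    )

if __name__ == "__main__":
    sample = "define void @f() {\n  ret void\n}\n"
    print(mark_as_kernel(sample), end="")
```

Lines that already read `define amdgpu_kernel void` do not contain the search string, so applying the substitution a second time leaves them unchanged.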
* [AMDGPU] Add address space based alias analysis pass | Stanislav Mekhanoshin | 2017-03-17 | 1 | -6/+6
  This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed.
  Differential Revision: https://reviews.llvm.org/D31103
  llvm-svn: 298172
* AMDGPU: Always allocate emergency stack slot at offset 0 | Matt Arsenault | 2017-02-22 | 1 | -6/+6
  This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space.
  llvm-svn: 295877
* AMDGPU: Custom lower more vector operations | Matt Arsenault | 2017-01-23 | 1 | -3/+7
  This avoids stack usage.
  llvm-svn: 292846
* AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg | Nicolai Haehnle | 2016-12-08 | 1 | -3/+5
  Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register.
  With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places.
  Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test
  Reviewers: arsenm, tstellarAMD
  Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D27344
  llvm-svn: 289048
* Reapply "AMDGPU: Don't use offen if it is 0" | Matt Arsenault | 2016-10-26 | 1 | -7/+9
  This reverts r283003
  llvm-svn: 285203
* [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT. | Bjorn Pettersson | 2016-10-05 | 1 | -1/+2
  Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from.
  Reviewers: bogner, tstellarAMD, mkuper
  Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle
  Differential Revision: https://reviews.llvm.org/D25007
  llvm-svn: 283347
* Revert "AMDGPU: Don't use offen if it is 0" | Mehdi Amini | 2016-10-01 | 1 | -9/+7
  This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038
  llvm-svn: 283003
* AMDGPU: Don't use offen if it is 0 | Matt Arsenault | 2016-10-01 | 1 | -7/+9
  This removes many re-initializations of a base register to 0.
  llvm-svn: 282999
* AMDGPU: Run LoadStoreVectorizer pass by default | Matt Arsenault | 2016-09-09 | 1 | -6/+6
  llvm-svn: 281112
* [AMDGPU] Wave and register controls | Konstantin Zhuravlyov | 2016-09-06 | 1 | -1/+1
  - Implemented amdgpu-flat-work-group-size attribute
  - Implemented amdgpu-num-active-waves-per-eu attribute
  - Implemented amdgpu-num-sgpr attribute
  - Implemented amdgpu-num-vgpr attribute
  - Dynamic LDS constraints are in a separate patch
  Patch by Tom Stellard and Konstantin Zhuravlyov
  Differential Revision: https://reviews.llvm.org/D21562
  llvm-svn: 280747
* AMDGPU: Remove dead check in AMDGPUPromoteAlloca | Matt Arsenault | 2016-07-18 | 1 | -6/+29
  This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case.
  Also fix crashes on 0 x and 1 x arrays.
  llvm-svn: 275869
* AMDGPU: Add feature for unaligned access | Matt Arsenault | 2016-07-01 | 1 | -4/+4
  llvm-svn: 274398
* AMDGPU: Fix promote alloca for pointer loads | Matt Arsenault | 2016-05-18 | 1 | -3/+24
  If the load has a pointer type, we don't want to change its type.
  llvm-svn: 270000
* AMDGPU/R600: Use correct number of vector elements when lowering private loads | Jan Vesely | 2016-05-16 | 1 | -0/+105
  Reviewer: tstellardAMD, arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D20032
  llvm-svn: 269725
* AMDGPU: Fix promote alloca pass creating huge arrays | Matt Arsenault | 2016-05-16 | 1 | -15/+16
  This was assuming it could use all memory before, which is a bad decision because it restricts occupancy.
  By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit.
  Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding.
  llvm-svn: 269708
* AMDGPU: Split private memory tests | Jan Vesely | 2016-05-11 | 1 | -0/+403
  Reenable R600 testing
  reviewer: arsenm
  Differential Revision: http://reviews.llvm.org/D20031
  llvm-svn: 269207