summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Fix dropping mem operands when moving to VALUMatt Arsenault2015-08-291-0/+52
| | | | | | | | | | | | | Without a memory operand, mayLoad or mayStore instructions are treated as hasUnorderedMemRef, which results in much worse scheduling. We really should have a verifier check that any non-side effecting mayLoad or mayStore has a memory operand. There are a few instructions (interp and images) which I'm not sure what / where to add these. llvm-svn: 246356
* DI: Require subprogram definitions to be distinctDuncan P. N. Exon Smith2015-08-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' | grep -v test/Bitcode | xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327
* AMDGPU/SI: Add test for folding constants into operandsMatt Arsenault2015-08-271-0/+17
| | | | | | Patch by Axel Davy llvm-svn: 246167
* AMDGPU: Don't reprocess instructions when splitting i64 bcntMatt Arsenault2015-08-261-0/+19
| | | | llvm-svn: 246079
* AMDGPU: Fix not moving users of s_bfe_i64 to VALUMatt Arsenault2015-08-261-0/+50
| | | | | | | This wouldn't propagate to users of the original BFE and would hit a verifier error. llvm-svn: 246078
* AMDGPU: Produce error on dynamic_stackallocMatt Arsenault2015-08-261-0/+11
| | | | llvm-svn: 246048
* AMDGPU: Improve accuracy of instruction rates for some FP instructionsMatt Arsenault2015-08-222-6/+6
| | | | llvm-svn: 245774
* AMDGPU/SI: Better handle s_wait insertionTom Stellard2015-08-212-10/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally after every instruction, 3 counters (for VM, EXP and LGTM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755
* AMDGPU/SI: Fix printing useless info with amdhsaMatt Arsenault2015-08-151-1/+2
| | | | | | | The comments at the bottom would all report 0 if amdhsa was used. llvm-svn: 245135
* AMDGPU: Fix assert on dbg_value instructionsMatt Arsenault2015-08-121-0/+37
| | | | llvm-svn: 244728
* AMDGPU: Add pass to lower OpenCL image and sampler arguments.Tom Stellard2015-08-073-0/+680
| | | | | | | | | The pass adds new kernel arguments for image attributes, and resolves calls to dummy attribute and resource id getter functions. Patch by: Zoltan Gilian llvm-svn: 244372
* AMDGPU: Assume SMRD access for constant address spaceMatt Arsenault2015-08-073-34/+286
| | | | | | | Since r243294 these are selected to SMRD and moved later if required. llvm-svn: 244354
* AMDGPU/SI: Add support for 32-bit immediate SMRD offsets on CITom Stellard2015-08-061-12/+17
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11604 llvm-svn: 244254
* AMDGPU/SI: Use ComplexPatterns for SMRD addressing modesTom Stellard2015-08-061-0/+56
| | | | | | | | | | | | Summary: This allows us to consolidate several of the TableGen patterns. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11602 llvm-svn: 244253
* AMDGPU/SI: Add implicit register operands in the correct order.Alex Lorenz2015-07-311-0/+16
| | | | | | | | | | | | | | | | | | This commit fixes a bug in the class 'SIInstrInfo' where the implicit register machine operands were added to a machine instruction in an incorrect order - the implicit uses were added before the implicit defs. I found this bug while working on moving the implicit register operand verification code from the MIR parser to the machine verifier. This commit also makes the method 'addImplicitDefUseOperands' in the machine instruction class public so that it can be reused in the 'SIInstrInfo' class. Reviewers: Matt Arsenault Differential Revision: http://reviews.llvm.org/D11689 llvm-svn: 243799
* AMDGPU: Fix v16i32 to v16i8 truncstoreMatt Arsenault2015-07-311-0/+48
| | | | llvm-svn: 243731
* AMDGPU: Don't try to use LDS/vector for private if pointer value storedMatt Arsenault2015-07-281-0/+52
| | | | | | | If the pointer is the store's value operand, this would produce a broken module. Make sure the use is actually for the pointer operand. llvm-svn: 243462
* AMDGPU: Fix crash if called function is a bitcastMatt Arsenault2015-07-281-0/+22
| | | | | | | getCalledFunction() is null, so this would crash. Replace crash with an error on unsupported call. llvm-svn: 243461
* AMDGPU: don't match vgpr loads for constant loadsMarek Olsak2015-07-272-16/+4
| | | | | | | | | | | | Author: Dave Airlie <airlied@redhat.com> In order to implement indirect sampler loads, we don't want to match on a VGPR load but an SGPR one for constants, as we cannot feed VGPRs to the sampler only SGPRs. this should be applicable for llvm 3.7 as well. llvm-svn: 243294
* AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaroundMarek Olsak2015-07-271-6/+6
| | | | | | This is a candidate for 3.7. llvm-svn: 243263
* AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory opsTom Stellard2015-07-202-85/+211
| | | | | | | | | | | | | | Summary: The MUBUF addr64 bit has been removed on VI, so we must use FLAT instructions when the pointer is stored in VGPRs. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11067 llvm-svn: 242673
* Only do fmul (fadd x, x), c combine if the fadd only has one useMatt Arsenault2015-07-171-0/+102
| | | | | | This was increasing the instruction count if the fadd has multiple uses. llvm-svn: 242498
* AMDPGU/SI: Negative offsets aren't allowed in MUBUF's vaddr operandTom Stellard2015-07-162-12/+41
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11226 llvm-svn: 242434
* AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32)Matt Arsenault2015-07-144-16/+82
| | | | | | | | | | | | | | | | | This can be done only with moves which theoretically will optimize better later. Although this transform increases the instruction count, it should be code size / cycle count neutral in the worst VALU case. It also seems to slightly improve a couple of testcases due to other DAG combines this exposes. This is probably slightly worse for the SALU case, so it might be better to handle this during moveToVALU, although then you lose some simplifications like the load width reducing in the simple testcase. llvm-svn: 242177
* AMDGPU/SI: Fix read2 merging into a super register.Matt Arsenault2015-07-146-15/+273
| | | | | | | | | | | | | | | | If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move. Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it. llvm-svn: 242174
* AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructionsTom Stellard2015-07-147-89/+91
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11061 llvm-svn: 242146
* AMDGPU/SI: Select mad patterns to v_mac_f32Tom Stellard2015-07-137-40/+200
| | | | | | | | | The two-address instruction pass will convert these back to v_mad_f32 if necessary. Differential Revision: http://reviews.llvm.org/D11060 llvm-svn: 242038
* DAGCombiner: Assume invariant load cannot alias a storeMatt Arsenault2015-07-101-0/+35
| | | | | | | | | | The motivation is to allow GatherAllAliases / FindBetterChain to not give up on dependent loads of a pointer from constant memory. This is important for AMDGPU, because most loads are pointers derived from a load of a kernel argument from constant memory. llvm-svn: 241948
* AMDGPU/SI: Add debugging subtarget feature for DS offsetsMatt Arsenault2015-07-061-2/+3
| | | | | | | | We don't have a good way to detect most situations where DS offsets are usable on SI, so add an option to force using them even if unsafe for debugging performance problems. llvm-svn: 241462
* Test for specific output in lit testMatthias Braun2015-07-011-1/+18
| | | | llvm-svn: 241200
* RegisterCoalescer: Cleanup empty subranges after shrinkToUses()Matthias Braun2015-06-301-0/+20
| | | | | | | | A call to removeEmptySubranges() is necessary after every operation that potentially removes all segments from a subregister range; this case in the register coalescer was missing. llvm-svn: 241027
* AMDGPU/SI: Fix extra space when printing v_div_fmas_*Matt Arsenault2015-06-281-2/+2
| | | | llvm-svn: 240911
* AMDPGU/SI: Use correct resource descriptors for VI on HSATom Stellard2015-06-261-3/+8
| | | | | | | | | | Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D10777 llvm-svn: 240841
* AMDGPU/SI: Update amd_kernel_code_t definition and add assembler supportTom Stellard2015-06-261-4/+2
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10772 llvm-svn: 240839
* AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSATom Stellard2015-06-261-0/+1
| | | | | | | | | | Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10708 llvm-svn: 240832
* AMDGPU/SI: Add hsa code object directivesTom Stellard2015-06-261-0/+15
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10757 llvm-svn: 240831
* AMDGPU/SI: There are no implicit kernel args in the amdhsa ABITom Stellard2015-06-261-0/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10706 llvm-svn: 240830
* AMDGPU/SI: Emit amd_kernel_code_t in EmitFunctionBodyStart()Tom Stellard2015-06-261-1/+3
| | | | | | | | | | | | | | Summary: This way the function symbol points to the start of amd_kernel_code_t rather than the start of the function. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10705 llvm-svn: 240829
* AMDGPU: really don't commute REV opcodes if the target variant doesn't existMarek Olsak2015-06-261-0/+33
| | | | | | | | | | | | | | If pseudoToMCOpcode failed, we would return the original opcode, so operands would be swapped, but the instruction would remain the same. It resulted in LSHLREV a, b ---> LSHLREV b, a. This fixes Glamor text rendering and piglit/arb_sample_shading-builtin-gl-sample-mask on VI. This is a candidate for stable branches. v2: the test was simplified by Tom Stellard llvm-svn: 240824
* R600/SI: Use ELF64 format instead of ELF32Tom Stellard2015-06-221-1/+1
| | | | | | | | | | Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10392 llvm-svn: 240331
* R600: Use EM_AMDGPU for the ELF Machine typeTom Stellard2015-06-221-4/+5
| | | | | | | | | | Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10390 llvm-svn: 240330
* Fix "the the" in comments.Eric Christopher2015-06-191-1/+1
| | | | llvm-svn: 240112
* Revert "Revert "Fix merges of non-zero vector stores""Matt Arsenault2015-06-161-2/+99
| | | | | | | | Reapply r239539. Don't assume the collected number of stores is the same vector size. Just take the first N stores to fill the vector. llvm-svn: 239825
* R600 -> AMDGPU renameTom Stellard2015-06-13400-0/+45283
llvm-svn: 239657
OpenPOWER on IntegriCloud