path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU] Fixed memory leak with inliner replacedStanislav Mekhanoshin2017-09-201-1/+3
| | | | | | Delete inliner before replacing it. llvm-svn: 313723
* AMDGPU: Move r600 only code into r600 only td fileMatt Arsenault2017-09-202-53/+54
| | | | llvm-svn: 313719
* [AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.llStanislav Mekhanoshin2017-09-201-1/+2
| | | | llvm-svn: 313718
* AMDGPU: Match load d16 hi instructionsMatt Arsenault2017-09-205-50/+161
| | | | | | | | | | | | Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
* [AMDGPU] Port of HSAIL inlinerStanislav Mekhanoshin2017-09-205-1/+218
| | | | | | Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
* AMDGPU: Cleanup load/store PatFragsMatt Arsenault2017-09-207-271/+244
| | | | | | Try to use a consistent naming scheme. llvm-svn: 313713
* AMDGPU: Match store d16_hi instructionsMatt Arsenault2017-09-204-18/+77
| | | | llvm-svn: 313712
* [AMDGPU] Prevent post-RA scheduler from breaking memory clausesStanislav Mekhanoshin2017-09-192-0/+58
| | | | | | | | | The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
* AMDGPU: Run internalize symbols at -O0Matt Arsenault2017-09-191-21/+21
| | | | | | | | The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-183-2/+48
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* Fix warnings in r313297.Jan Sjodin2017-09-142-5/+3
| | | | llvm-svn: 313302
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-4/+5
| | | | | | You can't use madmk/madak if it already uses an SGPR input. llvm-svn: 313298
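The restriction in r313298 can be pictured roughly as below. This is a minimal sketch and not the code from the commit: `OperandKind` and `canUseLiteralMad` are hypothetical names, and it assumes the literal carried by madmk/madak takes the only constant bus slot, so none of the register sources may be an SGPR.

```cpp
#include <vector>

// Hypothetical operand classification for illustration; not an LLVM type.
enum class OperandKind { VGPR, SGPR };

// Returns true if a mad with these register sources could be rewritten to a
// literal-carrying form (madmk/madak), assuming the literal occupies the
// single constant bus slot.
bool canUseLiteralMad(const std::vector<OperandKind> &RegSrcs) {
  for (OperandKind K : RegSrcs)
    if (K == OperandKind::SGPR)
      return false; // An SGPR source would need the constant bus as well.
  return true;
}
```

With two VGPR sources the rewrite is allowed; as soon as one source is an SGPR the helper rejects it.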
* Add AddressSpace to PseudoSourceValue.Jan Sjodin2017-09-144-4/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: Fix assert on alloca of array of structMatt Arsenault2017-09-141-6/+5
| | | | llvm-svn: 313282
* AMDGPU: Stop modifying SP in call sequencesMatt Arsenault2017-09-141-3/+3
| | | | | | | | | | | | | | | Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279
* AMDGPU: Make frame register caller preservedMatt Arsenault2017-09-142-10/+16
| | | | | | | | | | | | | Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-133-0/+16
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-132-1/+40
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
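The difference between the old base-register comparison and a target-decided hook, as described in r313208, might look like the sketch below. `MemOp`, `sameBaseOnly`, and `targetShouldCluster` are hypothetical names, and the policy shown is only one possible choice; the real `shouldClusterMemOps` signature and the AMDGPU policy are not reproduced here.

```cpp
// Hypothetical, simplified description of a memory instruction.
struct MemOp {
  unsigned BaseReg;      // register holding the base (or full) address
  bool HasOffsetOperand; // false when the address is a fully computed pointer
};

// Old-style generic check: cluster only when both ops share a base register.
bool sameBaseOnly(const MemOp &A, const MemOp &B) {
  return A.BaseReg == B.BaseReg;
}

// Target-style hook: it sees both base registers and decides for itself.
// Illustrative policy only (an assumption, not the AMDGPU implementation).
bool targetShouldCluster(const MemOp &A, const MemOp &B, unsigned NumLoads,
                         unsigned MaxClusterSize) {
  if (NumLoads > MaxClusterSize)
    return false; // Keep clusters bounded.
  // With full-pointer addressing the base registers differ by construction,
  // so a register mismatch alone is not a reason to reject the pair.
  if (!A.HasOffsetOperand || !B.HasOffsetOperand)
    return true;
  return A.BaseReg == B.BaseReg;
}
```

Two full-pointer operations with different base registers are rejected by `sameBaseOnly` but can still be accepted by `targetShouldCluster`.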
* AMDGPU: Handle coldcc in more placesMatt Arsenault2017-09-131-0/+2
| | | | | | Missed in r312936 llvm-svn: 313205
* AMDGPU: Allow coldcc callsMatt Arsenault2017-09-111-0/+2
| | | | llvm-svn: 312936
* [AMDGPU] Produce madak and madmk from the two-address passStanislav Mekhanoshin2017-09-111-0/+42
| | | | | | | | | | These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
* [AMDGPU] exp should not be in WQM modeTim Renouf2017-09-111-1/+1
| | | | | | | | | | | | | | | A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915
* AMDGPU: trivial comment changeTim Renouf2017-09-111-1/+1
| | | | | | ... to check commit access for new committer. llvm-svn: 312900
* [AMDGPU] Remove unused function. NFCI.Davide Italiano2017-09-081-9/+0
| | | | llvm-svn: 312836
* AMDGPU: Start using !con operatorMatt Arsenault2017-09-081-14/+12
| | | | | | | | | | | | | We have a lot of operand definition work essentially producing every valid permutation of operands to work around building operand lists based on the instruction features. Apparently tablegen already has a mostly undocumented operator to concat dags which simplifies this. Convert one simple place to use this. The BUF instruction definitions have much more complicated logic that can be totally rewritten now. llvm-svn: 312822
* AMDGPU: Recompute scc livenessMatt Arsenault2017-09-081-1/+7
| | | | | | | | The various scalar bit operations set SCC, so if one is erased or moved, SCC liveness needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819
* AMDGPU: Start selecting v_mad_mix_f32Matt Arsenault2017-09-074-5/+105
| | | | llvm-svn: 312732
* AMDGPU: Handle non-temporal loads and storesKonstantin Zhuravlyov2017-09-071-23/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729
* AMDGPU: Handle more than one memory operand in SIMemoryLegalizerKonstantin Zhuravlyov2017-09-072-58/+145
| | | | | | Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725
* AMDGPU: Don't legalize i16 extloads to i32 with legal i16Matt Arsenault2017-09-073-1/+8
| | | | | | | Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699
* [AMDGPU] Use v_pk_max_f16 for fcanonicalizeStanislav Mekhanoshin2017-09-061-5/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676
* [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalizeStanislav Mekhanoshin2017-09-061-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660
* [AMDGPU] Fix shouldClusterMemOps to process flat loadsStanislav Mekhanoshin2017-09-061-0/+4
| | | | | | | | Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
* AMDGPU: Make worst-case assumption about the wait states in inline assemblyNicolai Haehnle2017-09-061-1/+2
| | | | | | | | | | | | | | | | Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635
* [AMDGPU] Transform __read_pipe_* and __write_pipe_*Yaxun Liu2017-09-063-74/+377
| | | | | | | | | When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598
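The condition stated in r312598 (packet size equal to packet alignment, and a power of 2) can be written as a small predicate. This is a sketch under that reading only; `isPowerOf2` and `canSpecializePipeCall` are hypothetical helper names, not functions from the commit.

```cpp
#include <cstdint>

// True when X has exactly one bit set.
bool isPowerOf2(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Hypothetical predicate: a __read_pipe*/__write_pipe* call qualifies for a
// specialized library function only when size == align and size is 2^k.
bool canSpecializePipeCall(uint64_t PacketSize, uint64_t PacketAlign) {
  return PacketSize == PacketAlign && isPowerOf2(PacketSize);
}
```

For example, size 4 with alignment 4 qualifies, while size 12 with alignment 12 or size 8 with alignment 4 does not.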
* AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]:Konstantin Zhuravlyov2017-09-051-143/+157
| | | | | | | | | - Refactor SIMemOpInfo's constructors - Allow construction of NotAtomic SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37396 llvm-svn: 312563
* AMDGPU: Fix not accounting for tail call resource usageMatt Arsenault2017-09-051-1/+2
| | | | | | | | If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561
* AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]:Konstantin Zhuravlyov2017-09-051-151/+174
| | | | | | | | | | - Make SIMemOpInfo a class - Add accessor methods to SIMemOpInfo - Move get*Info methods to SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37395 llvm-svn: 312541
* AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]:Konstantin Zhuravlyov2017-09-051-46/+50
| | | | | | | | | - Rename MemOpInfo -> SIMemOpInfo - Move SIMemOpInfo class out of SIMemoryLegalizer class Differential Revision: https://reviews.llvm.org/D37394 llvm-svn: 312540
* [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()Stanislav Mekhanoshin2017-09-011-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D37392 llvm-svn: 312364
* AMDGPU: Add ds_{read|write}_addtid_b32 definitionsMatt Arsenault2017-09-012-0/+13
| | | | llvm-svn: 312349
* AMDGPU: Add most d16 load/store instruction definitionsMatt Arsenault2017-09-015-15/+147
| | | | | | | Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347
* AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait statesNicolai Haehnle2017-09-011-4/+9
| | | | | | | | | | | | | | | Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337
* AMDGPU: Fold clamp modifier for packed instructionsMatt Arsenault2017-08-316-20/+73
| | | | llvm-svn: 312297
* [Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes.Eugene Zelenko2017-08-311-4/+19
| | | | | | Also affected in files (NFC). llvm-svn: 312289
* AMDGPU: Turn int pack pattern into build_vectorMatt Arsenault2017-08-312-1/+18
| | | | | | | | | | build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282
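The equivalence that r312282 relies on can be illustrated in plain C++: shifting a 16-bit value into the high half and OR-ing in the low half yields the same bits as a two-element vector of 16-bit values. The names below are hypothetical, and the sketch assumes element 0 occupies the low 16 bits.

```cpp
#include <array>
#include <cstdint>

// Integer pack pattern: low half OR (high half shifted into bits 31:16).
uint32_t packWithShift(uint16_t Lo, uint16_t Hi) {
  return static_cast<uint32_t>(Lo) | (static_cast<uint32_t>(Hi) << 16);
}

// Stand-in for a build_vector of two i16 elements.
using V2I16 = std::array<uint16_t, 2>;
V2I16 buildVector(uint16_t Lo, uint16_t Hi) { return {Lo, Hi}; }

// Reinterpret the vector as a 32-bit integer, element 0 in the low bits
// (an assumption made for this illustration).
uint32_t asInt(const V2I16 &V) {
  return static_cast<uint32_t>(V[0]) | (static_cast<uint32_t>(V[1]) << 16);
}
```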
* AMDGPU: Don't assert in TTI with fp32 denorms enabledMatt Arsenault2017-08-311-3/+25
| | | | | | Also refine for f16 and rcp cases. llvm-svn: 312213
* AMDGPU: Use set for tracked registersMatt Arsenault2017-08-311-20/+23
| | | | | | | | | | | | | | | The majority of the time spent in the pass is checking for register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction. This process still is algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range. llvm-svn: 312206
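The idea in r312206, checking an instruction's operands against a set of defined registers instead of searching every defined register for uses, could be sketched as follows. `Inst`, `addDefs`, and `readsTrackedReg` are hypothetical stand-ins, not the types or functions used in the pass.

```cpp
#include <unordered_set>
#include <vector>

using Reg = unsigned;

// Hypothetical, simplified instruction: registers it defines and reads.
struct Inst {
  std::vector<Reg> Defs;
  std::vector<Reg> Uses;
};

// Record the registers defined by I in the tracked set.
void addDefs(const Inst &I, std::unordered_set<Reg> &Tracked) {
  Tracked.insert(I.Defs.begin(), I.Defs.end());
}

// Cost is proportional to I's operand count, not to the number of tracked
// registers: each use is a single set lookup.
bool readsTrackedReg(const Inst &I, const std::unordered_set<Reg> &Tracked) {
  for (Reg R : I.Uses)
    if (Tracked.count(R) != 0)
      return true;
  return false;
}
```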
* AMDGPU: Correct operand types for v_mad_mix*Matt Arsenault2017-08-304-13/+37
| | | | | | | | | | | | These aren't really packed instructions, so the default op_sel_hi should be 0 since this indicates a conversion. The operand types are scalar values that behave similar to an f16 scalar that may be converted to f32. Doesn't change the default printing for op_sel_hi, just the parsing. llvm-svn: 312179
* AMDGPU: Don't look for DS merge candidates with one use addressMatt Arsenault2017-08-301-3/+10
| | | | | | | | | | | | | The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate. This gives a significant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example. llvm-svn: 312096
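The early exit described in r312096 amounts to a cheap use-count test before the expensive scan. Below is a minimal sketch, assuming use counts are available in a map; `worthScanningForMerge` is a hypothetical name and the map is a stand-in for whatever use information the pass actually consults.

```cpp
#include <unordered_map>

using Reg = unsigned;

// A merge needs a second instruction with the same base address register, so
// a base register with at most one use cannot have a merge partner and the
// forward scan can be skipped.
bool worthScanningForMerge(Reg Base,
                           const std::unordered_map<Reg, unsigned> &UseCounts) {
  auto It = UseCounts.find(Base);
  return It != UseCounts.end() && It->second > 1;
}
```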