path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* [AMDGPU] SDWA: add disassembler support for GFX9 | Sam Kolton | 2017-05-26 | 5 | -31/+113
  Summary: Added decoder methods and tests
  Reviewers: vpykhtin, artem.tamazov, dp
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D33545
  llvm-svn: 303999
* Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. | Nico Weber | 2017-05-25 | 1 | -3/+1
  llvm-svn: 303902
* [AMDGPU] add intrinsic for s_getpc | Tim Corringham | 2017-05-25 | 1 | -1/+3
  Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D32862
  llvm-svn: 303859
* [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. | Nirav Dave | 2017-05-24 | 4 | -0/+24
  Various address spaces on the SI and R600 subtargets have stricter limits on memory access size than other address spaces. Use the canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores, as they will be split up during legalization.
  llvm-svn: 303767
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns" | Marek Olsak | 2017-05-24 | 4 | -18/+51
  This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa.
  It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field.
  llvm-svn: 303754
* [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) | Simon Pilgrim | 2017-05-23 | 1 | -1/+1
  This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045.
  Differential Revision: https://reviews.llvm.org/D33451
  llvm-svn: 303691
* AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca | Changpeng Fang | 2017-05-23 | 1 | -99/+114
  Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking and places it after the calling convention checking.
  Reviewer: arsenm
  Differential Revision: http://reviews.llvm.org/D33139
  llvm-svn: 303684
* [AMDGPU] Combine and (srl) into shl (bfe) | Stanislav Mekhanoshin | 2017-05-23 | 3 | -11/+40
  Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
  Where nb is the number of trailing zeroes in mask.
  It replaces two instructions with two, and BFE is generally a more expensive one. However, this is only done if we are selecting a byte or word at an aligned boundary, which results in a proper SDWA operand pattern. It is only done if SDWA is supported.
  TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has to be moved into a VGPR.
  Differential Revision: https://reviews.llvm.org/D33455
  llvm-svn: 303681
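The identity behind this combine can be checked with a small Python sketch. It assumes that `bfe` is an unsigned bitfield extract taking a bit offset and a width, that the operand written as `mask >> nb` denotes a contiguous run of set bits whose length is the extract width, and that all values are 32-bit. The helper names are illustrative only and are not taken from the LLVM sources.

```python
MASK32 = 0xFFFFFFFF


def bfe_u32(x, offset, width):
    """Unsigned bitfield extract: `width` bits of x starting at bit `offset`."""
    return (x >> offset) & ((1 << width) - 1)


def trailing_zeros(v):
    """Number of trailing zero bits of a non-zero integer."""
    return (v & -v).bit_length() - 1


def and_srl(x, c, mask):
    """Original pattern: and (srl x, c), mask."""
    return ((x & MASK32) >> c) & mask


def shl_bfe(x, c, mask):
    """Rewritten pattern: shl (bfe x, nb + c, width), nb.

    Assumes mask is non-zero and mask >> nb is a contiguous run of ones.
    """
    nb = trailing_zeros(mask)
    width = bin(mask >> nb).count("1")
    return (bfe_u32(x & MASK32, nb + c, width) << nb) & MASK32


if __name__ == "__main__":
    # Selecting byte 2 of x and leaving it in byte 1 of the result.
    x, c, mask = 0x12345678, 8, 0xFF00
    print(hex(and_srl(x, c, mask)), hex(shl_bfe(x, c, mask)))
```

In this example `nb + c` is 16, so the extract starts on a byte boundary, which is the aligned case the commit message restricts the combine to.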
* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns | Marek Olsak | 2017-05-23 | 4 | -51/+18
  This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.
  Reviewers: arsenm, nhaehnle, tstellarAMD
  Subscribers: kzhuravl, wdng, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28994
  llvm-svn: 303658
* [AMDGPU] Convert shl (add) into add (shl) | Stanislav Mekhanoshin | 2017-05-23 | 2 | -2/+43
  shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
  This allows folding a constant into an address in some cases, as well as eliminating the second shift if the expression is used as an address and the second shift is the result of a GEP.
  Differential Revision: https://reviews.llvm.org/D33432
  llvm-svn: 303641
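A short Python sketch of why this rewrite is value-preserving, assuming 32-bit wraparound arithmetic. The function names are illustrative and do not correspond to anything in the patch.

```python
MASK32 = 0xFFFFFFFF


def shl_of_add(x, c2, c1):
    """Original form: shl (add x, c2), c1, with 32-bit wraparound."""
    return (((x + c2) & MASK32) << c1) & MASK32


def add_of_shl(x, c2, c1):
    """Rewritten form: add (shl x, c1), (c2 << c1), with 32-bit wraparound."""
    return (((x << c1) & MASK32) + ((c2 << c1) & MASK32)) & MASK32


def shl_of_or(x, c2, c1):
    """Original form: shl (or x, c2), c1."""
    return (((x | c2) & MASK32) << c1) & MASK32


def or_of_shl(x, c2, c1):
    """Rewritten form: or (shl x, c1), (c2 << c1)."""
    return ((x << c1) & MASK32) | ((c2 << c1) & MASK32)


if __name__ == "__main__":
    # The shifted constant (c2 << c1) is known at compile time, so it can be
    # folded into an addressing offset while only x is shifted at run time.
    print(hex(shl_of_add(0xFFFFFFF0, 0x20, 4)), hex(add_of_shl(0xFFFFFFF0, 0x20, 4)))
```

Both forms agree modulo 2^32 because a left shift distributes over addition and over bitwise OR.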
* [AMDGPU] SDWA: Add assembler support for GFX9 | Sam Kolton | 2017-05-23 | 13 | -64/+552
  Summary: Added separate pseudo and real instructions for GFX9 SDWA instructions. Currently supported only in the assembler. Depends on D32493.
  Reviewers: vpykhtin, artem.tamazov
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D33132
  llvm-svn: 303620
* [AMDGPU] Narrow lshl from 64 to 32 bit if possible | Stanislav Mekhanoshin | 2017-05-22 | 1 | -11/+33
  Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
  shl (ext x) => zext (shl x)
  Differential Revision: https://reviews.llvm.org/D33367
  llvm-svn: 303569
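The condition "shift does not overflow int" can be illustrated with a Python sketch on concrete values. It assumes the extension is a zero extension from 32 to 64 bits and models the no-overflow check as "the top `s` bits of the 32-bit input are zero"; how the compiler actually proves this is not stated in the commit message, and the names below are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shl64_of_zext(x32, s):
    """Original form: zero-extend to 64 bits, then shift in 64 bits."""
    return ((x32 & MASK32) << s) & MASK64


def zext_of_shl32(x32, s):
    """Narrowed form: shift in 32 bits, then zero-extend the result."""
    return ((x32 & MASK32) << s) & MASK32


def can_narrow(x32, s):
    """True when no set bit of x32 is shifted past bit 31 (assumed criterion)."""
    return s < 32 and ((x32 & MASK32) >> (32 - s)) == 0


if __name__ == "__main__":
    for x, s in ((0x00FFFFFF, 8), (0x01000000, 8)):
        print(hex(x), s, can_narrow(x, s),
              hex(shl64_of_zext(x, s)), hex(zext_of_shl32(x, s)))
```

When `can_narrow` holds, the two forms give the same value; when it does not, the 32-bit shift drops bits that the 64-bit shift would keep, so the rewrite would be incorrect.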
* [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker | Valery Pykhtin | 2017-05-22 | 2 | -62/+86
  Differential revision: https://reviews.llvm.org/D33289
  llvm-svn: 303548
* [AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals | Dmitry Preobrazhensky | 2017-05-19 | 2 | -4/+12
  See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922
  Reviewers: artem.tamazov, vpykhtin
  Differential Revision: https://reviews.llvm.org/D32912
  llvm-svn: 303428
* [AMDGPU][MC] Fixed bugs in export instruction | Dmitry Preobrazhensky | 2017-05-19 | 2 | -10/+31
  See Bugs 33019, 33056:
  https://bugs.llvm.org//show_bug.cgi?id=33019
  https://bugs.llvm.org//show_bug.cgi?id=33056
  Reviewers: artem.tamazov, vpykhtin
  Differential Revision: https://reviews.llvm.org/D33288
  llvm-svn: 303423
* [LegacyPassManager] Remove TargetMachine constructors | Francis Visoiu Mistrih | 2017-05-18 | 12 | -83/+86
  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
  The patterns replaced here are:
    * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
    * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
    * MachineFunctionPasses now use MF.getTarget().
    * Remove all the TargetMachine constructors.
    * Remove INITIALIZE_TM_PASS.
  This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
  Related to PR30324.
  Differential Revision: https://reviews.llvm.org/D33222
  llvm-svn: 303360
* [AMDGPU] SDWA operands should not intersect with potential MIs | Sam Kolton | 2017-05-18 | 1 | -13/+32
  Summary: There should be no intersection between SDWA operands and potential MIs. E.g.:
  ```
  v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
  v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
  v_add_u32 v3, v4, v2
  ```
  In that example it is possible that we would fold the 2nd instruction into the 3rd (v_add_u32_sdwa) and then try to fold the 1st instruction into the 2nd (which was already destroyed). So if an SDWAOperand is also a potential MI, do not apply it.
  Reviewers: vpykhtin, arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D32804
  llvm-svn: 303347
* AMDGPU: Start defining a calling convention | Matt Arsenault | 2017-05-17 | 22 | -116/+461
  Partially implement the callee side for arguments and return values. byval doesn't work properly, and most likely neither do sret or other on-stack return values.
  llvm-svn: 303308
* AMDGPU: Expand frame indexes to be relative to scratch wave offset | Matt Arsenault | 2017-05-17 | 1 | -6/+71
  In order for an arbitrary callee to access an object in a caller's stack frame, the 32-bit offset used as the private pointer needs to be relative to the kernel's scratch wave offset register. Convert to this by finding the difference from the current stack frame and scaling by the wavefront size.
  llvm-svn: 303303
* AMDGPU: Change mubuf soffset register when SP relative | Matt Arsenault | 2017-05-17 | 2 | -15/+53
  Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer.
  No tests because this is used in later commits implementing calls.
  llvm-svn: 303301
* AMDGPU: Make better use of op_sel with high components | Matt Arsenault | 2017-05-17 | 2 | -8/+57
  Handle more general swizzles.
  llvm-svn: 303296
* AMDGPU: Try to use op_sel when selecting packed instructions | Matt Arsenault | 2017-05-17 | 1 | -1/+29
  Avoids instructions to pack a vector when the source is really a scalar being broadcast.
  Also be smarter and look for per-component fneg.
  Doesn't yet handle scalar from upper half of register or other swizzles.
  llvm-svn: 303291
* AMDGPU: Use appropriate soffset for spilling | Matt Arsenault | 2017-05-17 | 2 | -20/+20
  This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same.
  llvm-svn: 303287
* AMDGPU: Fix min3/max3 combines for f16/i16 | Matt Arsenault | 2017-05-17 | 3 | -2/+25
  Fix missing instruction definitions for min3/max3.
  llvm-svn: 303284
* [AMDGPU] Use GCNRPTracker dumper methods in scheduler | Stanislav Mekhanoshin | 2017-05-16 | 3 | -18/+21
  Differential Revision: https://reviews.llvm.org/D33244
  llvm-svn: 303186
* [AMDGPU] Cache live-ins and register pressure in scheduler | Stanislav Mekhanoshin | 2017-05-16 | 2 | -75/+154
  Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things:
    1. Caches the info for the second stage when we schedule with decreased target occupancy.
    2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block.
  The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver.
  There is no functional change to the current behavior, only compilation speed is affected.
  In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region.
  That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance.
  At the moment the worst known case compilation time has improved from 26 minutes to 8.5.
  Differential Revision: https://reviews.llvm.org/D33117
  llvm-svn: 303184
* [AMDGPU] Turn register pressure estimation into forward tracker | Stanislav Mekhanoshin | 2017-05-16 | 4 | -135/+196
  This factors the register pressure estimation mechanism out of the GCNSchedStrategy into the forward tracker, to unify the interface with other strategies and expose it to other interested phases.
  Differential Revision: https://reviews.llvm.org/D33105
  llvm-svn: 303179
* AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable] | NAKAMURA Takumi | 2017-05-16 | 2 | -2/+4
  llvm-svn: 303137
* [AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI. | Davide Italiano | 2017-05-15 | 1 | -5/+0
  llvm-svn: 303122
* Re-submit AMDGPUMachineCFGStructurizer. | Jan Sjodin | 2017-05-15 | 7 | -12/+3245
  Differential Revision: https://reviews.llvm.org/D23209
  llvm-svn: 303111
* Revert 303091. | Jan Sjodin | 2017-05-15 | 7 | -3380/+12
  llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer. | Jan Sjodin | 2017-05-15 | 7 | -12/+3380
  Differential Revision: https://reviews.llvm.org/D23209
  llvm-svn: 303091
* [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 | Dmitry Preobrazhensky | 2017-05-15 | 1 | -11/+22
  See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936
  Reviewers: artem.tamazov, vpykhtin
  Differential Revision: https://reviews.llvm.org/D33123
  llvm-svn: 303070
* [AMDGPU][MC] Removed V_MQSAD_U16_U8 | Dmitry Preobrazhensky | 2017-05-15 | 1 | -3/+0
  This instruction does not really exist.
  See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018
  Reviewers: vpykhtin, artem.tamazov
  Differential Revision: https://reviews.llvm.org/D33126
  llvm-svn: 303055
* AMDGPU/SI: Don't promote to vector if the load/store is volatile. | Changpeng Fang | 2017-05-12 | 1 | -2/+5
  Summary: We should not change volatile loads/stores in promoting alloca to vector.
  Reviewers: arsenm
  Differential Revision: http://reviews.llvm.org/D33107
  llvm-svn: 302943
* [KnownBits] Add bit counting methods to KnownBits struct and use them where possible | Craig Topper | 2017-05-12 | 1 | -1/+1
  This patch adds min/max population count, leading/trailing zero/one bit counting methods.
  The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
  Differential Revision: https://reviews.llvm.org/D32931
  llvm-svn: 302925
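The difference between the min and max counts can be sketched with a toy Python model. It assumes a known-bits value is represented by two masks, one of bits known to be zero and one of bits known to be one, with every other bit unknown. The class and method names below are an illustrative Python rendering, not the LLVM interface itself, and only the population and trailing-zero counts are shown.

```python
class KnownBits:
    """Toy model: `zero` marks bits known to be 0, `one` marks bits known to be 1."""

    def __init__(self, width, zero, one):
        assert zero & one == 0, "a bit cannot be known to be both 0 and 1"
        self.width = width
        self.zero = zero
        self.one = one

    @staticmethod
    def _trailing_ones(v, width):
        """Count consecutive set bits of v starting from bit 0."""
        n = 0
        while n < width and (v >> n) & 1:
            n += 1
        return n

    def count_min_population(self):
        # Only bits known to be one are guaranteed to be set.
        return bin(self.one).count("1")

    def count_max_population(self):
        # Every bit not known to be zero might be set.
        return self.width - bin(self.zero).count("1")

    def count_min_trailing_zeros(self):
        # Guaranteed trailing zeros: the run of known-zero bits from bit 0.
        return self._trailing_ones(self.zero, self.width)

    def count_max_trailing_zeros(self):
        # Possible trailing zeros: every bit below the lowest known-one bit.
        not_one = ~self.one & ((1 << self.width) - 1)
        return self._trailing_ones(not_one, self.width)


if __name__ == "__main__":
    # 8-bit value of the form 1?????00 (top bit known one, low two bits known zero).
    kb = KnownBits(8, zero=0b00000011, one=0b10000000)
    print(kb.count_min_population(), kb.count_max_population(),
          kb.count_min_trailing_zeros(), kb.count_max_trailing_zeros())
```

For the value `1?????00`, at least one bit and at most six bits are set, and the value has at least two and at most seven trailing zeros.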
* AMDGPU/GlobalISel: Mark 32-bit integer constants as legal | Tom Stellard | 2017-05-12 | 1 | -0/+1
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33115
  llvm-svn: 302919
* [AMDGPU] Placate unused variable warning in release builds. | Davide Italiano | 2017-05-11 | 1 | -0/+1
  llvm-svn: 302821
* AMDGPU: Remove tfe bit from flat instruction definitions | Matt Arsenault | 2017-05-11 | 3 | -23/+22
  We don't use it, and it was removed in gfx9 and the encoding bit repurposed. Additionally, actually using it requires changing the output register class, which wasn't done anyway.
  llvm-svn: 302814
* AMDGPU: Pull fneg out of extract_vector_elt | Matt Arsenault | 2017-05-11 | 4 | -1/+31
  This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers.
  llvm-svn: 302813
* [AMDGPU] Fix incorrect register pressure calculation | Stanislav Mekhanoshin | 2017-05-11 | 1 | -2/+3
  The earlier fix D32572 introduced a bug where live-ins were calculated for the basic block instead of the scheduling region. This change fixes it.
  Differential Revision: https://reviews.llvm.org/D33086
  llvm-svn: 302812
* Remove now useless trailing nullptr in StructType::get | Serge Guelton | 2017-05-11 | 1 | -1/+1
  llvm-svn: 302779
* AMDGPU: Make some packed shuffles free | Matt Arsenault | 2017-05-10 | 2 | -1/+36
  VOP3P instructions can encode access to either half of the register.
  llvm-svn: 302730
* AMDGPU: Add new subtarget features for gfx9 flat instructions | Matt Arsenault | 2017-05-10 | 3 | -1/+38
  Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added.
  llvm-svn: 302729
* [AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output | Dmitry Preobrazhensky | 2017-05-10 | 1 | -6/+12
  See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927
  Reviewers: vpykhtin, artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D32913
  llvm-svn: 302648
* [AMDGPU] Fixed typo in GCNRegPressure, NFC | Stanislav Mekhanoshin | 2017-05-09 | 2 | -15/+15
  VGRP -> VGPR, SGRP -> SGPR
  llvm-svn: 302586
* [RegisterBankInfo] Uniquely allocate instruction mapping. | Quentin Colombet | 2017-05-05 | 2 | -47/+49
  This is a step toward having statically allocated instruction mapping. We are going to tablegen them eventually, so let us reflect that in the API.
  NFC.
  llvm-svn: 302316
* [AMDGPU] In the new waitcnt insertion pass, use getHeader | Kannan Narayanan | 2017-05-05 | 1 | -5/+5
  instead of getTopBlock to find the loop header.
  Differential Revision: https://reviews.llvm.org/D32831
  llvm-svn: 302290
* AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 | Konstantin Zhuravlyov | 2017-05-05 | 1 | -1/+2
  This field is populated by the CP.
  Differential Revision: https://reviews.llvm.org/D32619
  llvm-svn: 302277
* [KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. | Craig Topper | 2017-05-05 | 1 | -1/+1
  This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
  Differential Revision: https://reviews.llvm.org/D32637
  llvm-svn: 302262