summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* Merging r343369:Tom Stellard2018-12-061-1/+0
| | | | | | | | | | ------------------------------------------------------------------------ r343369 | vitalybuka | 2018-09-28 19:17:12 -0700 (Fri, 28 Sep 2018) | 1 line [cxx2a] Fix warning triggered by r343285 ------------------------------------------------------------------------ llvm-svn: 348450
* Merging r340959:Hans Wennborg2018-09-041-1/+5
| | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340959 | mareko | 2018-08-29 22:03:00 +0200 (Wed, 29 Aug 2018) | 9 lines AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51203 ------------------------------------------------------------------------ llvm-svn: 341351
* Merging r340417:Hans Wennborg2018-08-303-32/+36
| | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340417 | hakzsam | 2018-08-22 18:08:48 +0200 (Wed, 22 Aug 2018) | 14 lines AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS v4: - fix compilation issues - fix out of bounds access v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 ------------------------------------------------------------------------ llvm-svn: 341041
* Merging r340416:Hans Wennborg2018-08-301-5/+5
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340416 | hakzsam | 2018-08-22 18:08:43 +0200 (Wed, 22 Aug 2018) | 8 lines AMDGPU: fix existing alias rules for constant and global Constant and global may alias, also one rules table wasn't ordered correctly. Pinpointed by Matt. v2: add a test with swapped parameters ------------------------------------------------------------------------ llvm-svn: 341040
* Merging r339190:Hans Wennborg2018-08-081-11/+0
| | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339190 | jvesely | 2018-08-07 23:54:37 +0200 (Tue, 07 Aug 2018) | 12 lines AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 ------------------------------------------------------------------------ llvm-svn: 339235
* Merging r338610:Hans Wennborg2018-08-072-27/+69
| | | | | | | | | | | | ------------------------------------------------------------------------ r338610 | jvesely | 2018-08-01 20:36:07 +0200 (Wed, 01 Aug 2018) | 3 lines AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) ------------------------------------------------------------------------ llvm-svn: 339105
* Merging r338569:Hans Wennborg2018-08-072-9/+9
| | | | | | | | | | | | | | ------------------------------------------------------------------------ r338569 | jvesely | 2018-08-01 17:04:36 +0200 (Wed, 01 Aug 2018) | 5 lines AMDGPU: Allow fp32-denormals feature for r600 targets This was accidentally removed in r335942. Differential Revision: https://reviews.llvm.org/D49934 ------------------------------------------------------------------------ llvm-svn: 339103
* [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zeroRyan Taylor2018-08-014-2/+53
| | | | | | | | | | | | | | | Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
* Revert "Enrich inline messages", tests failDavid Bolvansky2018-08-011-11/+6
| | | | llvm-svn: 338496
* Enrich inline messagesDavid Bolvansky2018-08-011-6/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338494
* AMDGPU: Add clamp bit to dot intrinsicsKonstantin Zhuravlyov2018-08-013-12/+33
| | | | | | Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470
* AMDGPU: Break 64-bit arguments into 32-bit piecesMatt Arsenault2018-07-311-3/+16
| | | | llvm-svn: 338421
* AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on callsMatt Arsenault2018-07-311-2/+24
| | | | | | | This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418
* AMDGPU: Scalarize vector argument types to callsMatt Arsenault2018-07-311-31/+15
| | | | | | | | | | | | | | | | | When lowering calling conventions, prefer to decompose vectors into the constitute register types. This avoids artifical constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416
* Revert Enrich inline messagesDavid Bolvansky2018-07-311-11/+6
| | | | llvm-svn: 338389
* Enrich inline messagesDavid Bolvansky2018-07-311-6/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338387
* AMDGPU: Don't handle FP16_TO_FP in isCanonicalizedMatt Arsenault2018-07-311-4/+0
| | | | | | | This needs more special handling to do correctly. Fixes test in subsequent commit. llvm-svn: 338381
* AMDGPU: Fold undef fcanonicalize to qNaNMatt Arsenault2018-07-311-2/+10
| | | | | | | | | | We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376
* AMDGPU: Reduce code size with fcanonicalize (fneg x)Matt Arsenault2018-07-302-0/+11
| | | | | | | | When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244
* AMDGPU: Make fneg combine handle fcanonicalizeMatt Arsenault2018-07-301-0/+2
| | | | llvm-svn: 338243
* AMDGPU: Force skip over s_sendmsg and exp instructionsNicolai Haehnle2018-07-303-20/+35
| | | | | | | | | | | | | | | | | | | | | Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235
* AMDGPU: Stop wasting argument registers with v3i32/v3f32Matt Arsenault2018-07-282-0/+59
| | | | | | | | | | SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-4/+3
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* AMDGPU: Stop trying to extend arguments for cloverMatt Arsenault2018-07-282-31/+1
| | | | | | | This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193
* AMDGPU/R600: Add MOV instructions to BFE patternsJan Vesely2018-07-271-5/+5
| | | | | | | | | R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions n r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127
* AMDGPU: Fix code size for return_to_epilog pseudoMatt Arsenault2018-07-272-3/+4
| | | | llvm-svn: 338113
* AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 typesTom Stellard2018-07-271-1/+1
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102
* [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bitsScott Linder2018-07-261-11/+16
| | | | | | | | | | Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060
* [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansionStanislav Mekhanoshin2018-07-251-13/+21
| | | | | | Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938
* AMDGPU/GlobalISel: Legalize G_INSERTTom Stellard2018-07-241-1/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798
* AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACTTom Stellard2018-07-241-3/+0
| | | | | | | | | | | | | | | | | | Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794
* AMDGPU: Use existing function to check for VGPRsMatt Arsenault2018-07-201-16/+7
| | | | llvm-svn: 337621
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault2018-07-2013-200/+219
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* [AMDGPU] [AMDGPU] Support a fdot2 pattern.Farhana Aleen2018-07-166-1/+88
| | | | | | | | | | | | | | | Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
* [AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" ↵Mark Searles2018-07-161-1/+1
| | | | | | | | | | build error" Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"" ( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a . llvm-svn: 337156
* run post-RA hazard recognizer pass lateMark Searles2018-07-162-4/+8
| | | | | | | | | | | | | Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
* Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" ↵Mark Searles2018-07-161-1/+1
| | | | | | | | build error" This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov2018-07-1413-214/+193
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnumTom Stellard2018-07-132-0/+19
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056
* AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.expTom Stellard2018-07-132-0/+71
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046
* AMDGPU: Properly handle shader inputs with split argumentsMatt Arsenault2018-07-131-12/+27
| | | | | | | | | | This needs to refer to arguments by their original argument index, not the argument split index which depends on what the type splitting decides to do. Also avoid increment PSInputNum for each split piece. llvm-svn: 337022
* AMDGPU: Fix handling of alignment padding in DAG argument loweringMatt Arsenault2018-07-1313-193/+214
| | | | | | | | | | | | | | | | | This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
* AMDGPU: Fix assert in truncate combine with vectorsMatt Arsenault2018-07-121-1/+1
| | | | | | | The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935
* AMDGPU/SI: Initialize InstrInfo before TargetLoweringInfo in GCNSubtargetTom Stellard2018-07-112-3/+3
| | | | | | | | SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo must be initialized first. This fixes msan buildbot failures and was introduced by r336851. llvm-svn: 336861
* AMDGPU: Remove duplicate call to initializeSubtargetDependencies()Tom Stellard2018-07-111-1/+0
| | | | | | This was added in r336851. llvm-svn: 336853
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-1174-381/+340
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU/NFC: Use already available explicit kernargKonstantin Zhuravlyov2018-07-111-1/+2
| | | | | | | size instead of calculating it again when filling out the metadata. llvm-svn: 336825
* Fix -Wmismatched-tags warningRichard Trieu2018-07-101-1/+1
| | | | | | class -> struct in forward declaration. llvm-svn: 336733
* [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)Scott Linder2018-07-105-2/+2
| | | | llvm-svn: 336722
* [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)Scott Linder2018-07-105-122/+148
| | | | | | | | Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707
OpenPOWER on IntegriCloud