summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/Utils
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/R600: Emit rodata in text segmentJan Vesely2020-02-031-1/+1
| | | | | | | | | | R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
* [IR] Split out target specific intrinsic enums into separate headersReid Kleckner2019-12-111-2/+4
| | | | | | | | | | | | | | | | | | | | This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320
* [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"Tom Stellard2019-11-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Most libraries are defined in the lib/ directory but there are also a few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining "Component Libraries" as libraries defined in lib/ that may be included in libLLVM.so. Explicitly marking the libraries in lib/ as component libraries allows us to remove some fragile checks that attempt to differentiate between lib/ libraries and tools/ libraires: 1. In tools/llvm-shlib, because llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of all libraries defined in the whole project, there was custom code needed to filter out libraries defined in tools/, none of which should be included in libLLVM.so. This code assumed that any library defined as static was from lib/ and everything else should be excluded. With this change, llvm_map_components_to_libnames(LIB_NAMES, "all") only returns libraries that have been added to the LLVM_COMPONENT_LIBS global cmake property, so this custom filtering logic can be removed. Doing this also fixes the build with BUILD_SHARED_LIBS=ON and LLVM_BUILD_LLVM_DYLIB=ON. 2. There was some code in llvm_add_library that assumed that libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or ARG_LINK_COMPONENTS set. This is only true because libraries defined lib lib/ use LLVMBuild.txt and don't set these values. This code has been fixed now to check if the library has been explicitly marked as a component library, which should now make it easier to remove LLVMBuild at some point in the future. I have tested this patch on Windows, MacOS and Linux with release builds and the following combinations of CMake options: - "" (No options) - -DLLVM_BUILD_LLVM_DYLIB=ON - -DLLVM_LINK_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON Reviewers: beanz, smeenai, compnerd, phosek Reviewed By: beanz Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70179
* [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/storesPiotr Sobczak2019-11-202-1/+35
| | | | | | | | | | | | | | Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-192-3/+21
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* AMDGPU Reduce reported maximum group size to 1024Matt Arsenault2019-11-131-1/+2
| | | | | While some targets allow encoding 2048, this was never tested or supported.
* AMDGPU: Add default denormal mode to MachineFunctionInfoMatt Arsenault2019-11-011-4/+21
| | | | | | The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.
* AMDGPU: Simplify getAddressSpace callsMatt Arsenault2019-10-311-4/+5
| | | | | These can be directly taken from the GlobalValue instead of going through the type.
* Prune two MachineInstr.h includes, fix up depsReid Kleckner2019-10-191-0/+2
| | | | | | | | | | MachineInstr.h included AliasAnalysis.h, which includes a world of IR constructs mostly unneeded in CodeGen. Prune it. Same for DebugInfoMetadata.h. Noticed with -ftime-trace. llvm-svn: 375311
* [AMDGPU] Extend buffer intrinsics with swizzlingPiotr Sobczak2019-10-022-0/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
* Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in ↵Jay Foad2019-09-021-1/+1
| | | | | | | | | | | | | | | | | | | SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667
* AMDGPU: Disambiguate v3f16 format in load/store tablesMatt Arsenault2019-08-182-7/+7
| | | | | | | | | Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202
* [AMDGPU] Fixed occupancy calculation for gfx10Stanislav Mekhanoshin2019-07-192-8/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616
* [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL messageDmitry Preobrazhensky2019-07-152-4/+9
| | | | | | | | Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071
* [AMDGPU] gfx908 mAI instructions, MC partStanislav Mekhanoshin2019-07-092-0/+20
| | | | | | Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
* [AMDGPU][MC] Corrected parsing of FLAT offset modifierDmitry Preobrazhensky2019-07-082-2/+2
| | | | | | | | | | | | | | Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved errors handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321
* [AMDGPU][MC] Fix 2 for sanitizer failure in 364645Dmitry Preobrazhensky2019-06-282-6/+6
| | | | llvm-svn: 364656
* [AMDGPU][MC] Enabled constant expressions as operands of sendmsgDmitry Preobrazhensky2019-06-282-0/+150
| | | | | | | | | | See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645
* [AMDGPU] Fix for branch offset hardware workaroundRyan Taylor2019-06-261-0/+3
| | | | | | | | | | | | | | | | | | | Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f to a 64 bit instruction that includes the same 32 bit branch and the encoding for a s_nop 0 to follow. The relaxer than modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451
* [AMDGPU] gfx1010 wave32 metadataStanislav Mekhanoshin2019-06-174-0/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577
* [AMDGPU] gfx1010 base changes for wave32Stanislav Mekhanoshin2019-06-132-11/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299
* [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setregDmitry Preobrazhensky2019-06-132-0/+90
| | | | | | | | | | See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255
* [AMDGPU] Optimize image_[load|store]_mipPiotr Sobczak2019-06-102-0/+10
| | | | | | | | | | | | | | | | | | Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
* [AMDGPU] gfx1010 changes for PAL metadataStanislav Mekhanoshin2019-05-091-2/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353
* [AMDGPU] gfx1010 MIMG implementationStanislav Mekhanoshin2019-05-011-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698
* [AMDGPU] gfx1010 utility functionsStanislav Mekhanoshin2019-04-252-24/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224
* [AMDGPU] gfx1010 sgpr register changesStanislav Mekhanoshin2019-04-241-2/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D61045 llvm-svn: 359117
* [AMDGPU] Add gfx1010 target definitionsStanislav Mekhanoshin2019-04-243-39/+66
| | | | | | Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
* AMDGPU: Remove dx10-clamp from subtarget featuresMatt Arsenault2019-03-292-0/+52
| | | | | | | | | | | | | | | | | | Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302
* [AMDGPU] Added v5i32 and v5f32 register classesTim Renouf2019-03-221-0/+4
| | | | | | | | | | They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32Tim Renouf2019-03-211-0/+2
| | | | | | | | | | | | | | | Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
* [AMDGPU] Do not generate spurious PAL metadataTim Renouf2019-03-202-6/+10
| | | | | | | | | | | | My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata" accidentally caused a spurious PAL metadata .note record to be emitted for any AMDGPU output. That caused failures in the lld test amdgpu-relocs.s. Fixed. Differential Revision: https://reviews.llvm.org/D59613 Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e llvm-svn: 356621
* [AMDGPU] Fix dependency on `BinaryFormat`Michael Liao2019-03-201-1/+1
| | | | | | | | | | | | Summary: - The linking is broken when this library is built as shared one. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59610 llvm-svn: 356617
* [AMDGPU] Added MsgPack format PAL metadataTim Renouf2019-03-202-39/+582
| | | | | | | | | | | | | | Summary: PAL metadata now supports both the old linear reg=val pairs format and the new MsgPack format. The MsgPack format uses YAML as its textual representation. On output to YAML, a mnemonic name is provided for some hardware registers. Differential Revision: https://reviews.llvm.org/D57028 Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94 llvm-svn: 356591
* [AMDGPU] Factored PAL metadata handling out into its own classTim Renouf2019-03-203-0/+284
| | | | | | | | | | | | | | | | | | | | | | Summary: This commit introduces a new AMDGPUPALMetadata class that: * is inside the AMDGPU target; * keeps an in-memory representation of PAL metadata; * provides a method to read the frontend-supplied metadata from LLVM IR; * provides methods for the asm printer to set metadata items; * provides methods to write the metadata as a binary blob to put in a .note record or as an asm directive; * provides a method to read the metadata as a binary blob from a .note record. Because llvm-readobj cannot call directly into a target, I had to remove llvm-readobj's ability to dump PAL metadata, pending a resolution to https://reviews.llvm.org/D52821 Differential Revision: https://reviews.llvm.org/D57027 Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64 llvm-svn: 356582
* [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, ↵Dmitry Preobrazhensky2019-03-041-0/+2
| | | | | | | | | | | | v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312
* [AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of ↵Dmitry Preobrazhensky2019-02-272-0/+20
| | | | | | | | | | | | instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969
* AMDGPU: Remove GCN features and predicatesMatt Arsenault2019-02-081-1/+2
| | | | | | | These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-198-32/+24
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd tryDavid Stuttard2019-01-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [AMDGPU] Extend the SI Load/Store optimizer to combine more things.Neil Henning2018-12-122-0/+61
| | | | | | | | | | I've extended the load/store optimizer to be able to produce dwordx3 loads and stores, This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsicsNicolai Haehnle2018-11-302-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"David Stuttard2018-11-291-1/+0
| | | | | | | | | Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911
* Add support for TFE/LWE in image intrinsicsDavid Stuttard2018-11-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
* AMDGPU/InsertWaitcnts: Untangle some semi-global stateNicolai Haehnle2018-11-292-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848
* [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/STRon Lieberman2018-11-161-0/+1
| | | | | | | | | Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008
* AMDHSA: More code object v3 fixes:Konstantin Zhuravlyov2018-11-151-1/+2
| | | | | | | | - Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA - Update assembler metadata tests to use v2 by default llvm-svn: 347001
* Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"Nicolai Haehnle2018-11-072-6/+6
| | | | | | | | This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364
* AMDGPU: Add sram-ecc featureKonstantin Zhuravlyov2018-11-052-0/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
* AMDGPU: Rewrite SILowerI1Copies to always stay on SALUNicolai Haehnle2018-10-313-100/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719
OpenPOWER on IntegriCloud