summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Properly handle shader inputs with split argumentsMatt Arsenault2018-07-131-12/+27
| | | | | | | | | | This needs to refer to arguments by their original argument index, not the argument split index which depends on what the type splitting decides to do. Also avoid increment PSInputNum for each split piece. llvm-svn: 337022
* AMDGPU: Fix handling of alignment padding in DAG argument loweringMatt Arsenault2018-07-1313-193/+214
| | | | | | | | | | | | | | | | | This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
* AMDGPU: Fix assert in truncate combine with vectorsMatt Arsenault2018-07-121-1/+1
| | | | | | | The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935
* AMDGPU/SI: Initialize InstrInfo before TargetLoweringInfo in GCNSubtargetTom Stellard2018-07-112-3/+3
| | | | | | | | SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo must be initialized first. This fixes msan buildbot failures and was introduced by r336851. llvm-svn: 336861
* AMDGPU: Remove duplicate call to initializeSubtargetDependencies()Tom Stellard2018-07-111-1/+0
| | | | | | This was added in r336851. llvm-svn: 336853
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-1174-381/+340
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU/NFC: Use already available explicit kernargKonstantin Zhuravlyov2018-07-111-1/+2
| | | | | | | size instead of calculating it again when filling out the metadata. llvm-svn: 336825
* Fix -Wmismatched-tags warningRichard Trieu2018-07-101-1/+1
| | | | | | class -> struct in forward declaration. llvm-svn: 336733
* [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)Scott Linder2018-07-105-2/+2
| | | | llvm-svn: 336722
* [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)Scott Linder2018-07-105-122/+148
| | | | | | | | Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707
* AMDGPU: Make hidden argument metadata consistent withKonstantin Zhuravlyov2018-07-102-31/+46
| | | | | | | | amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697
* Reapply "AMDGPU: Force inlining if LDS global address is used"Matt Arsenault2018-07-103-26/+95
| | | | | | This reverts commit r336623 llvm-svn: 336675
* Revert "AMDGPU: Force inlining if LDS global address is used"Vlad Tsyrklevich2018-07-103-95/+26
| | | | | | | This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623
* [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build errorMark Searles2018-07-091-1/+1
| | | | | | | | | | | | | | | | | | Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com> Fixes the following building error: external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61: error: comparison of integers of different signs: 'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type' (aka 'int') and 'unsigned int' [-Werror,-Wsign-compare] BlockWaitcntProcessedSet.end(), &MBB) < Count)) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~ 1 error generated. Differential Revision: https://reviews.llvm.org/D49089 llvm-svn: 336588
* AMDGPU: Force inlining if LDS global address is usedMatt Arsenault2018-07-093-26/+95
| | | | | | | | | | These won't work for the forseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage rather than cloning functions like before. llvm-svn: 336587
* AMDGPU: Fix UBSan error caused by r335942Tom Stellard2018-07-063-24/+21
| | | | | | | | | | | | | | Summary: Fixes PR38071. Reviewers: arsenm, dstenb Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48979 llvm-svn: 336448
* AMDGPU/GlobalISel: Implement custom kernel arg loweringMatt Arsenault2018-07-054-32/+40
| | | | | | | | | | | | | Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. llvm-svn: 336373
* [AMDGPU] Add VALU to V_INTERP InstructionsRyan Taylor2018-07-051-0/+1
| | | | | | | | | | | | Wait states are not properly being inserted after buffer_store for v_interp instructions. Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can check and insert the appropriate wait states when needed. Differential Revision: https://reviews.llvm.org/D48772 Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64 llvm-svn: 336339
* Implement strip.invariant.groupPiotr Padlewski2018-07-021-0/+2
| | | | | | | | | | | | | | | | Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
* AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.Tom Stellard2018-06-301-2/+10
| | | | | | | | | | | | | | | | | | | Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041
* AMDGPU: Don't use struct type for argument layoutMatt Arsenault2018-06-294-50/+71
| | | | | | | | | | This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999
* [AMDGPU] Enable LICM in the BE pipelineStanislav Mekhanoshin2018-06-291-0/+1
| | | | | | | | | | This allows to hoist code portion to compute reciprocal of loop invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-2863-1508/+1854
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* [AMDGPU] Early expansion of 32 bit udiv/uremStanislav Mekhanoshin2018-06-281-4/+316
| | | | | | | | | | | | This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868
* [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16Stanislav Mekhanoshin2018-06-281-5/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866
* AMDGPU: Remove MFI::ABIArgOffsetMatt Arsenault2018-06-287-33/+22
| | | | | | | | | | | | | | We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830
* AMDGPU: Error on calls from graphics shadersMatt Arsenault2018-06-281-0/+7
| | | | | | | | In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829
* AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS structMatt Arsenault2018-06-281-1/+2
| | | | | | Not sure how this wasn't noticed before. llvm-svn: 335828
* AMDGPU: Fix assert on aggregate type kernel argumentsMatt Arsenault2018-06-281-2/+4
| | | | | | | | | | Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-276-12/+45
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* AMDGPU/NFC: Fix typo in commentKonstantin Zhuravlyov2018-06-271-1/+1
| | | | llvm-svn: 335707
* AMDGPU: Silence unused warnings in waitcnt insertion pass in release buildKonstantin Zhuravlyov2018-06-261-1/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D48607 llvm-svn: 335669
* [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsicStanislav Mekhanoshin2018-06-261-0/+3
| | | | | | | | This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654
* AMDGPU: Add pass to lower kernel arguments to loadsMatt Arsenault2018-06-264-0/+283
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
* AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptrMatt Arsenault2018-06-253-1/+29
| | | | | | | | | Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490
* AMDGPU: Remove commented out codeMatt Arsenault2018-06-251-2/+0
| | | | llvm-svn: 335486
* AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointersMatt Arsenault2018-06-251-1/+5
| | | | llvm-svn: 335485
* AMDGPU: Respect align argument parameterMatt Arsenault2018-06-252-10/+19
| | | | | | | | | | This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478
* [AMDGPU] Update includes for intrinsic changes :(Reid Kleckner2018-06-232-4/+4
| | | | llvm-svn: 335409
* [IR] Split Intrinsics.inc into enums and implementationsReid Kleckner2018-06-231-1/+2
| | | | | | | | | | | | | | | | | | | Implements PR34259 Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407
* AMDGPU: Add patterns for i32/i64 local atomic load/storeMatt Arsenault2018-06-224-1/+54
| | | | | | | | Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325
* AMDGPU/GlobalISel: Default to using TableGen'd instruction selectorTom Stellard2018-06-221-7/+0
| | | | | | | | | | | | | | | | Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector default for all opcodes. This is NFC for a full piglit run, which is why there are no tests. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48198 llvm-svn: 335319
* AMDGPU/GlobalISel: legalize and select 32-bit G_ASHRTom Stellard2018-06-224-0/+47
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318
* AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFPTom Stellard2018-06-224-0/+18
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316
* AMDGPU/GlobalISel: Implement select() for COPYTom Stellard2018-06-221-1/+4
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315
* AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEFTom Stellard2018-06-212-0/+16
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307
* AMDGPU: Remove ability to reserve VGPRs for debuggerKonstantin Zhuravlyov2018-06-216-50/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288
* [AMDGPU] Update assembler for HSA Code Object v3Scott Linder2018-06-216-75/+698
| | | | | | | | | | | | | | Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281
* [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcntsScott Linder2018-06-211-0/+1
| | | | | | | | | | BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls. Differential Revision: https://reviews.llvm.org/D48391 llvm-svn: 335268
* AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/ZKonstantin Zhuravlyov2018-06-215-51/+0
| | | | | | | | | | | | and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267
OpenPOWER on IntegriCloud