summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/GlobalISel: Select mul24 intrinsicsMatt Arsenault2019-12-301-2/+10
|
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftzMatt Arsenault2019-12-301-1/+5
|
* AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixupMatt Arsenault2019-12-241-1/+5
|
* AMDGPU: Select basic interp directly from intrinsicsMatt Arsenault2019-10-211-12/+0
| | | | llvm-svn: 375457
* AMDGPU/GlobalISel: Select cvt pk intrinsicsMatt Arsenault2019-09-101-5/+25
| | | | llvm-svn: 371539
* AMDGPU/GlobalISel: Select llvm.amdgcn.sffbhMatt Arsenault2019-09-101-1/+5
| | | | llvm-svn: 371538
* AMDGPU/GlobalISel: Select llvm.amdgcn.classMatt Arsenault2019-09-091-1/+5
| | | | | | Also fixes missing SubtargetPredicate on f16 class instructions. llvm-svn: 371436
* AMDGPU/GlobalISel: Select fmed3Matt Arsenault2019-09-091-1/+5
| | | | llvm-svn: 371435
* AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsicsMatt Arsenault2019-09-091-10/+39
| | | | | | | | | | | | | | | | | This enables GlobalISel to handle various intrinsics. The custom node pattern will be ignored, and the intrinsic will work. This will also allow SelectionDAG to directly select the intrinsics, but as they are all custom lowered to the nodes, this ends up leaving dead code in the table. Eventually either GlobalISel should add the equivalent of custom nodes equivalent, or intrinsics should be directly used. These each have different tradeoffs. There are a few more to handle, but these are easy to handle ones. Some others fail for other reasons. llvm-svn: 371432
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsicsMatt Arsenault2019-09-091-9/+0
| | | | llvm-svn: 371364
* AMDGPU: Remove unused custom node definitionMatt Arsenault2019-09-011-8/+0
| | | | llvm-svn: 370603
* [AMDGPU] gfx1010 core wave32 changesStanislav Mekhanoshin2019-06-201-4/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [AMDGPU] gfx1010 AMDGPUSetCCOp definitionStanislav Mekhanoshin2019-06-131-1/+1
| | | | | | | It was missing from D63293 and breaks in a debug tablegen w/o this part. llvm-svn: 363323
* [AMDGPU] gfx1010 allows VOP3 to have a literalStanislav Mekhanoshin2019-05-021-11/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756
* [AMDGPU] Support emitting GOT relocations for function callsScott Linder2019-02-041-3/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083
* [AMDGPU] Add intrinsics for 16 bit interpolationTim Corringham2019-01-281-0/+11
| | | | | | | | | | | | | | | | | | | Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Remove PHI loop condition optimizationNicolai Haehnle2018-10-311-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718
* AMDGPU: Add clamp bit to dot intrinsicsKonstantin Zhuravlyov2018-08-011-2/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470
* [AMDGPU] [AMDGPU] Support a fdot2 pattern.Farhana Aleen2018-07-161-0/+5
| | | | | | | | | | | | | | | Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-271-0/+2
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* [AMDGPU] DAG combine to produce V_PERM_B32Stanislav Mekhanoshin2018-06-121-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMPTom Stellard2018-05-241-2/+0
| | | | | | | | | | | | | | | | Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}Marek Olsak2018-01-311-0/+8
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908
* Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.Wei Ding2017-10-121-0/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-111-0/+6
| | | | | | Handle the sibling call cases. llvm-svn: 310753
* AMDGPU: Initial implementation of callsMatt Arsenault2017-08-011-0/+16
| | | | | | | | | Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732
* [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setccStanislav Mekhanoshin2017-06-211-0/+10
| | | | | | | | | This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-1/+1
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: Add new amdgcn.init.exec intrinsicsMarek Olsak2017-04-281-0/+9
| | | | | | | | | | v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677
* AMDGPU: Move trap lowering to DAGMatt Arsenault2017-04-241-0/+5
| | | | | | | | | | | Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206
* AMDGPU: Remove unnecessary ands when f16 is legalMatt Arsenault2017-03-311-0/+1
| | | | | | | | | | Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
* AMDGPU: Rename SI_RETURNMatt Arsenault2017-03-211-1/+5
| | | | | | | | This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsicsMatt Arsenault2017-03-171-0/+28
| | | | | | | | | | | | | | | | Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
* AMDGPU: Fix unnecessary ands when packing f16 vectorsMatt Arsenault2017-03-151-0/+2
| | | | | | | | | computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-241-0/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-221-0/+6
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-1/+1
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Remove dead node definitionsMatt Arsenault2017-02-151-10/+0
| | | | llvm-svn: 295247
* AMDGPU/R600: Serialize vector trunc stores to private ASJan Vesely2017-01-201-0/+3
| | | | | | | | | | | Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
* AMDGPU: Add replacement export intrinsicsMatt Arsenault2017-01-171-8/+9
| | | | llvm-svn: 292205
* AMDGPU/SI: Implement sendmsghalt intrinsicJan Vesely2017-01-041-0/+4
| | | | | | | | v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.Tom Stellard2016-12-071-0/+13
| | | | | | | | | | | | | | Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-051-0/+26
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in she shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* AMDGPU: Select mulhi 24-bit instructionsMatt Arsenault2016-08-271-4/+11
| | | | llvm-svn: 279902
* AMDGPU : Add intrinsics for compare with the full wavefront resultWei Ding2016-07-281-0/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU: Add fp legacy instruction intrinsicsMatt Arsenault2016-07-261-0/+5
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
* AMDGPU: Only use legal inline immediates with kill pseudoMatt Arsenault2016-07-191-0/+5
| | | | | | | | | | | Only if the value is negative or positive is what matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for stick with the kill pseudo-terminator. llvm-svn: 275988
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32Matt Arsenault2016-07-181-0/+1
| | | | llvm-svn: 275871
* AMDGPU: Fix verifier errors in SILowerControlFlowMatt Arsenault2016-06-221-1/+4
| | | | | | | | | | | | | The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467
OpenPOWER on IntegriCloud