path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
...
* Set ADDE/ADDC/SUBE/SUBC to expand by default (Amaury Sechet, 2018-06-01, 1 file, -4/+6)
  Summary: They've been deprecated in favor of UADDO/ADDCARRY or
  USUBO/SUBCARRY for a while. Targets that use these opcodes are changed in
  order to ensure their behavior doesn't change.
  Reviewers: efriedma, craig.topper, dblaikie, bkramer
  Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle,
  kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook,
  jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng,
  edward-jones, mgrang, atanasyan, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47422
  llvm-svn: 333748
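  As an illustration of the replacement pair's semantics (a plain C++ model,
  not LLVM code): UADDO produces a sum plus a carry-out, and ADDCARRY also
  consumes a carry-in, so a wide add chains them just as ADDC/ADDE did.

    #include <cassert>
    #include <cstdint>

    struct SumCarry { uint64_t Sum; bool Carry; };

    static SumCarry uaddo(uint64_t A, uint64_t B) {
      uint64_t S = A + B;
      return {S, S < A}; // carry-out only (the ADDC role)
    }

    static SumCarry addcarry(uint64_t A, uint64_t B, bool CarryIn) {
      uint64_t S = A + B;
      bool C = S < A;          // carry from A + B
      uint64_t S2 = S + CarryIn;
      C |= CarryIn && S2 == 0; // carry from the carry-in (the ADDE role)
      return {S2, C};
    }

    int main() {
      // 128-bit add from 64-bit limbs: low limb uaddo, high limb addcarry.
      SumCarry Lo = uaddo(~0ULL, 1);          // wraps to 0, carry = 1
      SumCarry Hi = addcarry(2, 3, Lo.Carry); // 2 + 3 + 1 = 6
      assert(Lo.Sum == 0 && Lo.Carry && Hi.Sum == 6 && !Hi.Carry);
      return 0;
    }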
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP (Tom Stellard, 2018-05-24, 1 file, -24/+0)
  Summary: We don't generate AMDGPUISD::CLAMP for R600 now that
  llvm.AMDGPU.clamp is gone.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47181
  llvm-svn: 333153
* AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering (Tom Stellard, 2018-05-22, 1 file, -12/+0)
  Summary: This is always false for R600.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47180
  llvm-svn: 333016
* AMDGPU: Make v2i16/v2f16 legal on VI (Matt Arsenault, 2018-05-22, 1 file, -7/+8)
  This usually results in better code. Fixes using inline asm with short2,
  and also fixes having a different ABI for function parameters between VI
  and gfx9.
  Partially cleans up the mess used for lowering of the d16 operations.
  Making v4f16 legal will help clean this up more, but this requires
  additional work.
  llvm-svn: 332953
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers (Tom Stellard, 2018-05-22, 1 file, -0/+1)
  Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the
  instruction and register definitions, which are huge, so we only want to
  include them where needed.
  This will also make it easier if we want to split the R600 and GCN
  definitions into separate tablegenerated files.
  I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
  because it uses some enums from the header to initialize default values
  for the SIMachineFunction class, so I ended up having to remove includes
  of SIMachineFunctionInfo.h from headers too.
  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye,
  javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46272
  llvm-svn: 332930
* AMDGPU: Custom lower v4i16/v4f16 vector operations (Matt Arsenault, 2018-05-16, 1 file, -0/+22)
  Avoids stack access.
  Also handle extract hi elt pattern from truncate + shift to avoid a couple
  test regressions.
  llvm-svn: 332453
* AMDGPU: Ignore any_extend in mul24 combine (Matt Arsenault, 2018-05-09, 1 file, -0/+11)
  If a multiply is truncated, SimplifyDemandedBits sometimes turns a
  zero_extend of the inputs into an any_extend, which makes the known bits
  computation unhelpful. Ignore these and compute known bits for the
  underlying value, since we insert the correct extend type after.
  llvm-svn: 331919
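  A minimal sketch of the idea (names and shape assumed, not the exact
  in-tree code): look through any_extend before asking for known bits, since
  the extended high bits are undefined and teach the query nothing.

    #include "llvm/CodeGen/SelectionDAG.h"
    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    static bool fitsInMul24(SDValue Op, SelectionDAG &DAG) {
      // any_extend leaves the high bits undefined; query the source instead.
      if (Op.getOpcode() == ISD::ANY_EXTEND)
        Op = Op.getOperand(0);
      KnownBits Known = DAG.computeKnownBits(Op);
      // A u24 multiply is safe when only the low 24 bits can be nonzero.
      return Known.countMinLeadingZeros() >= Known.getBitWidth() - 24;
    }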
* AMDGPU: Handle partial shift reduction for variable shifts (Matt Arsenault, 2018-05-09, 1 file, -15/+22)
  If the variable shift amount has known bits, we can still reduce the
  shift.
  llvm-svn: 331917
* AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit (Matt Arsenault, 2018-05-09, 1 file, -0/+30)
  This is an extension of an existing combine to reduce wider shls if the
  result fits in the final result type. This introduces the same combine,
  but reduces the shift to a middle sized type to avoid the slow 64-bit
  shift.
  llvm-svn: 331916
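  The identity behind this family of combines (an illustrative check, not
  the in-tree code): the low half of a left shift depends only on the low
  half of the source, so a truncated 64-bit shl can run at a narrower width.

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint64_t X = 0x123456789abcdef0ULL;
      for (unsigned C = 0; C < 32; ++C)
        assert(static_cast<uint32_t>(X << C) ==
               static_cast<uint32_t>(X) << C);
      return 0;
    }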
* AMDGPU: Add combine for trunc of bitcast from build_vector (Matt Arsenault, 2018-05-09, 1 file, -0/+30)
  If the truncate is only accessing the first element of the vector, we can
  use the original source value.
  This helps with some combine ordering issues after operations are lowered
  to integer operations between bitcasts of build_vector. In particular it
  stops unnecessarily materializing the unused top half of a vector in some
  cases.
  llvm-svn: 331909
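  The underlying identity, assuming a little-endian layout as on AMDGPU (a
  hypothetical demonstration, not the in-tree pattern match):
  (trunc:i32 (bitcast:i64 (build_vector:v2i32 a, b))) reduces to a.

    #include <cassert>
    #include <cstdint>
    #include <cstring>

    int main() {
      uint32_t V[2] = {0xdeadbeef, 0xcafef00d}; // build_vector a, b
      uint64_t Bits;                            // bitcast v2i32 -> i64
      std::memcpy(&Bits, V, sizeof(Bits));      // assumes little-endian host
      assert(static_cast<uint32_t>(Bits) == V[0]); // trunc -> first element
      return 0;
    }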
* Remove \brief commands from doxygen comments. (Adrian Prantl, 2018-05-01, 1 file, -2/+2)
  We've been running doxygen with the autobrief option for a couple of years
  now. This makes the \brief markers in our comments redundant. Since they
  are a visual distraction and we don't want to encourage more \brief
  markers in new code either, this patch removes them all.
  Patch produced by
    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
  Differential Revision: https://reviews.llvm.org/D46290
  llvm-svn: 331272
* AMDGPU: Add Vega12 and Vega20 (Matt Arsenault, 2018-04-30, 1 file, -1/+2)
  Changes by Matt Arsenault and Konstantin Zhuravlyov.
  llvm-svn: 331215
* [AMDGPU] Fix issues for backend divergence tracking (David Stuttard, 2018-04-18, 1 file, -4/+9)
  Summary: A change to use divergence analysis in the AMDGPU backend was
  getting formal arguments incorrect (not tagged as divergent) unless they
  were VGPR0, VGPR1 or VGPR2. For graphics shaders it is possible to have
  more than these passed in as VGPRs.
  Modified the checking code to check for any VGPR registers passed in as
  formal arguments.
  Also, some intrinsics that are sources of divergence may have been lowered
  during instruction selection and are missed on subsequent calls to
  isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.
  Finally, the FunctionLoweringInfo tracks virtual registers that are live
  across basic block boundaries. This is used to check for divergence of
  CopyFromRegister registers using the DivergenceAnalysis analysis. For
  multiple blocks the lazily evaluated inverted map VirtReg2Value was not
  cleared when the ValueMap map was.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye,
  llvm-commits
  Differential Revision: https://reviews.llvm.org/D45372
  Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3
  llvm-svn: 330257
* Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection (Alexander Timofeev, 2018-03-05, 1 file, -0/+96)
  Differential revision: https://reviews.llvm.org/D35267
  llvm-svn: 326703
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} (Marek Olsak, 2018-01-31, 1 file, -0/+4)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D41663
  llvm-svn: 323908
* [NFC] fix trivial typos in comments and documents (Hiroshi Inoue, 2018-01-29, 1 file, -1/+1)
  "to to" -> "to"
  llvm-svn: 323628
* AMDGPU/SI: Add d16 support for image intrinsics. (Changpeng Fang, 2018-01-18, 1 file, -0/+77)
  Summary: This patch implements d16 support for image load, image store and
  image sample intrinsics.
  Reviewers: Matt, Brian.
  Differential Revision: https://reviews.llvm.org/D3991
  llvm-svn: 322903
* [AMDGPU] add LDS f32 intrinsics (Daniil Fukalov, 2018-01-17, 1 file, -0/+3)
  Added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generating
  ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics
  in LDS.
  Reviewed by: arsenm
  Differential Revision: https://reviews.llvm.org/D37985
  llvm-svn: 322656
* AMDGPU/SI: Add d16 support for buffer intrinsics. (Changpeng Fang, 2018-01-12, 1 file, -0/+4)
  Differential Revision: https://reviews.llvm.org/D38906
  Reviewers: Matt and Brian.
  llvm-svn: 322402
* MachineFunction: Return reference from getFunction(); NFC (Matthias Braun, 2017-12-15, 1 file, -3/+3)
  The Function can never be nullptr, so we can return a reference.
  llvm-svn: 320884
* DAG: Add nuw when splitting loads and stores (Matt Arsenault, 2017-11-29, 1 file, -9/+3)
  The object can't straddle the address space wrap around, so I think it's
  OK to assume any offsets added to the base object pointer can't overflow.
  Similar logic already appears to be applied in SelectionDAGBuilder when
  lowering aggregate returns.
  llvm-svn: 319272
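  A minimal sketch of the shape of such a change (assumed, not the exact
  in-tree code): the pointer increment for the second half of a split
  memory access is tagged no-unsigned-wrap.

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    static SDValue addOffsetNUW(SelectionDAG &DAG, const SDLoc &DL,
                                SDValue Ptr, uint64_t Offset) {
      EVT VT = Ptr.getValueType();
      SDNodeFlags Flags;
      // Both halves belong to one in-bounds object, so the add cannot wrap
      // around the end of the address space.
      Flags.setNoUnsignedWrap(true);
      return DAG.getNode(ISD::ADD, DL, VT, Ptr,
                         DAG.getConstant(Offset, DL, VT), Flags);
    }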
* [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics (Vedran Miletic, 2017-11-27, 1 file, -0/+30)
  The AMDGPU backend errors with "unsupported call to function" upon
  encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds
  custom lowering to avoid that error on both R600 and SI.
  Reviewers: arsenm, jvesely
  Subscribers: tstellar
  Differential Revision: https://reviews.llvm.org/D29942
  llvm-svn: 319025
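  One standard way to lower these on hardware with only a native log2,
  sketched here under that assumption (the change-of-base identity:
  log(x) = log2(x) * ln(2) and log10(x) = log2(x) * log10(2)):

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    static SDValue lowerLogBase(SelectionDAG &DAG, const SDLoc &DL,
                                SDValue X, bool IsLog10) {
      EVT VT = X.getValueType();
      SDValue Log2 = DAG.getNode(ISD::FLOG2, DL, VT, X);
      // ln(2) ~= 0.6931472, log10(2) ~= 0.3010300
      double Scale = IsLog10 ? 0.30102999566398120 : 0.69314718055994531;
      return DAG.getNode(ISD::FMUL, DL, VT, Log2,
                         DAG.getConstantFP(Scale, DL, VT));
    }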
* AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt (Matt Arsenault, 2017-11-13, 1 file, -0/+14)
  llvm-svn: 318100
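  The idea, sketched under the assumption of a 64-lane wavefront (names
  illustrative, not the in-tree code): mbcnt counts set bits in a lane mask,
  so the result is at most 64 and everything above bit 6 is known zero.

    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    static void computeKnownBitsForMBCNT(KnownBits &Known) {
      // mbcnt_lo/mbcnt_hi each count at most 32 lanes; the chained pair
      // yields at most 64, so bits 7..31 of the i32 result are known zero.
      Known.Zero.setBitsFrom(7);
    }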
* AMDGPU: Drop duplicate setOperationAction (Jan Vesely, 2017-11-13, 1 file, -2/+0)
  These are set with the other scalar int ops a few lines up.
  Differential Revision: https://reviews.llvm.org/D39928
  llvm-svn: 318051
* AMDGPU: Lower buffer store and atomic intrinsics manually (Marek Olsak, 2017-11-09, 1 file, -0/+13)
  Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before
  every buffer store and atomic instruction.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D39060
  llvm-svn: 317754
* AMDGPU: Remove redundant combine (Matt Arsenault, 2017-11-07, 1 file, -38/+0)
  This combine was already done in two places. The generic combiner has
  already done this since r217610, for adds (with a single use). This one
  was added in r303641, and added support for handling or as well. r313251
  later added support to the generic combine for or.
  It also turns out the isOrEquivalentToAdd check is not necessary for this
  combine.
  Additionally, we already reproduce this combine in yet another place in
  the backend, although in that version multiple uses of the add are still
  folded if it will allow a fold into the addressing mode. That version
  needs to be improved to understand ors though, as well as the correct
  legal offsets for private.
  llvm-svn: 317526
* AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 (Matt Arsenault, 2017-11-06, 1 file, -9/+20)
  llvm-svn: 317492
* AMDGPU: Fix an error in the llvm.cttz implementation (Wei Ding, 2017-10-17, 1 file, -3/+2)
  Differential Revision: http://reviews.llvm.org/D39014
  llvm-svn: 316037
* AMDGPU: Implement isFPExtFoldable (Matt Arsenault, 2017-10-13, 1 file, -0/+11)
  This helps match v_mad_mix* in some cases.
  llvm-svn: 315744
* Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. (Wei Ding, 2017-10-12, 1 file, -32/+68)
  Differential Revision: http://reviews.llvm.org/D37348
  llvm-svn: 315610
* [AMDGPU] New 64 bit div/rem expansion (Stanislav Mekhanoshin, 2017-10-06, 1 file, -19/+151)
  The old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This
  expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.
  Passes OpenCL conformance test_integer_ops quick_[u]long_math.
  Differential Revision: https://reviews.llvm.org/D38607
  llvm-svn: 315081
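  For reference, the general shape of what any such expansion computes is
  classic shift-subtract long division (a reference model only; the in-tree
  instruction sequence is different and far more tuned):

    #include <cassert>
    #include <cstdint>

    static void udivrem64(uint64_t N, uint64_t D, uint64_t &Q, uint64_t &R) {
      assert(D != 0 && "division by zero");
      Q = R = 0;
      for (int I = 63; I >= 0; --I) {
        R = (R << 1) | ((N >> I) & 1); // bring down the next numerator bit
        if (R >= D) {                  // conditional subtract; set quotient bit
          R -= D;
          Q |= 1ULL << I;
        }
      }
    }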
* AMDGPU: Expand setcc for v2f32 and v4f32 (Konstantin Zhuravlyov, 2017-10-03, 1 file, -0/+1)
  llvm-svn: 314853
* AMDGPU: Expand setcc for v2i32 and v4i32 (Konstantin Zhuravlyov, 2017-10-03, 1 file, -0/+1)
  llvm-svn: 314852
* [AMDGPU] calling conventions for AMDPAL OS type (Tim Renouf, 2017-09-29, 1 file, -0/+4)
  Summary: This commit adds comments on how the AMDPAL OS type overloads the
  existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
  new ones.
  Reviewers: arsenm, nhaehnle, dstuttard
  Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D37752
  llvm-svn: 314502
* AMDGPU: Allow coldcc calls (Matt Arsenault, 2017-09-11, 1 file, -0/+2)
  llvm-svn: 312936
* [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() (Stanislav Mekhanoshin, 2017-09-01, 1 file, -2/+2)
  Differential Revision: https://reviews.llvm.org/D37392
  llvm-svn: 312364
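  The usual fix for this class of bug, sketched here as an assumption about
  its shape rather than the actual patch: a target hook that recurses into
  its operands must forward Depth + 1 so the shared recursion limit holds.

    #include "llvm/CodeGen/SelectionDAG.h"
    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    static void knownBitsForNode(SDValue Op, KnownBits &Known,
                                 const SelectionDAG &DAG, unsigned Depth) {
      // Passing Depth unchanged (or resetting it to 0) lets mutually
      // referencing nodes recurse forever; Depth + 1 guarantees termination.
      Known = DAG.computeKnownBits(Op.getOperand(0), Depth + 1);
    }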
* AMDGPU: Turn int pack pattern into build_vector (Matt Arsenault, 2017-08-31, 1 file, -1/+11)
  build_vector is a more useful canonical form when pattern matching packed
  operations, so turn a shift into the high element into a build_vector.
  Should show no change for now.
  llvm-svn: 312282
* [AMDGPU] computeKnownBitsForTargetNode for 24 bit mul (Stanislav Mekhanoshin, 2017-08-28, 1 file, -1/+31)
  Differential Revision: https://reviews.llvm.org/D37168
  llvm-svn: 311896
* AMDGPU: Start adding tail call support (Matt Arsenault, 2017-08-11, 1 file, -0/+37)
  Handle the sibling call cases.
  llvm-svn: 310753
* AMDGPU: Don't use report_fatal_error for unsupported call types (Matt Arsenault, 2017-08-03, 1 file, -3/+9)
  llvm-svn: 310004
* AMDGPU: Pass special input registers to functions (Matt Arsenault, 2017-08-03, 1 file, -0/+43)
  llvm-svn: 309998
* AMDGPU: Initial implementation of calls (Matt Arsenault, 2017-08-01, 1 file, -0/+1)
  Includes a hack to fix the type selected for the GlobalAddress of the
  function, which will be fixed by changing the default datalayout to use
  generic pointers for 0.
  llvm-svn: 309732
* fix typos in comments; NFC (Hiroshi Inoue, 2017-07-16, 1 file, -1/+1)
  llvm-svn: 308127
* AMDGPU: Return correct type during argument lowering (Matt Arsenault, 2017-07-15, 1 file, -0/+30)
  The type needs to be casted back to the original argument type. Fixes an
  assert that for some reason is only run when using -debug.
  Includes an additional combine to avoid test regressions from having
  conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where
  i16 is legal, this was producing an i32 register with an i16 AssertZExt,
  truncated to i16 with another i8 AssertZExt.
    t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
    t3: i16 = truncate t2
    t5: i16 = AssertZext t3, ValueType:ch:i8
    t6: i8 = truncate t5
    t7: i32 = zero_extend t6
  llvm-svn: 308082
* Fix some more -Wimplicit-fallthrough warnings. NFCI. (Simon Pilgrim, 2017-07-07, 1 file, -2/+5)
  llvm-svn: 307411
* [AMDGPU] Add intrinsics for tbuffer load and store (David Stuttard, 2017-06-22, 1 file, -0/+2)
  An intrinsic already existed for llvm.SI.tbuffer.store. This adds
  tbuffer.load and re-implements the intrinsic as llvm.amdgcn.tbuffer.*.
  Added CodeGen tests for the two new variants. Left the original
  llvm.SI.tbuffer.store implementation to avoid issues with existing code.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr
  Differential Revision: https://reviews.llvm.org/D30687
  llvm-svn: 306031
* AMDGPU: Cleanup CreateLiveInRegister (Matt Arsenault, 2017-06-19, 1 file, -7/+14)
  llvm-svn: 305748
* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06, 1 file, -1/+1)
  I did this a long time ago with a janky python script, but now
  clang-format has built-in support for this. I fed clang-format every line
  with a #include and let it re-sort things according to the precise LLVM
  rules for include ordering baked into clang-format these days.
  I've reverted a number of files where the results of sorting includes
  isn't healthy. Either places where we have legacy code relying on
  particular include ordering (where possible, I'll fix these separately) or
  where we have particular formatting around #include lines that I didn't
  want to disturb in this patch.
  This patch is *entirely* mechanical. If you get merge conflicts or
  anything, just ignore the changes in this patch and run clang-format over
  your #include lines in the files.
  Sorry for any noise here, but it is important to keep these things stable.
  I was seeing an increasing number of patches with irrelevant re-ordering
  of #include lines because clang-format was used. This patch at least
  isolates that churn, makes it easy to skip when resolving conflicts, and
  gets us to a clean baseline (again).
  llvm-svn: 304787
* [AMDGPU] Combine and (srl) into shl (bfe) (Stanislav Mekhanoshin, 2017-05-23, 1 file, -1/+2)
  Perform DAG combine:
    and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
  where nb is the number of trailing zeroes in mask. It replaces two
  instructions with two, and BFE is generally a more expensive one. However
  this is only done if we are selecting a byte or word at an aligned
  boundary, which results in a proper SDWA operand pattern. It is only done
  if SDWA is supported.
  TODO: improve the SDWA pass to actually convert this pattern. It is not
  done now because we have an immediate in the instruction, which has to be
  moved into a VGPR.
  Differential Revision: https://reviews.llvm.org/D33455
  llvm-svn: 303681
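  A worked check of the identity behind the combine (illustrative only).
  With bfe(x, off, w) = (x >> off) & ((1 << w) - 1) and a contiguous mask
  mask = ((1 << w) - 1) << nb, selecting an aligned byte gives:

    #include <cassert>
    #include <cstdint>

    static uint32_t bfe(uint32_t X, unsigned Off, unsigned W) {
      return (X >> Off) & ((1u << W) - 1);
    }

    int main() {
      const uint32_t X = 0x89abcdef;
      const unsigned C = 4, NB = 8, W = 8; // mask = 0xff << 8 = 0x0000ff00
      const uint32_t Mask = ((1u << W) - 1) << NB;
      // and (srl x, c), mask == shl (bfe x, nb + c, w), nb
      assert(((X >> C) & Mask) == (bfe(X, C + NB, W) << NB));
      return 0;
    }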
* [AMDGPU] Convert shl (add) into add (shl) (Stanislav Mekhanoshin, 2017-05-23, 1 file, -2/+40)
  Perform:
    shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
  This allows folding a constant into an address in some cases, as well as
  eliminating the second shift if the expression is used as an address and
  the second shift is the result of a GEP.
  Differential Revision: https://reviews.llvm.org/D33432
  llvm-svn: 303641
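  A quick check of the identity (illustrative). For add it always holds in
  two's complement; for or it needs x and c2 to share no set bits, the usual
  precondition under which or is equivalent to add:

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t C1 = 3, C2 = 0x30;
      for (uint32_t X = 0; X < 4096; ++X) {
        assert(((X + C2) << C1) == ((X << C1) + (C2 << C1)));
        if ((X & C2) == 0) // disjoint bits: or behaves like add
          assert(((X | C2) << C1) == ((X << C1) | (C2 << C1)));
      }
      return 0;
    }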