path: root/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU: Remove VOP3Mods0Clamp0OMod | Matt Arsenault | 2020-01-07 | 1 | -12/+0
| Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.
* AMDGPU: Use ImmLeaf for inline immediate predicates | Matt Arsenault | 2020-01-06 | 1 | -0/+16
* AMDGPU: Refactor treatment of denormal mode | Matt Arsenault | 2019-11-19 | 1 | -1/+6
| Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and should be moved into a separate function attribute.
| This patch is still NFC. The denormal mode remains a subtarget feature for now, but this makes the necessary changes to switch to using an attribute.
* Sink all InitializePasses.h includes | Reid Kleckner | 2019-11-13 | 1 | -1/+2
| This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it causes lots of recompilation.
| I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout:
|   recompiles  touches  affected_files  header
|       342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
|       314730      234            1345  llvm/include/llvm/InitializePasses.h
|       307036      118            2602  llvm/include/llvm/ADT/APInt.h
|       213049       59            3611  llvm/include/llvm/Support/MathExtras.h
|       170422       47            3626  llvm/include/llvm/Support/Compiler.h
|       162225       45            3605  llvm/include/llvm/ADT/Optional.h
|       158319       63            2513  llvm/include/llvm/ADT/Triple.h
|       140322       39            3598  llvm/include/llvm/ADT/StringRef.h
|       137647       59            2333  llvm/include/llvm/Support/Error.h
|       131619       73            1803  llvm/include/llvm/Support/FileSystem.h
| Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
| Reviewers: bkramer, asbirlea, bollu, jdoerfert
| Differential Revision: https://reviews.llvm.org/D70211
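The ranking in the table above is a simple product, which can be sketched as follows. This is an illustrative sketch only: `HeaderStat` and `recompileCost` are hypothetical names that do not appear in the commit, and the last check assumes the touch count stays at 234 after the change.

```cpp
#include <cstdint>

// One row of the table: how often a header changed in the sampled history
// and how many object files currently depend on it.
// (Hypothetical type, introduced only for this illustration.)
struct HeaderStat {
  const char *Path;
  std::uint64_t Touches;
  std::uint64_t AffectedFiles;
};

// Estimated recompiles = touches * affected_files, the sort key described
// in the commit message.
constexpr std::uint64_t recompileCost(std::uint64_t Touches,
                                      std::uint64_t AffectedFiles) {
  return Touches * AffectedFiles;
}

constexpr std::uint64_t recompileCost(const HeaderStat &S) {
  return recompileCost(S.Touches, S.AffectedFiles);
}

// Rows copied from the table above.
constexpr HeaderStat STLExtras{"llvm/include/llvm/ADT/STLExtras.h", 95, 3604};
constexpr HeaderStat InitPasses{"llvm/include/llvm/InitializePasses.h", 234, 1345};

static_assert(recompileCost(STLExtras) == 342380, "matches the table");
static_assert(recompileCost(InitPasses) == 314730, "matches the table");

// Assumption: the same 234 touches, but only 550 dependent files afterwards.
static_assert(recompileCost(234, 550) == 128700, "estimate after the change");
```

Under that assumption the estimated cost of InitializePasses.h drops from 314730 to 128700.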
* AMDGPU: Slightly restructure m0 init code | Matt Arsenault | 2019-10-21 | 1 | -13/+15
| This will allow using another operation to produce the glue in a future change.
| llvm-svn: 375447
* AMDGPU: Split flat offsets that don't fit in DAG | Matt Arsenault | 2019-10-20 | 1 | -3/+80
| We handle it this way for some other address spaces.
| Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier.
| llvm-svn: 375366
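The general idea of splitting an offset that does not fit an immediate field can be sketched as below. This is not the code from the commit: `SplitOffset`, `splitOffset`, and the 12-bit unsigned field width used in the checks are assumptions chosen for illustration, and the real field widths depend on the subtarget.

```cpp
#include <cstdint>

// Result of splitting a constant offset. (Hypothetical type.)
struct SplitOffset {
  std::uint64_t Remainder; // added to the base address with a separate add
  std::uint64_t Imm;       // kept in the instruction's immediate offset field
};

// Split Offset so that Imm fits in an unsigned field of NumBits bits
// (NumBits < 64) and Remainder + Imm == Offset.
constexpr SplitOffset splitOffset(std::uint64_t Offset, unsigned NumBits) {
  const std::uint64_t Mask = (std::uint64_t{1} << NumBits) - 1;
  const std::uint64_t Imm = Offset & Mask;
  return SplitOffset{Offset - Imm, Imm};
}

// An offset that already fits is left entirely in the immediate field.
static_assert(splitOffset(100, 12).Remainder == 0, "nothing to split");
static_assert(splitOffset(100, 12).Imm == 100, "fits as-is");

// 5000 does not fit in 12 bits: 4096 moves to the base, 904 stays immediate.
static_assert(splitOffset(5000, 12).Remainder == 4096, "high part");
static_assert(splitOffset(5000, 12).Imm == 904, "low part");
```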
* AMDGPU: Relax 32-bit SGPR register class | Matt Arsenault | 2019-10-18 | 1 | -1/+1
| Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0.
| For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved.
| llvm-svn: 375267
* AMDGPU: Fix redundant setting of m0 for atomic load/store | Matt Arsenault | 2019-10-14 | 1 | -10/+7
| Atomic load/store would have their setting of m0 handled twice, which happened to be optimized out later.
| llvm-svn: 374801
* AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG | Matt Arsenault | 2019-10-11 | 1 | -10/+43
| llvm-svn: 374495
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregs | Matt Arsenault | 2019-10-10 | 1 | -2/+2
| SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good.
| llvm-svn: 374284
* [AMDGPU] Extend buffer intrinsics with swizzling | Piotr Sobczak | 2019-10-02 | 1 | -14/+18
| Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR.
| Intrinsics updated:
|   int_amdgcn_raw_buffer_load
|   int_amdgcn_raw_buffer_load_format
|   int_amdgcn_raw_buffer_store
|   int_amdgcn_raw_buffer_store_format
|   int_amdgcn_raw_tbuffer_load
|   int_amdgcn_raw_tbuffer_store
|   int_amdgcn_struct_buffer_load
|   int_amdgcn_struct_buffer_load_format
|   int_amdgcn_struct_buffer_store
|   int_amdgcn_struct_buffer_store_format
|   int_amdgcn_struct_tbuffer_load
|   int_amdgcn_struct_tbuffer_store
| Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on.
| The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged.
| There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set.
| Reviewers: arsenm, nhaehnle, tpr
| Reviewed By: nhaehnle
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68200
| llvm-svn: 373491
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC | Daniel Sanders | 2019-08-01 | 1 | -1/+1
| llvm-svn: 367633
* AMDGPU: Remove v0 workaround for DS_GWS_* instructions | Matt Arsenault | 2019-08-01 | 1 | -19/+1
| Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field.
| llvm-svn: 367598
* [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG | Carl Ritson | 2019-07-26 | 1 | -0/+6
| Reviewers: arsenm, nhaehnle
| Reviewed By: arsenm
| Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D65328
| llvm-svn: 367105
* [AMDGPU] Add llvm.amdgcn.softwqm intrinsic | Carl Ritson | 2019-07-26 | 1 | -0/+21
| Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader.
| Reviewers: nhaehnle, tpr
| Reviewed By: nhaehnle
| Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D64935
| llvm-svn: 367097
* AMDGPU: Don't rely on m0 being -1 for GWS offsets | Matt Arsenault | 2019-07-19 | 1 | -4/+6
| This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff.
| llvm-svn: 366608
* AMDGPU: Use getTargetConstant | Matt Arsenault | 2019-07-17 | 1 | -2/+2
| Avoids creating an extra intermediate mov.
| llvm-svn: 366340
* [AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 | Jay Foad | 2019-07-12 | 1 | -0/+10
| Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands.
| Reviewers: arsenm, hakzsam
| Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D64636
| llvm-svn: 365904
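Why an f32-style modifier is meaningless for an f16 operand can be shown with plain bit arithmetic. The sketch below rests on two assumptions that are not stated in the commit: the f16 value sits in the low 16 bits of a 32-bit register, and the f32 neg/abs modifiers act on bit 31 only. All function names are hypothetical.

```cpp
#include <cstdint>

// f32 neg modeled as flipping the f32 sign bit (bit 31). Assumed model.
constexpr std::uint32_t negAsF32(std::uint32_t RegBits) {
  return RegBits ^ 0x80000000u;
}

// f32 abs modeled as clearing the f32 sign bit (bit 31). Assumed model.
constexpr std::uint32_t absAsF32(std::uint32_t RegBits) {
  return RegBits & 0x7FFFFFFFu;
}

// The f16 value is assumed to live in the low half of the register.
constexpr std::uint16_t lowHalf(std::uint32_t RegBits) {
  return static_cast<std::uint16_t>(RegBits & 0xFFFFu);
}

// IEEE half-precision encodings: the f16 sign bit is bit 15, not bit 31.
constexpr std::uint32_t F16One = 0x3C00u;    //  1.0 as f16
constexpr std::uint32_t F16NegOne = 0xBC00u; // -1.0 as f16

// Negating "as f32" leaves the f16 payload untouched, so 1.0 stays 1.0.
static_assert(lowHalf(negAsF32(F16One)) == 0x3C00, "f16 value is not negated");

// Taking abs "as f32" leaves -1.0 as -1.0.
static_assert(lowHalf(absAsF32(F16NegOne)) == 0xBC00, "f16 sign bit survives");
```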
* [AMDGPU] gfx908 mfma support | Stanislav Mekhanoshin | 2019-07-11 | 1 | -3/+5
| Differential Revision: https://reviews.llvm.org/D64584
| llvm-svn: 365824
* AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass | Matt Arsenault | 2019-07-05 | 1 | -1/+0
| Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager.
| This will also avoid re-visiting the call graph for every single function.
| llvm-svn: 365241
* [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit | Alexander Timofeev | 2019-07-02 | 1 | -1/+1
| llvm-svn: 364952
* [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside | Alexander Timofeev | 2019-07-02 | 1 | -0/+18
| Differential Revision: https://reviews.llvm.org/D63953
| Reviewers: rampitec, nhaehnle, arsenm
| llvm-svn: 364950
* AMDGPU: Support GDS atomics | Nicolai Haehnle | 2019-07-01 | 1 | -6/+11
| Summary: Original patch by Marek Olšák
| Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab
| Reviewers: mareko, arsenm, rampitec
| Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D63452
| llvm-svn: 364814
* AMDGPU: Add intrinsics for DS GWS semaphore instructions | Matt Arsenault | 2019-06-20 | 1 | -16/+58
| llvm-svn: 363983
* AMDGPU: Treat undef as an inline immediate | Matt Arsenault | 2019-06-20 | 1 | -1/+17
| This should only matter in vectors with an undef component, since a full undef vector would have been folded out.
| llvm-svn: 363941
* AMDGPU: Consolidate some getGeneration checks | Matt Arsenault | 2019-06-19 | 1 | -5/+4
| This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for.
| llvm-svn: 363902
* AMDGPU: Undo sub x, c canonicalization for v2i16 | Matt Arsenault | 2019-06-19 | 1 | -26/+59
| Should avoid regression from D62341
| llvm-svn: 363899
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" | Matt Arsenault | 2019-06-19 | 1 | -0/+84
| This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node.
| llvm-svn: 363870
* Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics | Simon Pilgrim | 2019-06-19 | 1 | -85/+0
| There may or may not be additional work to handle this correctly on SI/CI.
| ........
| Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/
| llvm-svn: 363797
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics | Matt Arsenault | 2019-06-18 | 1 | -0/+85
| There may or may not be additional work to handle this correctly on SI/CI.
| llvm-svn: 363678
* [AMDGPU] gfx10 conditional registers handling | Stanislav Mekhanoshin | 2019-06-16 | 1 | -5/+15
| This is the cpp source part of wave32 support, excluding the overridden getRegClass().
| Differential Revision: https://reviews.llvm.org/D63351
| llvm-svn: 363513
* AMDGPU: Fix dropping memref for ds append/consume | Matt Arsenault | 2019-06-14 | 1 | -1/+3
| The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them.
| llvm-svn: 363455
* AMDGPU: Fix input chain when gluing copies to m0 | Matt Arsenault | 2019-06-14 | 1 | -2/+5
| I don't think this was causing any observable issues, but was making reading the DAG dump confusing.
| llvm-svn: 363389
* AMDGPU: Refactor to prepare for manually selecting more intrinsics | Matt Arsenault | 2019-06-14 | 1 | -9/+19
| llvm-svn: 363385
* AMDGPU: Invert frame index offset interpretation | Matt Arsenault | 2019-06-05 | 1 | -4/+4
| Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
| Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
| The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate.
| Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
| Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
| llvm-svn: 362661
* [AMDGPU] gfx1010 VMEM and SMEM implementation | Stanislav Mekhanoshin | 2019-04-30 | 1 | -42/+29
| Differential Revision: https://reviews.llvm.org/D61330
| llvm-svn: 359621
* [AMDGPU] gfx1010 VOP2 changes | Stanislav Mekhanoshin | 2019-04-26 | 1 | -0/+52
| Differential Revision: https://reviews.llvm.org/D61156
| llvm-svn: 359316
* [AMDGPU] Added v5i32 and v5f32 register classes | Tim Renouf | 2019-03-22 | 1 | -0/+2
| They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords.
| Differential Revision: https://reviews.llvm.org/D58903
| Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
| llvm-svn: 356735
* [AMDGPU] Support for v3i32/v3f32 | Tim Renouf | 2019-03-21 | 1 | -0/+2
| Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
| SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
| Some of this patch is from Matt Arsenault, also of AMD.
| Differential Revision: https://reviews.llvm.org/D58902
| Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
| llvm-svn: 356659
* [AMDGPU] Fix clamp bit DAG operand | Michael Liao | 2019-03-20 | 1 | -5/+8
| Summary: Should use a `targetconstant` instead of a `constant` operand for the clamp bit, which is expected to be an immediate operand. Under certain conditions, such as when a common `i1 false` constant is used elsewhere and selected before the instruction with the clamp bit, a register operand may be added instead of an immediate one. Use `targetconstant` to enforce that.
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D59608
| llvm-svn: 356608
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic | Tim Renouf | 2019-03-18 | 1 | -10/+22
| Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly.
| This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests.
| Differential Revision: https://reviews.llvm.org/D59267
| Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
| llvm-svn: 356399
* [AMDGPU] Silence gcc 7 warnings | Stanislav Mekhanoshin | 2019-03-13 | 1 | -30/+0
| Differential Revision: https://reviews.llvm.org/D59330
| llvm-svn: 356100
* AMDGPU: Move d16 load matching to preprocess step | Matt Arsenault | 2019-03-08 | 1 | -35/+174
| When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced.
| I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function.
| llvm-svn: 355731
* AMDGPU: Add DS append/consume intrinsics | Matt Arsenault | 2019-01-28 | 1 | -15/+72
| Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR.
| I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do.
| Also includes some misc. changes to avoid test differences between the LDS and GDS versions.
| llvm-svn: 352422
* Codegen support for atomicrmw fadd/fsub | Matt Arsenault | 2019-01-22 | 1 | -1/+1
| llvm-svn: 351851
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. | Chandler Carruth | 2019-01-19 | 1 | -4/+3
| We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
| Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
| llvm-svn: 351636
* AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. | Changpeng Fang | 2018-12-21 | 1 | -3/+7
| Summary: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is unsigned. This patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps the optimization if the base is known to be unsigned.
| Reviewers: arsemn
| Differential Revision: https://reviews.llvm.org/D55568
| llvm-svn: 349951
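The hazard described above can be illustrated with a small arithmetic model. This is a sketch under an assumed model, not the hardware's documented behavior and not the code from the patch: it assumes the peeled base is read as an unsigned 32-bit value and combined with the immediate without wrapping. `intendedIndex`, `peeledIndex`, and `peelIsSafe` are hypothetical names.

```cpp
#include <cstdint>

// Index the program means to address: an ordinary signed add.
constexpr std::int64_t intendedIndex(std::int32_t Base, std::int32_t Imm) {
  return static_cast<std::int64_t>(Base) + Imm;
}

// Assumed model of peeling: the base is placed in an unsigned register and
// the constant is added afterwards in wider arithmetic.
constexpr std::int64_t peeledIndex(std::int32_t Base, std::int32_t Imm) {
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(Base)) + Imm;
}

// Peeling preserves the result only when both computations agree.
constexpr bool peelIsSafe(std::int32_t Base, std::int32_t Imm) {
  return intendedIndex(Base, Imm) == peeledIndex(Base, Imm);
}

// Non-negative base: both views agree, so the optimization can be kept.
static_assert(peelIsSafe(3, 2), "base known non-negative");

// Base of -1 with offset 2 should give index 1, but the unsigned view of
// the base is 0xFFFFFFFF and the model yields 4294967297 instead.
static_assert(intendedIndex(-1, 2) == 1, "intended index");
static_assert(peeledIndex(-1, 2) == 4294967297LL, "index under the model");
static_assert(!peelIsSafe(-1, 2), "negative base breaks peeling");
```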
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI | Nicolai Haehnle | 2018-10-17 | 1 | -2/+0
| Summary: To work around a hardware issue in the (base + offset) calculation when base is negative.
| The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information.
| This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode).
| Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923
| Reviewers: arsenm, mareko
| Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
| Differential Revision: https://reviews.llvm.org/D53160
| llvm-svn: 344698
* [AMDGPU] Rename pass "isel" to "amdgpu-isel" | Fangrui Song | 2018-10-03 | 1 | -2/+2
| Summary: The AMDGPU target specific pass "isel" is a misleading name.
| Reviewers: tstellar, echristo, javed.absar, arsenm
| Reviewed By: arsenm
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D52759
| llvm-svn: 343659
* [AMDGPU] Removed unused method | Tim Renouf | 2018-09-13 | 1 | -22/+0
| Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7.
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
| Differential Revision: https://reviews.llvm.org/D52022
| Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
| llvm-svn: 342189