summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault2018-07-201-12/+12
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov2018-07-141-12/+12
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* AMDGPU: Fix handling of alignment padding in DAG argument loweringMatt Arsenault2018-07-131-12/+12
| | | | | | | | | | | | | | | | | This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-3/+3
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/ZKonstantin Zhuravlyov2018-06-211-3/+0
| | | | | | | | | | | | and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267
* [AMDGPU] Track occupancy in MFIStanislav Mekhanoshin2018-05-311-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629
* AMDGPU: Round up kernel argument allocation sizeMatt Arsenault2018-05-291-1/+4
| | | | | | | | | | AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456
* AMDGPU: Pass function directly instead of MachineFunctionMatt Arsenault2018-05-291-2/+2
| | | | | | | These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* AMDGPU: Fix not preserving CSR VGPR if used for SGPR spillsMatt Arsenault2018-03-271-4/+3
| | | | | | | | Before this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present. llvm-svn: 328659
* Reapply "AMDGPU: Add 32-bit constant address space"Matt Arsenault2018-02-091-1/+7
| | | | | | This reverts r324494 and reapplies r324487. llvm-svn: 324747
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-071-7/+1
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address spaceMarek Olsak2018-02-071-1/+7
| | | | | | | | | | | | | | | | | | | | | | | Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487
* AMDGPU: Use unique PSVs for buffer resourcesMatt Arsenault2017-12-291-1/+0
| | | | | | | Also fixes using the wrong memory type for some intrinsics when custom lowering them. llvm-svn: 321557
* AMDGPU: Implement getTgtMemIntrinsic for imagesMatt Arsenault2017-12-291-1/+0
| | | | | | | | | | | | | Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. llvm-svn: 321555
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-21/+21
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [AMDGPU] AMDPAL scratch buffer supportTim Renouf2017-09-291-1/+7
| | | | | | | | | | | | | | | | | | | | | | | Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501
* Add AddresSpace to PseudoSourceValue.Jan Sjodin2017-09-141-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297
* AMDGPU: trivial comment changeTim Renouf2017-09-111-1/+1
| | | | | | ... to check commit access for new committer. llvm-svn: 312900
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-25/+10
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310328
* AMDGPU: Fix implicitarg.ptr handling special inputsMatt Arsenault2017-08-031-1/+2
| | | | llvm-svn: 310002
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-45/+34
| | | | llvm-svn: 309998
* AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to itMatt Arsenault2017-08-021-2/+20
| | | | llvm-svn: 309783
* AMDGPU: Annotate implicitarg.ptr usageMatt Arsenault2017-07-281-1/+7
| | | | | | | | | | | We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398
* AMDGPU: Annotate necessity of flat-scratch-initMatt Arsenault2017-07-181-7/+8
| | | | | | | | As an approximation of the existing handling to avoid regressions. Fixes using too many registers with calls on subtargets with the SGPR allocation bug. llvm-svn: 308326
* AMDGPU: Figure out private memory regs after loweringMatt Arsenault2017-07-181-5/+8
| | | | | | | | | | | | | | | | | | Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register. This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization. Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known. llvm-svn: 308325
* AMDGPU: Annotate features from x work item/group IDs.Matt Arsenault2017-07-171-13/+26
| | | | | | | | This wasn't necessary before since they are always enabled for kernels, but this is necessary if they need to be forwarded to a callable function. llvm-svn: 308226
* AMDGPU: Detect kernarg segment pointerMatt Arsenault2017-07-141-1/+4
| | | | | | | | This is necessary to pass the kernarg segment pointer to callee functions. Also don't unconditionally enable for kernels. llvm-svn: 307978
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-261-0/+1
| | | | llvm-svn: 306312
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-261-5/+5
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-7/+12
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5Marek Olsak2017-05-041-1/+7
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200
* AMDGPU: Add StackPtr and FramePtr registers to MFIMatt Arsenault2017-04-241-0/+2
| | | | | | These will be necessary for setting up call sequences. llvm-svn: 301208
* AMDGPU: Refactor SIMachineFunctionInfo slightlyMatt Arsenault2017-04-111-15/+25
| | | | | | Prepare for handling non-entry functions. llvm-svn: 299999
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-111-1/+1
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Don't use stack space for SGPR->VGPR spillsMatt Arsenault2017-02-211-42/+51
| | | | | | | | | | | | | | | | Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
* AMDGPU add support for spilling to a user sgpr pointed buffersTom Stellard2017-01-251-2/+13
| | | | | | | | | | | | | | | | | Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
* AMDGPU/SI: Make a function constTom Stellard2016-12-201-1/+0
| | | | llvm-svn: 290185
* AMDGPU/SI: Add a MachineMemOperand to MIMG instructionsTom Stellard2016-12-201-0/+1
| | | | | | | | | | | | | | | Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179
* AMDGPU/SI: Add support for triples with the mesa3d operating systemTom Stellard2016-09-161-1/+1
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-061-14/+4
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU: Remove unused tracking of flat instructionsMatt Arsenault2016-08-111-1/+0
| | | | llvm-svn: 278361
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-4/+4
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU/SI: Don't use reserved VGPRs for SGPR spillingTom Stellard2016-07-281-1/+2
| | | | | | | | | | | | | | | Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
* AMDGPU: Make AMDGPUMachineFunction fields privateMatt Arsenault2016-07-261-3/+0
| | | | | | | | | ABIArgOffset is a problem because properly fsetting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
* AMDGPU: Add HSA dispatch id intrinsicMatt Arsenault2016-07-221-1/+11
| | | | llvm-svn: 276437
* AMDGPU/SI: Emit the number of SGPR and VGPR spillsMarek Olsak2016-07-131-0/+2
| | | | | | | | | | | | | | | | | | | | | Summary: v2: don't count SGPRs spilled to scratch twice I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills]. The fact SGPR spills add very high numbers to the scratch size make that computation a guessing game, but I don't have a solution to that. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D22197 llvm-svn: 275288
* SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.NAKAMURA Takumi2016-06-271-2/+2
| | | | llvm-svn: 273860
* Reformat.NAKAMURA Takumi2016-06-271-1/+1
| | | | llvm-svn: 273859
* Reformat blank lines.NAKAMURA Takumi2016-06-271-2/+0
| | | | llvm-svn: 273858
OpenPOWER on IntegriCloud