path: root/llvm/lib/Target/AMDGPU
Commit message (Author, Age, Files, Lines)
...
* [AMDGPU] gfx1010 permlane instructions (Stanislav Mekhanoshin, 2019-06-12, 7 files, -1/+142)
| | | | | | Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) (Simon Pilgrim, 2019-06-12, 5 files, -17/+20)
| | | | | | | | | | | | | | As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. (Simon Pilgrim, 2019-06-11, 1 file, -5/+6)
| | | | | | As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048
* Revert CMake: Make most target symbols hidden by default (Tom Stellard, 2019-06-11, 6 files, -6/+6)
| | | | | | | | | | | | | | | This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
* AtomicExpand: Don't crash on non-0 alloca (Matt Arsenault, 2019-06-11, 1 file, -0/+1)
| | | | | | | This now produces garbage on AMDGPU with a call to a nonexistent, anonymous libcall but won't assert. llvm-svn: 363022
* AMDGPU: Expand < 32-bit atomics (Matt Arsenault, 2019-06-11, 1 file, -0/+2)
| | | | | | Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021
* CMake: Make most target symbols hidden by default (Tom Stellard, 2019-06-10, 6 files, -6/+6)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make ABI checker tools, like abidiff, require less memory when analyzing libLLVM.so. One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l gives 36221; nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l gives 26278. Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
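As a rough check of the "about 25%" figure in the CMake commit above, the two `nm` counts it quotes can be compared directly. This is only a sketch of the arithmetic; the helper name `percent_reduction` is hypothetical and not part of LLVM.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# Public symbol counts quoted in the commit message (gcc 8.2.1, ld.bfd 2.31.1).
BEFORE = 36221
AFTER = 26278

removed = BEFORE - AFTER
reduction = percent_reduction(BEFORE, AFTER)
print(f"{removed} symbols hidden, {reduction:.1f}% fewer public symbols")
```

On these numbers the reduction comes out a little over 27%, which is consistent with the commit's rounded "about 25%".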
* [AMDGPU] Optimize image_[load|store]_mip (Piotr Sobczak, 2019-06-10, 4 files, -0/+43)
| | | | | | | | | | | | | | | | | | Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
* AMDGPU: Force skips around traps (Matt Arsenault, 2019-06-07, 1 file, -1/+1)
| | | | llvm-svn: 362852
* [AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) (Valery Pykhtin, 2019-06-07, 1 file, -1/+15)
| | | | | | | | Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789
* AMDGPU: Don't count mask branch pseudo towards skip threshold (Matt Arsenault, 2019-06-07, 1 file, -10/+8)
| | | | llvm-svn: 362761
* AMDGPU: Insert skips for blocks with FLAT (Matt Arsenault, 2019-06-07, 1 file, -1/+2)
| | | | | | | This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760
* AMDGPU: Insert skip branches over return blocks (Matt Arsenault, 2019-06-06, 2 files, -3/+4)
| | | | | | | | | | SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754
* [AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd (Alexander Timofeev, 2019-06-06, 4 files, -163/+107)
| | | | | | | | | | | | "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749
* AMDGPU: Don't fix emergency stack slot at offset 0 (Matt Arsenault, 2019-06-05, 2 files, -26/+11)
| | | | | | | | | | | | | | | | | | | | | This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit the number of test updates and in case anything breaks. llvm-svn: 362665
* AMDGPU: Invert frame index offset interpretation (Matt Arsenault, 2019-06-05, 9 files, -209/+218)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661
* AMDGPU: Remove amdgpu-max-work-group-size attribute (Matt Arsenault, 2019-06-05, 1 file, -10/+1)
| | | | | | | This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641
* AMDGPU: Fix using 2 different enums for same operand flags (Matt Arsenault, 2019-06-05, 3 files, -11/+8)
| | | | | | | These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640
* [SelectionDAG][x86] limit post-legalization store merging by type (Sanjay Patel, 2019-06-04, 1 file, -1/+1)
| | | | | | | | | | | The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507
* [Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned (Shawn Landden, 2019-06-04, 1 file, -1/+1)
| | | | | | | | | | This matches APInt's versions of these functions, and there is no need for these to be size_t. (as well as __builtin_clzll()) Differential Revision: https://reviews.llvm.org/D60823 llvm-svn: 362503
* AMDGPU: Disable stack realignment for kernels (Matt Arsenault, 2019-06-03, 2 files, -0/+14)
| | | | | | | | | | | | | | | | | | | This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel. TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels. Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP. llvm-svn: 362447
* TTI: Improve default costs for addrspacecast (Matt Arsenault, 2019-06-03, 2 files, -3/+3)
| | | | | | | | | | For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436
* [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands (Dmitry Preobrazhensky, 2019-06-03, 7 files, -24/+69)
| | | | | | | | | | See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292 Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D62660 llvm-svn: 362400
* AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand (Nicolai Haehnle, 2019-06-03, 1 file, -2/+1)
| | | | | | | | | | | | | | | Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62720 Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e llvm-svn: 362390
* AMDGPU: Fix not adding ImplicitBufferPtr as a live-in (Matt Arsenault, 2019-05-31, 1 file, -1/+4)
| | | | | | Fixes missing test from r293000. llvm-svn: 362275
* [AMDGPU] Use InliningThresholdMultiplier for inline hint (Stanislav Mekhanoshin, 2019-05-31, 1 file, -1/+2)
| | | | | | | | | | | | | | AMDGPU uses multiplier 9 for the inline cost. It is taken into account everywhere except for inline hint threshold. As a result we are penalizing functions with the inline hint making them less probable to be inlined than those without the hint. Defaults are 225 for a normal function and 325 for a function with an inline hint. Currently we have effective threshold 225 * 9 = 2025 for normal functions and just 325 for those with the hint. That is fixed by this patch. Differential Revision: https://reviews.llvm.org/D62707 llvm-svn: 362239
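The threshold arithmetic in the inline-hint commit above can be spelled out as a small sketch. The multiplier and the two defaults come from the commit message. How the patch applies the multiplier to the hint threshold (a plain multiplication, giving 325 * 9) is an assumption inferred from the commit title, and the function name is hypothetical.

```python
# Values quoted in the commit message.
INLINING_THRESHOLD_MULTIPLIER = 9
DEFAULT_THRESHOLD = 225   # normal function
HINT_THRESHOLD = 325      # function carrying an inline hint


def effective_thresholds(scale_hint: bool) -> dict:
    """Return effective inline thresholds.

    scale_hint=False models the behaviour described as the bug: the
    multiplier is applied to the normal threshold only.
    scale_hint=True models the assumed fix: the hint threshold is
    scaled by the same multiplier.
    """
    hint_factor = INLINING_THRESHOLD_MULTIPLIER if scale_hint else 1
    return {
        "normal": DEFAULT_THRESHOLD * INLINING_THRESHOLD_MULTIPLIER,
        "hint": HINT_THRESHOLD * hint_factor,
    }


before = effective_thresholds(scale_hint=False)
after = effective_thresholds(scale_hint=True)
print("before:", before)
print("after: ", after)
```

In the "before" case the hinted threshold (325) is far below the normal one (2025), which is the penalty the commit describes. Under the assumed fix the hinted threshold is again the larger of the two.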
* AMDGPU/GlobalISel: Add wave scratch offset argument (Matt Arsenault, 2019-05-30, 1 file, -0/+42)
| | | | | | Avoids crashing in PEI in a future change. llvm-svn: 362136
* [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause (Tim Renouf, 2019-05-30, 1 file, -1/+3)
| | | | | | | | | | | | | | With LLPC, previous investigation has suggested that si-scheduler interacts badly with SIFormMemoryClauses on an XNACK target in some games. That needs further investigation in the future. In the meantime, this commit adds a target-specific attribute to allow us to disable SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC to use. Differential Revision: https://reviews.llvm.org/D62572 Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92 llvm-svn: 362127
* AMDGPU: Return address lowering (Aakanksha Patil, 2019-05-29, 2 files, -1/+27)
| | | | | | | | The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001
* AMDGPU: Temporarily drop s_mul_hi_i/u32 patterns (Konstantin Zhuravlyov, 2019-05-28, 1 file, -6/+2)
| | | | | | | | It introduces performance regressions in several applications. This has already been submitted downstream. llvm-svn: 361879
* [AMDGPU] Correct the handling of inlineasm output registers. (Michael Liao, 2019-05-28, 1 file, -2/+1)
| | | | | | | | | | | | | | | | Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868
* AMDGPU: Don't enable all lanes with non-CSR VGPR spills (Matt Arsenault, 2019-05-28, 1 file, -39/+49)
| | | | | | If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all lanes and immediately restoring exec. This is the usual situation in leaf functions. llvm-svn: 361848
* [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. (Michael Liao, 2019-05-28, 1 file, -1/+5)
| | | | | | | | | | | | | | | | | | | | | | Summary: - Don't treat the use of a scalar register as `vreg_1` as a VGPR usage. Otherwise, that promotes the scalar register into a vector one, which breaks the assumption that the scalar register holds the lane mask. - The issue is triggered in a complicated case, where the uses of that (lane mask) scalar register are legalized before its definition, e.g., due to a mismatch between block placement and topological order, or a loop. In that case, the legalization of PHI introduces the use of that scalar register as `vreg_1`. Reviewers: rampitec, nhaehnle, arsenm, alex-t Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62492 llvm-svn: 361847
* [AMDGPU] Fix for the address sanitizer failure. Fixing typo (Alexander Timofeev, 2019-05-27, 1 file, -1/+1)
| | | | llvm-svn: 361776
* [AMDGPU] Fix for the address sanitizer failure caused by the following commit: (Alexander Timofeev, 2019-05-27, 1 file, -1/+3)
| | | | | | | | 1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence. llvm-svn: 361770
* [AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt (Dmitry Preobrazhensky, 2019-05-27, 1 file, -36/+28)
| | | | | | | | | | See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61017 llvm-svn: 361763
* [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. (Alexander Timofeev, 2019-05-26, 5 files, -109/+171)
| | | | | | | | | | | | | | | | Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was a malformed patch. Build failure fixed. llvm-svn: 361741
* [SimplifyCFG] back out all SwitchInst commits (Shawn Landden, 2019-05-26, 1 file, -1/+1)
| | | | | | They caused the sanitizer builds to fail. My suspicion is the change to countLeadingZeros(). llvm-svn: 361736
* [Support] make countLeadingZeros() and countTrailingZeros() return unsigned (Shawn Landden, 2019-05-26, 1 file, -1/+1)
| | | | | | | | | This matches countLeadingOnes() and countTrailingOnes(), and APInt's countLeadingZeros() and countTrailingZeros(). (as well as __builtin_clzll()) llvm-svn: 361724
* Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." (Peter Collingbourne, 2019-05-25, 5 files, -171/+85)
| | | | | | | | Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688
* AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills (Matt Arsenault, 2019-05-24, 1 file, -26/+66)
| | | | | | | If some lanes weren't active on entry to the function, this could clobber their VGPR values. llvm-svn: 361655
* AMDGPU: Boost inline threshold with addrspacecasted alloca arguments (Matt Arsenault, 2019-05-24, 1 file, -3/+4)
| | | | | | | This was skipping GetUnderlyingObject for nonprivate addresses, but an alloca could also be found through an addrspacecast if it's flat. llvm-svn: 361649
* [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. (Alexander Timofeev, 2019-05-24, 5 files, -85/+171)
| | | | | | | | | | | | Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644
* AMDGPU: Correct maximum possible private allocation size (Matt Arsenault, 2019-05-23, 4 files, -28/+14)
| | | | | | | | | | | | | | | | We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
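The "high 15 bits are 0" claim in the private-allocation commit above follows from dividing the per-wave scratch limit by the wave size. The sketch below reproduces that reasoning. The concrete limit of 8387584 bytes, the wave size of 64, and the 32-bit value width are all assumptions that are not quoted in the commit message (the limit is what the linked ROCR-Runtime source is believed to define), and the function name is hypothetical.

```python
def known_zero_high_bits(max_wave_scratch: int, wave_size: int,
                         value_bits: int = 32) -> int:
    """Number of high bits guaranteed to be zero in a per-lane frame index.

    The frame index is a per-lane offset, so its largest possible value is
    the per-wave scratch limit divided by the number of lanes in a wave.
    """
    max_frame_index = max_wave_scratch // wave_size
    return value_bits - max_frame_index.bit_length()


# Assumptions, not taken from the commit message itself.
ASSUMED_MAX_WAVE_SCRATCH = 8387584  # bytes per wave
ASSUMED_WAVE_SIZE = 64              # lanes per wave

zero_bits = known_zero_high_bits(ASSUMED_MAX_WAVE_SCRATCH, ASSUMED_WAVE_SIZE)
print(f"high {zero_bits} bits of a 32-bit frame index can be assumed zero")
```

With these assumed inputs the result agrees with the 15 bits stated in the commit. A different limit or wave size would give a different count.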
* AMDGPU/GlobalISel: Legality for integer min/max (Matt Arsenault, 2019-05-23, 2 files, -0/+30)
| | | | llvm-svn: 361519
* Reverted r361134 because of a failing test left unattended for a long time. (Galina Kistanova, 2019-05-22, 1 file, -1/+1)
| | | | | | | | http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio Failing Tests (1): LLVM :: CodeGen/AMDGPU/regbank-reassign.mir llvm-svn: 361430
* AMDGPU: Move disassembler support check to constructor (Matt Arsenault, 2019-05-22, 1 file, -5/+6)
| | | | | | Don't check for unsupported targets for every instruction. llvm-svn: 361406
* MC: Allow getMaxInstLength to depend on the subtarget (Matt Arsenault, 2019-05-22, 5 files, -9/+34)
| | | | | | | | | | Keep it optional in case this is ever needed in some global context. Currently it's only used for getting an upper bound on inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20 bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405
* [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers (Dmitry Preobrazhensky, 2019-05-22, 1 file, -34/+32)
| | | | | | | | | | See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61012 llvm-svn: 361386
* AMDGPU: Assume calls read exec (Matt Arsenault, 2019-05-21, 1 file, -0/+4)
| | | | llvm-svn: 361333