path: root/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts | Matt Arsenault | 2016-12-10 | 1 | -1/+8
  The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces.
  llvm-svn: 289307
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 1 | -3/+1
  llvm-svn: 283004
* [AMDGPU] Wave and register controls | Konstantin Zhuravlyov | 2016-09-06 | 1 | -4/+5
  - Implemented amdgpu-flat-work-group-size attribute
  - Implemented amdgpu-num-active-waves-per-eu attribute
  - Implemented amdgpu-num-sgpr attribute
  - Implemented amdgpu-num-vgpr attribute
  - Dynamic LDS constraints are in a separate patch
  Patch by Tom Stellard and Konstantin Zhuravlyov
  Differential Revision: https://reviews.llvm.org/D21562
  llvm-svn: 280747
* Use the range variant of find instead of unpacking begin/end | David Majnemer | 2016-08-11 | 1 | -1/+1
  If the result of the find is only used to compare against end(), just use is_contained instead.
  No functionality change is intended.
  llvm-svn: 278433
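The cleanup described in this commit can be sketched in standard C++. This is only an illustration: `isContained` is a hypothetical stand-in written for this example (LLVM's own helper is `llvm::is_contained`), and `WorkList`, `hasUserOld`, and `hasUserNew` are made-up names, not code from the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a range-based membership helper.
// It hides the begin/end unpacking behind a single call.
template <typename Range, typename T>
bool isContained(const Range &R, const T &Value) {
  return std::find(std::begin(R), std::end(R), Value) != std::end(R);
}

// Before: begin/end are unpacked by hand only to compare against end().
inline bool hasUserOld(const std::vector<int> &WorkList, int User) {
  return std::find(WorkList.begin(), WorkList.end(), User) != WorkList.end();
}

// After: the same question, asked directly of the range.
inline bool hasUserNew(const std::vector<int> &WorkList, int User) {
  return isContained(WorkList, User);
}
```

Both functions answer the same membership query, which is why the commit is marked as having no intended functionality change.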
* AMDGPU: Remove pointless dyn_cast_or_null | Matt Arsenault | 2016-07-18 | 1 | -4/+3
  This was already cast above, so it is non-null.
  llvm-svn: 275881
* AMDGPU: Remove dead check in AMDGPUPromoteAlloca | Matt Arsenault | 2016-07-18 | 1 | -9/+10
  This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case.
  Also fix crashes on 0 x and 1 x arrays.
  llvm-svn: 275869
* AMDGPU: Remove dead code and redundant check | Matt Arsenault | 2016-07-18 | 1 | -27/+1
  Non intrinsic calls aren't really handled, and this IntrinsicInst dyn_cast checks for the function for us.
  llvm-svn: 275868
* AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. | Nicolai Haehnle | 2016-07-18 | 1 | -0/+6
  Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up, most shader stages have some extra restrictions on the amount of available LDS.
  Reviewers: tstellarAMD, arsenm
  Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D20728
  llvm-svn: 275779
* AMDGPU: Move subtarget feature checks into passes | Matt Arsenault | 2016-06-27 | 1 | -2/+4
  llvm-svn: 273937
* IR: Introduce local_unnamed_addr attribute. | Peter Collingbourne | 2016-06-14 | 1 | -1/+1
  If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol.

  This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true:
  - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address)
  - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same)
  - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global)

  Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it.

  See:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
  for earlier discussion.

  Part of the fix for PR27553.
  Differential Revision: http://reviews.llvm.org/D20348
  llvm-svn: 272709
* AMDGPU: Fix promote alloca for pointer loads | Matt Arsenault | 2016-05-18 | 1 | -3/+7
  If the load has a pointer type, we don't want to change its type.
  llvm-svn: 270000
* AMDGPU: Handle alloca promoting with null operands | Matt Arsenault | 2016-05-18 | 1 | -2/+37
  If the second pointer in a multi-pointer instruction is a constant, we can replace the type.
  llvm-svn: 269945
* AMDGPU: Fix promote alloca pass creating huge arrays | Matt Arsenault | 2016-05-16 | 1 | -19/+86
  This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit.
  Based on the existing LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding.
  llvm-svn: 269708
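The "round up to the limit in the current tier" idea can be sketched as follows. This is a simplified model and not the pass's code: `ldsBudget`, `canPromote`, and the tier table passed in by the caller are all hypothetical, the tier values used with it would be illustrative rather than real hardware limits, and alignment padding is ignored (the commit message notes the pass cannot account for it accurately either).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: TierLimits holds per-tier LDS byte limits in
// ascending order. Returns the limit of the tier that CurrentUsage
// already falls into, i.e. the most LDS we may use without dropping
// to a worse tier. If usage exceeds every tier, there is no headroom.
uint32_t ldsBudget(uint32_t CurrentUsage,
                   const std::vector<uint32_t> &TierLimits) {
  for (uint32_t Limit : TierLimits)
    if (CurrentUsage <= Limit)
      return Limit;
  return CurrentUsage;
}

// Hypothetical check: promote an alloca only if the extra bytes still
// fit under the current tier's limit. Padding is not modeled.
bool canPromote(uint32_t CurrentUsage, uint32_t AllocaBytes,
                const std::vector<uint32_t> &TierLimits) {
  uint64_t NewUsage = uint64_t(CurrentUsage) + AllocaBytes;
  return NewUsage <= ldsBudget(CurrentUsage, TierLimits);
}
```

The design point is that a kernel which already uses some LDS has, in effect, already paid for its tier, so filling the remainder of that tier costs nothing further in occupancy.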
* AMDGPU: Fix breaking IR on instructions with multiple pointer operands | Matt Arsenault | 2016-05-12 | 1 | -8/+91
  The promote alloca pass would attempt to promote an alloca with a select, icmp, or phi user, even though the other operand was from a non-promotable source, producing a select on two different pointer types.
  Only do this if we know that both operands derive from the same alloca. In the future we should be able to relax this to an alloca which will also be promoted.
  llvm-svn: 269265
* AMDGPU: Fix mishandling array allocations when promoting alloca | Matt Arsenault | 2016-04-28 | 1 | -1/+3
  The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller.
  llvm-svn: 267916
* AMDGPU: Account for globals in AMDGPUPromoteAlloca pass | Matt Arsenault | 2016-04-27 | 1 | -2/+4
  Patch by Bas Nieuwenhuizen
  llvm-svn: 267791
* Add optimization bisect opt-in calls for AMDGPU passes | Andrew Kaylor | 2016-04-25 | 1 | -1/+1
  Differential Revision: http://reviews.llvm.org/D19450
  llvm-svn: 267485
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit | Tom Stellard | 2016-04-14 | 1 | -5/+7
  Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.
  This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.
  Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
  Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
  Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
  Differential Revision: http://reviews.llvm.org/D18340
  Patch By: Bas Nieuwenhuizen
  llvm-svn: 266337
* AMDGPU: Promote alloca should skip volatiles | Matt Arsenault | 2016-03-23 | 1 | -0/+13
  llvm-svn: 264214
* AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca | Matt Arsenault | 2016-03-11 | 1 | -6/+12
  Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function.
  llvm-svn: 263206
* AMDGPU: Remove a fixme for ptrtoint handling | Matt Arsenault | 2016-03-07 | 1 | -1/+0
  llvm-svn: 262854
* AMDGPU: Preserve alignments on newly created globals | Matt Arsenault | 2016-02-05 | 1 | -2/+10
  Also switch to internal linkage, and include the name of the function in the name.
  llvm-svn: 259911
* AMDGPU: Do not promote allocas with non-inbounds GEPs | Matt Arsenault | 2016-02-02 | 1 | -0/+7
  If we can't assume the pointer value is within the bounds of the object, it seems risky to try to replace the pointer calculations.
  llvm-svn: 259573
* AMDGPU: Handle promoting memmove | Matt Arsenault | 2016-02-02 | 1 | -0/+24
  Also add missing tests for the others.
  llvm-svn: 259558
* AMDGPU: Skip promote alloca with no optimizations | Matt Arsenault | 2016-02-02 | 1 | -1/+1
  llvm-svn: 259551
* AMDGPU: Minor cleanups for AMDGPUPromoteAlloca | Matt Arsenault | 2016-02-02 | 1 | -27/+21
  Mostly convert to use range loops.
  llvm-svn: 259550
* AMDGPU: Report AMDGPUPromoteAlloca changed the function | Matt Arsenault | 2016-02-02 | 1 | -22/+21
  llvm-svn: 259547
* AMDGPU: Whitelist handled intrinsics | Matt Arsenault | 2016-02-02 | 1 | -8/+36
  We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop.
  llvm-svn: 259546
* AMDGPU: Use inbounds when calculating workitem offset | Matt Arsenault | 2016-02-02 | 1 | -6/+7
  When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow.
  Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure.
  llvm-svn: 259545
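A workitem offset of this kind can be illustrated in plain C++. This is a sketch under assumptions: the row-major linearization order, the `Dim3` struct, and the names `linearWorkitemId` and `slotOffset` are chosen for this example and are not taken from the commit, which only states that the index stays inside the newly created LDS area and that the arithmetic cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3-D id / size triple for a workgroup.
struct Dim3 {
  uint32_t X, Y, Z;
};

// Flatten a 3-D workitem id into a linear index, assuming the order
// Id.X * (Size.Y * Size.Z) + Id.Y * Size.Z + Id.Z.
// With Id.* < Size.* the result is always < Size.X * Size.Y * Size.Z,
// which is the kind of bound that justifies inbounds / no-wrap flags.
uint32_t linearWorkitemId(Dim3 Id, Dim3 Size) {
  return Id.X * (Size.Y * Size.Z) + Id.Y * Size.Z + Id.Z;
}

// Byte offset of this workitem's private slot inside one shared array
// that holds a SlotBytes-sized slot per workitem.
uint64_t slotOffset(Dim3 Id, Dim3 Size, uint32_t SlotBytes) {
  return uint64_t(linearWorkitemId(Id, Size)) * SlotBytes;
}
```

Because every workitem id is smaller than the matching workgroup dimension, the largest index is one less than the total workitem count, so each slot lies within the array that was sized for the whole workgroup.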
* AMDGPU: Fix emitting invalid workitem intrinsics for HSA | Matt Arsenault | 2016-01-30 | 1 | -31/+177
  The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer.
  Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer.
  Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument.
  Start emitting errors for the intrinsics not handled with HSA.
  llvm-svn: 259297
* AMDGPU: Fix crash with invariant markers | Matt Arsenault | 2016-01-22 | 1 | -0/+8
  The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking.
  llvm-svn: 258537
* GlobalValue: use getValueType() instead of getType()->getPointerElementType(). | Manuel Jacob | 2016-01-16 | 1 | -3/+2
  Reviewers: mjacob
  Subscribers: jholewinski, arsenm, dsanders, dblaikie
  Patch by Eduard Burtescu.
  Differential Revision: http://reviews.llvm.org/D16260
  llvm-svn: 257999
* Revert "Change memcpy/memset/memmove to have dest and source alignments." | Pete Cooper | 2015-11-19 | 1 | -3/+3
  This reverts commit r253511.
  This likely broke the bots in
  http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
  http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787
  llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments. | Pete Cooper | 2015-11-18 | 1 | -3/+3
  Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

  These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those.

  This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed.

  There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour.

  For example, code which used to read:
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
  will now read:
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

  For out of tree owners, I was able to strip alignment from calls using sed by replacing:
    (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
  with:
    $1i1 false)
  and similarly for memmove and memcpy.

  I then added back in alignment to test cases which needed it.

  A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls.

  In IRBuilder itself, a new argument was added. Instead of calling:
    CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
  you now call
    CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

  There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment.

  Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences.

  Reviewed by Hal Finkel.
  llvm-svn: 253511
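The sed substitution quoted in this commit message can be reproduced with `std::regex` for readers who want to try it on a single line of IR. This is a sketch only: `stripMemsetAlign` is a hypothetical name, the pattern is the commit's regular expression with the sed-specific escaping of the space and comma dropped, and it covers only the `memset` case the message shows.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the commit's sed substitution:
// drop the trailing "i32 <align>, " argument that precedes "i1 false)"
// on a call to llvm.memset. Lines that do not match are returned as-is.
std::string stripMemsetAlign(const std::string &Line) {
  static const std::regex Pattern(
      R"re((call.*llvm\.memset.*)i32 [0-9]*, i1 false\))re");
  // "$1" re-inserts everything captured before the alignment argument.
  return std::regex_replace(Line, Pattern, "$1i1 false)");
}
```

For instance, a line such as `call void @llvm.memset.p0i8.i32(i8* %dest, i8 0, i32 500, i32 8, i1 false)` (made up for this illustration) would lose its `i32 8` argument, after which any needed alignment has to be added back by hand as an `align` attribute, as the message describes.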
* AMDGPU: Remove implicit ilist iterator conversions, NFC | Duncan P. N. Exon Smith | 2015-10-13 | 1 | -1/+1
  One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new one. Previously, bundle iterators and single-instruction iterators could be compared to each other (comparing on underlying pointers). I changed a comparison from using `MBB->end()` to using `MBB->instr_end()`, since both end iterators should point at the same place anyway.
  I don't think the implicit conversion between the two iterator types is a good idea since it's fairly easy to accidentally compare to the wrong thing (they aren't always end iterators). Otherwise I would have just added the conversion.
  Even with that, there should be no functionality change here.
  llvm-svn: 250218
* AMDGPU: Produce error on dynamic_stackalloc | Matt Arsenault | 2015-08-26 | 1 | -0/+3
  llvm-svn: 246048
* De-constify pointers to Type since they can't be modified. NFC | Craig Topper | 2015-08-01 | 1 | -3/+3
  This was already done in most places a while ago. This just fixes the ones that crept in over time.
  llvm-svn: 243842
* AMDGPU: Don't try to use LDS/vector for private if pointer value stored | Matt Arsenault | 2015-07-28 | 1 | -4/+14
  If the pointer is the store's value operand, this would produce a broken module. Make sure the use is actually for the pointer operand.
  llvm-svn: 243462
* AMDGPU: Fix crash if called function is a bitcast | Matt Arsenault | 2015-07-28 | 1 | -1/+6
  getCalledFunction() is null, so this would crash. Replace crash with an error on unsupported call.
  llvm-svn: 243461
* R600 -> AMDGPU rename | Tom Stellard | 2015-06-13 | 1 | -0/+407
  llvm-svn: 239657