summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
...
* [NVPTX] Add fptosi tests to convert-fp.ll.Justin Lebar2017-01-151-48/+67
| | | | | | These seem to have been left off by accident. llvm-svn: 292071
* [NVPTX] Add codegen tests for llvm.fma.Justin Lebar2017-01-151-0/+25
| | | | llvm-svn: 292070
* [NVPTX] Modernize intrinics.ll test.Justin Lebar2017-01-151-13/+12
| | | | llvm-svn: 292069
* [NVPTX] Let there be One True Way to set NVVMReflect params.Justin Lebar2017-01-151-29/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero: * An LLVM command-line option * Parameters to the NVVMReflect constructor * Metadata on the module itself. This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28700 llvm-svn: 292068
* [NVPTX] Added support for half-precision floating point.Artem Belevich2017-01-132-4/+1038
| | | | | | | | | | | | | | | | Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956
* [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is ↵Artem Belevich2017-01-133-0/+45
| | | | | | | | | | | | | | allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936
* [TM] Restore default TargetOptions in TargetMachine::resetTargetOptions.Justin Lebar2017-01-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously if you had * a function with the fast-math-enabled attr, followed by * a function without the fast-math attr, the second function would inherit the first function's fast-math-ness. This means that mixing fast-math and non-fast-math functions in a module was completely broken unless you explicitly annotated every non-fast-math function with "unsafe-fp-math"="false". This appears to have been broken since r176986 (March 2013), when the resetTargetOptions function was introduced. This patch tests the correct behavior as best we can. I don't think I can test FPDenormalMode and NoTrappingFPMath, because they aren't used in any backends during function lowering. Surprisingly, I also can't find any uses at all of LessPreciseFPMAD affecting generated code. The NVPTX/fast-math.ll test changes are an expected result of fixing this bug. When FMA is disabled, we emit add as "add.rn.f32", which prevents fma combining. Before this patch, fast-math was enabled in all functions following the one which explicitly enabled it on itself, so we were emitting plain "add.f32" where we should have generated "add.rn.f32". Reviewers: mkuper Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28507 llvm-svn: 291618
* [NVPTX] Add CHECK-LABEL where appropriate to fast-math.ll test.Justin Lebar2017-01-101-9/+4
| | | | | | | | Also fix up whitespace. Test-only change. llvm-svn: 291617
* [SelectionDAG] Correctly transform range metadata to AssertZExtDavid Majnemer2017-01-061-0/+18
| | | | | | | | We used the logBase2 of the high instead of the ceilLogBase2 resulting in the wrong result for certain values. For example, it resulted in an i1 AssertZExt when the exclusive portion of the range was 3. llvm-svn: 291196
* [NVVMIntrRange] Only set range metadata if none is already presentDavid Majnemer2016-12-221-0/+10
| | | | | | | The range metadata inserted by NVVMIntrRange is pessimistic, range metadata already present could be more precise. llvm-svn: 290294
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades and a change to the Bitcode record for DIGlobalVariable, that makes upgrading the old format unambiguous also for variables without DIExpressions. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 290153
* Revert "[IR] Remove the DIExpression field from DIGlobalVariable."Adrian Prantl2016-12-161-3/+3
| | | | | | | | | | | | | | | | | This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has not DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recomitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920
* Revert "[IR] Remove the DIExpression field from DIGlobalVariable."Adrian Prantl2016-12-161-1/+2
| | | | | | This reverts commit 289902 while investigating bot berakage. llvm-svn: 289906
* [IR] Remove the DIExpression field from DIGlobalVariable.Adrian Prantl2016-12-161-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902
* Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.Justin Lebar2016-12-141-4/+0
| | | | llvm-svn: 289730
* [NVPTX] Support .maxnreg annotation.Justin Lebar2016-12-141-3/+13
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
* [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.Justin Lebar2016-10-314-21/+15
| | | | | | | | | | | | | | | Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642
* [NVPTX] Compute 'rem' using the result of 'div', if possible.Justin Lebar2016-10-281-0/+112
| | | | | | | | | | | | | | | | | | | | | Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461
* [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.Artem Belevich2016-09-281-0/+187
| | | | | | | | These are only available on sm_60+ GPUs. Differential Revision: https://reviews.llvm.org/D24943 llvm-svn: 282607
* llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.NAKAMURA Takumi2016-09-211-1/+1
| | | | | | LLVM ERROR: Cannot select: 0x3607bf0: i32 = ExternalSymbol'__powidf2' llvm-svn: 282053
* [NVPTX] Check if callsite is defined when computing argument allignmentJacques Pienaar2016-09-211-0/+10
| | | | | | | | | | | | Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before. Reviewers: eliben, jpienaar Subscribers: jlebar, vchuravy, cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D9168 llvm-svn: 282045
* DebugInfo: New metadata representation for global variables.Peter Collingbourne2016-09-131-3/+2
| | | | | | | | | | | | | This patch reverses the edge from DIGlobalVariable to GlobalVariable. This will allow us to more easily preserve debug info metadata when manipulating global variables. Fixes PR30362. A program for upgrading test cases is attached to that bug. Differential Revision: http://reviews.llvm.org/D20147 llvm-svn: 281284
* [NVPTX] Use ldg for explicitly invariant loads.Justin Lebar2016-09-111-0/+27
| | | | | | | | | | | | | | | | | | Summary: With this change (plus some changes to prevent !invariant from being clobbered within llvm), clang will be able to model the __ldg CUDA builtin as an invariant load, rather than as a target-specific llvm intrinsic. This will let the optimizer play with these loads -- specifically, we should be able to vectorize them in the load-store vectorizer. Reviewers: tra Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc Differential Revision: https://reviews.llvm.org/D23477 llvm-svn: 281152
* [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.Justin Lebar2016-09-092-1/+262
| | | | | | | | | | | | | | | | | | | | Summary: Previously these only worked via NVPTX-specific intrinsics. This change will allow us to convert these target-specific intrinsics into the general LLVM versions, allowing existing LLVM passes to reason about their behavior. It also gets us some minor codegen improvements as-is, from situations where we canonicalize code into one of these llvm intrinsics. Reviewers: majnemer Subscribers: llvm-commits, jholewinski, tra Differential Revision: https://reviews.llvm.org/D24300 llvm-svn: 281092
* [NVPTX] Switch nvptx-use-infer-addrspace to true.Justin Lebar2016-08-192-3/+6
| | | | | | | | | | | | | | | Summary: This switches us to use a different, more powerful algorithm for address space inference. I've tested this locally and it seems to work great. Once we're more confident in it, we can remove the old pass altogether. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D23694 llvm-svn: 279317
* [NVPTX] Use untyped (.b) integer registers in PTX.Artem Belevich2016-08-121-0/+69
| | | | | | | | | | | | This bring LLVM-generated PTX closer to what nvcc generates and avoids triggering issues in ptxas. For instance, ptxas does not accept .s16 (or .u16) registers as operands for .fp16 instructions. Differential Revision: https://reviews.llvm.org/D23460 llvm-svn: 278568
* [NVPTX] remove unnecessary named metadata update that happens to break debug ↵Artem Belevich2016-08-021-0/+64
| | | | | | | | | | info. Also added test case to verify IR changes done by NVPTXGenericToNVVM pass. Differential Revision: https://reviews.llvm.org/D22837 llvm-svn: 277520
* Fix NVPTX/call-with-alloca-buffer.ll after r276777.Justin Lebar2016-07-261-4/+2
| | | | | | | r276777 makes InstSimplify stronger, letting it see through some unnecessary addrspace casts. llvm-svn: 276786
* [NVPTX] Enable the load-store vectorizer on nvptx.Justin Lebar2016-07-201-0/+17
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196
* [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.Artem Belevich2016-07-201-1/+1
| | | | | | | | After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189
* [NVPTX] deal with all aggregate return types.Artem Belevich2016-07-202-14/+43
| | | | | | | | Fixes a crash in llvm_unreachable when a function has array return type. Differential Revision: https://reviews.llvm.org/D22524 llvm-svn: 276154
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-07-202-8/+26
| | | | | | | | | | | | Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153
* [NVPTX] Make sure we adjust alignment at all call sitesArtem Belevich2016-07-181-0/+11
| | | | | | | .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920
* [NVPTX] Force minimum alignment of 4 for byval arguments of device-side ↵Artem Belevich2016-07-181-0/+8
| | | | | | | | | | | | | | | | functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893
* NVPTX: Remove the legacy ptx intrinsicsJustin Bogner2016-07-072-95/+95
| | | | | | | | | | | | - Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but not all of these registers were already accessible via the nvvm name. - Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0. There's a fair amount of code motion here, but it's all very mechanical. llvm-svn: 274769
* [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.Justin Lebar2016-07-063-0/+15
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D22068 llvm-svn: 274674
* NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0Justin Bogner2016-07-064-14/+14
| | | | | | | Everywhere where cuda.syncthreads or __syncthreads is used, use the properly namespaced nvvm.barrier0 instead. llvm-svn: 274664
* Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."Artem Belevich2016-06-292-26/+8
| | | | | | The change causes llvm crash in some unoptimized builds. llvm-svn: 274163
* Only emit extension for zeroext/signext arguments if type is < 32 bitsJustin Holewinski2016-06-271-0/+26
| | | | | | | | | | Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D21756 llvm-svn: 273922
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-06-212-8/+26
| | | | | | | | | | | | | | | | | | | | | Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible. This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process). This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%. Differential Review: http://reviews.llvm.org/D21421 llvm-svn: 273313
* [NVPTX] Add intrinsics for shfl instructions.Justin Lebar2016-06-091-0/+90
| | | | | | | | | | | | | | | Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298
* [NVPTX] Added NVVMIntrRange passArtem Belevich2016-05-261-13/+54
| | | | | | | | | | | | NVVMIntrRange adds !range metadata to calls of NVVM intrinsics that return values within known limited range. This allows LLVM to generate optimal code for indexing arrays based on tid/ctaid which is a frequently used pattern in CUDA code. Differential Revision: http://reviews.llvm.org/D20644 llvm-svn: 270872
* [NVPTX] Fix sign/zero-extending ldg/ldu instruction selectionJustin Holewinski2016-05-021-0/+34
| | | | | | | | | | | | | | | | | Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.Adrian Prantl2016-04-151-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
* Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX ↵Justin Lebar2016-04-151-0/+24
| | | | | | target. llvm-svn: 266403
* [NVPTX] Handle ldg created from sign-/zero-extended loadJustin Holewinski2016-04-051-0/+57
| | | | | | | | | | Reviewers: jingyue Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D18053 llvm-svn: 265389
* [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.Justin Lebar2016-04-011-0/+13
| | | | | | | | | | | | | | | | | | | Summary: Previously the NVVMReflect pass would read its configuration from command-line flags or a static configuration given to the pass at instantiation time. This doesn't quite work for clang's use-case. It needs to pass a value for __CUDA_FTZ down on a per-module basis. We use a module flag for this, so the NVVMReflect pass needs to be updated to read said flag. Reviewers: tra, rnk Subscribers: cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18672 llvm-svn: 265090
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)Adrian Prantl2016-04-011-1/+1
| | | | llvm-svn: 265081
* [NVPTX] Adds a new address space inference pass.Jingyue Wu2016-03-201-10/+69
| | | | | | | | | | | | | | | | | | | Summary: The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable to convert the address space of a pointer induction variable. This patch adds a new pass called NVPTXInferAddressSpaces that overcomes that limitation using a fixed-point data-flow analysis (see the file header comments for details). The new pass is experimental and not enabled by default. Users can turn it on by setting the -nvptx-use-infer-addrspace flag of llc. Reviewers: jholewinski, tra, jlebar Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17965 llvm-svn: 263916
OpenPOWER on IntegriCloud