summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
...
* getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCISanjay Patel2016-09-142-9/+8
| | | | llvm-svn: 281493
* [NVPTX] Use ldg for explicitly invariant loads.Justin Lebar2016-09-111-13/+22
| | | | | | | | | | | | | | | | | | Summary: With this change (plus some changes to prevent !invariant from being clobbered within llvm), clang will be able to model the __ldg CUDA builtin as an invariant load, rather than as a target-specific llvm intrinsic. This will let the optimizer play with these loads -- specifically, we should be able to vectorize them in the load-store vectorizer. Reviewers: tra Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc Differential Revision: https://reviews.llvm.org/D23477 llvm-svn: 281152
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability.Justin Lebar2016-09-111-3/+6
| | | | | | | | | | | | | | | | | | | Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151
* [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.Justin Lebar2016-09-092-16/+132
| | | | | | | | | | | | | | | | | | | | Summary: Previously these only worked via NVPTX-specific intrinsics. This change will allow us to convert these target-specific intrinsics into the general LLVM versions, allowing existing LLVM passes to reason about their behavior. It also gets us some minor codegen improvements as-is, from situations where we canonicalize code into one of these llvm intrinsics. Reviewers: majnemer Subscribers: llvm-commits, jholewinski, tra Differential Revision: https://reviews.llvm.org/D24300 llvm-svn: 281092
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePassesMatthias Braun2016-08-244-6/+0
| | | | | | | | | | | | | | | | | | | | | | Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602
* Revert r279564. It introduces undefined behavior (binding a reference to aRichard Smith2016-08-234-0/+6
| | | | | | | dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePassesMatthias Braun2016-08-234-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279564
* Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove ↵Matthias Braun2016-08-234-0/+6
| | | | | | | | | | MachineFunctionAnalysis => Enable (Machine)ModulePasses" Reverting while tracking down a use after free. This reverts commit r279502. llvm-svn: 279503
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePassesMatthias Braun2016-08-234-6/+0
| | | | | | | | | | | | | | | | | | | This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279502
* [NVPTX] Switch nvptx-use-infer-addrspace to true.Justin Lebar2016-08-192-4/+1
| | | | | | | | | | | | | | | Summary: This switches us to use a different, more powerful algorithm for address space inference. I've tested this locally and it seems to work great. Once we're more confident in it, we can remove the old pass altogether. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D23694 llvm-svn: 279317
* [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> froundMichael Kuperstein2016-08-182-7/+7
| | | | | | | | | | The names of the tablegen defs now match the names of the ISD nodes. This makes the world a slightly saner place, as previously "fround" matched ISD::FP_ROUND and not ISD::FROUND. Differential Revision: https://reviews.llvm.org/D23597 llvm-svn: 279129
* Replace a few more "fall through" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970
* [NVPTX] Use untyped (.b) integer registers in PTX.Artem Belevich2016-08-121-3/+21
| | | | | | | | | | | | This bring LLVM-generated PTX closer to what nvcc generates and avoids triggering issues in ptxas. For instance, ptxas does not accept .s16 (or .u16) registers as operands for .fp16 instructions. Differential Revision: https://reviews.llvm.org/D23460 llvm-svn: 278568
* Use the range variant of find instead of unpacking begin/endDavid Majnemer2016-08-111-4/+4
| | | | | | | | | If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433
* [NVPTX] remove unnecessary named metadata update that happens to break debug ↵Artem Belevich2016-08-021-36/+0
| | | | | | | | | | info. Also added test case to verify IR changes done by NVPTXGenericToNVVM pass. Differential Revision: https://reviews.llvm.org/D22837 llvm-svn: 277520
* [ConstnatFolding] Teach the folder how to fold ConstantVectorDavid Majnemer2016-07-291-7/+7
| | | | | | | | | | | A ConstantVector can have ConstantExpr operands and vice versa. However, the folder had no ability to fold ConstantVectors which, in some cases, was an optimization barrier. Instead, rephrase the folder in terms of Constants instead of ConstantExprs and teach callers how to deal with failure. llvm-svn: 277099
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-284-31/+31
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* [NVPTX] Enable the load-store vectorizer on nvptx.Justin Lebar2016-07-202-1/+11
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196
* [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.Artem Belevich2016-07-204-21/+21
| | | | | | | | After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189
* [NVPTX] deal with all aggregate return types.Artem Belevich2016-07-201-6/+6
| | | | | | | | Fixes a crash in llvm_unreachable when a function has array return type. Differential Revision: https://reviews.llvm.org/D22524 llvm-svn: 276154
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-07-203-32/+45
| | | | | | | | | | | | Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153
* [NVPTX] Make sure we adjust alignment at all call sitesArtem Belevich2016-07-181-2/+1
| | | | | | | .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920
* [NVPTX] Force minimum alignment of 4 for byval arguments of device-side ↵Artem Belevich2016-07-182-6/+23
| | | | | | | | | | | | | | | | functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, ↵Justin Lebar2016-07-151-32/+23
| | | | | | | | | | | | | | | | | | | | | | | getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592
* Rename AnalyzeBranch* to analyzeBranch*.Jacques Pienaar2016-07-152-6/+9
| | | | | | | | | | | | Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564
* NVPTX: Avoid implicit iterator conversions, NFCDuncan P. N. Exon Smith2016-07-083-22/+21
| | | | | | | | | | | | | | | | | Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr* in the NVPTX backend, mainly by preferring MachineInstr& over MachineInstr* when a pointer isn't nullable and using range-based for loops. There was one piece of questionable code in NVPTXInstrInfo::AnalyzeBranch, where a condition checked a pointer converted from an iterator for nullptr. Since this case is impossible (moreover, the code above guarantees that the iterator is valid), I removed the check when I changed the pointer to a reference. Despite that case, there should be no functionality change here. llvm-svn: 274931
* NVPTX: Remove the legacy ptx intrinsicsJustin Bogner2016-07-073-136/+86
| | | | | | | | | | | | - Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but not all of these registers were already accessible via the nvvm name. - Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0. There's a fair amount of code motion here, but it's all very mechanical. llvm-svn: 274769
* [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.Justin Lebar2016-07-061-1/+13
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D22068 llvm-svn: 274674
* NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0Justin Bogner2016-07-062-4/+1
| | | | | | | Everywhere where cuda.syncthreads or __syncthreads is used, use the properly namespaced nvvm.barrier0 instead. llvm-svn: 274664
* NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistentJustin Bogner2016-07-061-8/+8
| | | | | | | The intrinsics here use nvvm, but the builtins and tablegen variable names were using ptx. Stick to the modern names here. llvm-svn: 274662
* Use arrays or initializer lists to feed ArrayRefs instead of SmallVector ↵Benjamin Kramer2016-07-021-4/+1
| | | | | | | | where possible. No functionality change intended. llvm-svn: 274431
* Delete MCCodeGenInfo.Rafael Espindola2016-06-301-14/+0
| | | | | | | MC doesn't really care about CodeGen stuff, so this was just complicating target initialization. llvm-svn: 274258
* Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."Artem Belevich2016-06-293-79/+15
| | | | | | The change causes llvm crash in some unoptimized builds. llvm-svn: 274163
* Only emit extension for zeroext/signext arguments if type is < 32 bitsJustin Holewinski2016-06-271-2/+2
| | | | | | | | | | Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D21756 llvm-svn: 273922
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-06-213-15/+79
| | | | | | | | | | | | | | | | | | | | | Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible. This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process). This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%. Differential Review: http://reviews.llvm.org/D21421 llvm-svn: 273313
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-125-26/+28
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* [NVPTX] Add intrinsics for shfl instructions.Justin Lebar2016-06-091-1/+42
| | | | | | | | | | | | | | | Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298
* Apply most suggestions of clang-tidy's performance-unnecessary-value-paramBenjamin Kramer2016-06-086-11/+12
| | | | | | | Avoids unnecessary copies. All changes audited & pass tests with asan. No functional change intended. llvm-svn: 272190
* Avoid copies of std::strings and APInt/APFloats where we only read from itBenjamin Kramer2016-06-082-2/+2
| | | | | | | | As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126
* Apply clang-tidy's misc-move-constructor-init throughout LLVM.Benjamin Kramer2016-05-271-1/+2
| | | | | | No functionality change intended, maybe a tiny performance improvement. llvm-svn: 270997
* Avoid some copies by using const references.Benjamin Kramer2016-05-271-1/+1
| | | | | | | clang-tidy's performance-unnecessary-copy-initialization with some manual fixes. No functional changes intended. llvm-svn: 270988
* Init member structs in constructor.Artem Belevich2016-05-261-3/+9
| | | | | | | Fixes build error on windows where MSVC does not support list initialization inside member initializer list. llvm-svn: 270877
* [NVPTX] Added NVVMIntrRange passArtem Belevich2016-05-264-0/+159
| | | | | | | | | | | | NVVMIntrRange adds !range metadata to calls of NVVM intrinsics that return values within known limited range. This allows LLVM to generate optimal code for indexing arrays based on tid/ctaid which is a frequently used pattern in CUDA code. Differential Revision: http://reviews.llvm.org/D20644 llvm-svn: 270872
* [NVPTX] Don't (incorrectly) say that the NVVMReflect pass preserves all ↵Justin Lebar2016-05-251-3/+0
| | | | | | | | | | | | analyses. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D20585 llvm-svn: 270790
* Delete Reloc::Default.Rafael Espindola2016-05-183-13/+17
| | | | | | | | | | | | Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been maped to the default value, which is now clear has no be one of the remaining 3 options. llvm-svn: 269988
* SDAG: Implement Select instead of SelectImpl in NVPTXDAGToDAGISelJustin Bogner2016-05-132-213/+236
| | | | | | | | | | - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. Part of llvm.org/pr26808. llvm-svn: 269483
* Return a StringRef from getSection.Rafael Espindola2016-05-111-1/+1
| | | | | | This is similar to how getName is handled. llvm-svn: 269218
* CodeGen: Move TargetPassConfig from Passes.h to an own header; NFCMatthias Braun2016-05-101-0/+1
| | | | | | | | Many files include Passes.h but only a fraction needs to know about the TargetPassConfig class. Move it into an own header. Also rename Passes.cpp to TargetPassConfig.cpp while we are at it. llvm-svn: 269011
* [NVPTX] Change begin/end inline asm comments to "begin/end inline asm".Justin Lebar2016-05-101-2/+2
| | | | | | | Previously it was just "// inline asm", which made it tricky to read code with lots of inline assembly. llvm-svn: 268994
* SDAG: Rename Select->SelectImpl and repurpose Select as returning voidJustin Bogner2016-05-052-3/+3
| | | | | | | | | | | | | | This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693
OpenPOWER on IntegriCloud