path: root/llvm/lib/Target/NVPTX
Commit message | Author | Age | Files | Lines
...
* [NVPTX] Improve lowering of llvm.ctpop. | Justin Lebar | 2017-01-18 | 1 | -5/+9
  Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.)
  Reviewers: tra
  Subscribers: jholewinski, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28721
  llvm-svn: 292302
* [NVPTX] Add lowering for llvm.bitreverse. | Justin Lebar | 2017-01-18 | 2 | -0/+13
  Reviewers: tra
  Subscribers: llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D28720
  llvm-svn: 292301
* [NVPTX] Improve lowering of llvm.ctlz. | Justin Lebar | 2017-01-18 | 2 | -9/+29
  Summary:
  * Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact, zero.
  * Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit clz instructions.
  Reviewers: tra
  Subscribers: llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D28719
  llvm-svn: 292299
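The difference between the speculated and unspeculated forms can be sketched in plain C++. This is only an illustration, not LLVM code: the function names are hypothetical, and it assumes a 32-bit count-leading-zeros operation that is defined to return 32 for a zero input, so the extra zero check buys nothing.

```cpp
#include <cstdint>

// Hypothetical helper modelling a "zero is undefined" count-leading-zeros.
// It must not be called with X == 0 (the loop would never terminate).
inline unsigned ctlzNonZero(std::uint32_t X) {
  unsigned N = 0;
  while ((X & 0x80000000u) == 0) {
    X <<= 1;
    ++N;
  }
  return N;
}

// Speculated form: a branch guards every call so that zero never reaches
// the "undefined on zero" operation.
inline unsigned ctlzWithBranch(std::uint32_t X) {
  return X == 0 ? 32u : ctlzNonZero(X);
}

// Unspeculated form: a single operation that is defined for every input,
// including zero, so no guarding branch is needed.
inline unsigned ctlzDefined(std::uint32_t X) {
  unsigned N = 0;
  for (std::uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0;
       Mask >>= 1)
    ++N;
  return N;
}
```

Both forms return the same value for every input; the second simply does it without the branch on zero.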
* [NVPTX] Let there be One True Way to set NVVMReflect params. | Justin Lebar | 2017-01-15 | 2 | -54/+9
  Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero:
  * An LLVM command-line option
  * Parameters to the NVVMReflect constructor
  * Metadata on the module itself.
  This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction.
  Reviewers: tra
  Subscribers: jholewinski, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28700
  llvm-svn: 292068
* [NVPTX] Added support for half-precision floating point. | Artem Belevich | 2017-01-13 | 16 | -98/+449
  Only scalar half-precision operations are supported at the moment.
  - Adds general support for 'half' type in NVPTX.
  - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math).
  - Type conversions to/from fp16 are supported on all GPU variants.
  - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage.
  Differential Revision: https://reviews.llvm.org/D28540
  llvm-svn: 291956
* [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. | Artem Belevich | 2017-01-13 | 5 | -13/+31
  Previously we'd always lower @llvm.{sin,cos}.f32 to {sin,cos}.approx.f32 instructions even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled.
  Differential Revision: https://reviews.llvm.org/D28619
  llvm-svn: 291936
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC | Diana Picus | 2017-01-13 | 1 | -1/+1
  Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion.
  Differential Revision: https://reviews.llvm.org/D28556
  llvm-svn: 291891
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch. | Mohammed Agabaria | 2017-01-11 | 2 | -2/+3
  Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Special optimization case: pmulld is replaced with a pmullw\pmulhw\pshuf sequence when the real operand bit width is <= 16.
  Differential Revision: https://reviews.llvm.org/D28104
  llvm-svn: 291657
* [NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-01-09 | 8 | -167/+227
  llvm-svn: 291490
* [NVVMIntrRange] Only set range metadata if none is already present | David Majnemer | 2016-12-22 | 1 | -0/+4
  The range metadata inserted by NVVMIntrRange is pessimistic; range metadata already present could be more precise.
  llvm-svn: 290294
* [NVPTX] Remove dead #defines from NVPTXUtilities.h. | Justin Lebar | 2016-12-15 | 1 | -3/+0
  llvm-svn: 289747
* [NVPTX] Remove dead code. | Justin Lebar | 2016-12-14 | 5 | -130/+0
  I've chosen to remove NVPTXInstrInfo::CanTailMerge but not NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead) because while the latter two are reasonably useful utilities, the former cannot be used safely: It relies on successful address space inference to identify writes to shared memory, but addrspace inference is a best-effort thing.
  llvm-svn: 289740
* [NVPTX] Support .maxnreg annotation. | Justin Lebar | 2016-12-14 | 3 | -0/+9
  Reviewers: tra
  Subscribers: llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D27638
  llvm-svn: 289729
* [NVPTX] Remove string constants from NVPTXBaseInfo.h. | Justin Lebar | 2016-12-14 | 3 | -165/+88
  Summary: Previously they were defined as a 2D char array in a header file. This is kind of overkill -- we can let the linker lay out these strings however it pleases. While we're at it, we might as well just inline these constants where they're used, as each of them is used only once. Also move NVPTXUtilities.{h,cpp} into namespace llvm.
  Reviewers: tra
  Subscribers: jholewinski, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D27636
  llvm-svn: 289728
* Replace APFloatBase static fltSemantics data members with getter functions | Stephan Bergmann | 2016-12-14 | 3 | -6/+6
  At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic.
  Differential Revision: https://reviews.llvm.org/D26671
  llvm-svn: 289647
* Fix Clang-tidy readability-redundant-string-cstr warnings | Malcolm Parsons | 2016-11-02 | 1 | -2/+1
  Reviewers: beanz, lattner, jlebar
  Subscribers: jholewinski, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D26235
  llvm-svn: 285832
* [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. | Justin Lebar | 2016-10-31 | 6 | -314/+8
  Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems.
  Reviewers: tra
  Subscribers: llvm-commits, jholewinski, jingyue, mgorny
  Differential Revision: https://reviews.llvm.org/D26165
  llvm-svn: 285642
* [NVPTX] Compute 'rem' using the result of 'div', if possible. | Justin Lebar | 2016-10-28 | 1 | -0/+36
  Summary: In isel, transform
      Num % Den
  into
      Num - (Num / Den) * Den
  if the result of Num / Den is already available.
  Reviewers: tra
  Subscribers: hfinkel, llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D26090
  llvm-svn: 285461
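The identity behind this transform can be checked in ordinary C++, where integer division truncates toward zero just as it does for LLVM's sdiv/srem. The sketch below is illustrative only; `DivRem` and `divRem` are hypothetical names, not part of the NVPTX backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result pair; not an LLVM type.
struct DivRem {
  std::int32_t Quot;
  std::int32_t Rem;
};

// Computes the quotient once and derives the remainder from it with a
// multiply and a subtract, instead of issuing a second division.
// Preconditions: Den != 0, and not (Num == INT32_MIN && Den == -1).
inline DivRem divRem(std::int32_t Num, std::int32_t Den) {
  assert(Den != 0 && "division by zero is undefined");
  std::int32_t Quot = Num / Den;
  std::int32_t Rem = Num - Quot * Den; // equals Num % Den
  return {Quot, Rem};
}
```

The payoff is that a multiply and a subtract are typically much cheaper than a second division when the quotient has to be computed anyway.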
* Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. | Peter Collingbourne | 2016-10-24 | 2 | -3/+3
  These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes.
  Differential Revision: https://reviews.llvm.org/D25917
  llvm-svn: 285006
* Remove unused #includes of TimeValue.h. NFC. | Pavel Labath | 2016-10-24 | 1 | -1/+0
  llvm-svn: 284975
* Do a sweep over move ctors and remove those that are identical to the default. | Benjamin Kramer | 2016-10-20 | 1 | -7/+0
  All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended.
  llvm-svn: 284721
* Move the global variables representing each Target behind accessor function | Mehdi Amini | 2016-10-09 | 6 | -13/+19
  This avoids the "static initialization order fiasco".
  Differential Revision: https://reviews.llvm.org/D25412
  llvm-svn: 283702
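A common way to put a global behind an accessor is a function-local static, which C++11 guarantees is initialized on first use. The sketch below shows that idiom under stated assumptions: `TargetSketch` and `getTheExampleTarget` are hypothetical names, and the commit's actual accessors may be implemented differently.

```cpp
#include <string>

// Hypothetical stand-in for a target descriptor; not llvm::Target.
struct TargetSketch {
  std::string Name;
};

// Accessor replacing a namespace-scope global. The object is constructed
// the first time the function is called, so code running during static
// initialization in another translation unit can never observe it
// uninitialized.
inline TargetSketch &getTheExampleTarget() {
  static TargetSketch TheTarget{"example"};
  return TheTarget;
}
```

Every call returns a reference to the same object, so callers can keep treating it as a singleton.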
* Target: Remove unused patterns and transforms. NFC. | Peter Collingbourne | 2016-10-07 | 1 | -15/+0
  llvm-svn: 283515
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 8 | -8/+8
  llvm-svn: 283004
* Update comment about initializing TLOF with a pointer at the previous | Eric Christopher | 2016-09-29 | 1 | -1/+3
  line or the other commented out place.
  llvm-svn: 282673
* [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions. | Artem Belevich | 2016-09-28 | 7 | -16/+263
  These are only available on sm_60+ GPUs.
  Differential Revision: https://reviews.llvm.org/D24943
  llvm-svn: 282607
* [NVPTX] Check if callsite is defined when computing argument alignment | Jacques Pienaar | 2016-09-21 | 2 | -13/+20
  Summary: In getArgumentAlignment, check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is null, fall back to the ABI type alignment; otherwise compute the alignment as before.
  Reviewers: eliben, jpienaar
  Subscribers: jlebar, vchuravy, cfe-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D9168
  llvm-svn: 282045
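The shape of this fix is a guard-then-fallback on a possibly null pointer. The sketch below is a simplified illustration, not the real getArgumentAlignment: `CallSiteSketch`, its `ExplicitAlign` field, and both functions are hypothetical, and how the real code computes the alignment from a call site is not shown here.

```cpp
// Hypothetical stand-in for a call-site object; not LLVM's ImmutableCallSite.
struct CallSiteSketch {
  unsigned ExplicitAlign; // 0 means "no explicit alignment recorded"
};

// Returns the ABI type alignment when there is no call site to inspect;
// otherwise uses whatever the call site provides (here, one made-up field).
inline unsigned argumentAlignmentSketch(const CallSiteSketch *CS,
                                        unsigned ABITypeAlign) {
  if (!CS)
    return ABITypeAlign; // fallback path: nothing to dereference
  return CS->ExplicitAlign != 0 ? CS->ExplicitAlign : ABITypeAlign;
}

// Small usage example with a non-null call site.
inline unsigned exampleWithCallSite() {
  CallSiteSketch CS{16};
  return argumentAlignmentSketch(&CS, 8);
}
```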
* Actually remove the Mangler from the AsmPrinter and clean up the places it was "used" but not used. | Eric Christopher | 2016-09-16 | 1 | -2/+0
  llvm-svn: 281749
* Move the Mangler from the AsmPrinter down to TLOF and clean up the | Eric Christopher | 2016-09-16 | 2 | -6/+2
  TLOF API accordingly.
  llvm-svn: 281708
* Finish renaming remaining analyzeBranch functions | Matt Arsenault | 2016-09-14 | 2 | -3/+3
  llvm-svn: 281535
* Make analyzeBranch family of instruction names consistent | Matt Arsenault | 2016-09-14 | 2 | -4/+4
  analyzeBranch was renamed to start with a lowercase letter; rename the related set to match.
  llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxation | Matt Arsenault | 2016-09-14 | 2 | -4/+11
  The main change is to return the code size from InsertBranch/RemoveBranch.
  Patch mostly by Tim Northover
  llvm-svn: 281505
* getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI | Sanjay Patel | 2016-09-14 | 2 | -9/+8
  llvm-svn: 281493
* [NVPTX] Use ldg for explicitly invariant loads. | Justin Lebar | 2016-09-11 | 1 | -13/+22
  Summary: With this change (plus some changes to prevent !invariant from being clobbered within llvm), clang will be able to model the __ldg CUDA builtin as an invariant load, rather than as a target-specific llvm intrinsic. This will let the optimizer play with these loads -- specifically, we should be able to vectorize them in the load-store vectorizer.
  Reviewers: tra
  Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc
  Differential Revision: https://reviews.llvm.org/D23477
  llvm-svn: 281152
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability. | Justin Lebar | 2016-09-11 | 1 | -3/+6
  Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired.
  Reviewers: chandlerc, tstellarAMD
  Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23371
  llvm-svn: 281151
* [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc. | Justin Lebar | 2016-09-09 | 2 | -16/+132
  Summary: Previously these only worked via NVPTX-specific intrinsics. This change will allow us to convert these target-specific intrinsics into the general LLVM versions, allowing existing LLVM passes to reason about their behavior. It also gets us some minor codegen improvements as-is, from situations where we canonicalize code into one of these llvm intrinsics.
  Reviewers: majnemer
  Subscribers: llvm-commits, jholewinski, tra
  Differential Revision: https://reviews.llvm.org/D24300
  llvm-svn: 281092
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses | Matthias Braun | 2016-08-24 | 4 | -6/+0
  Re-apply this patch, hopefully I will get away without any warnings in the constructor now.
  This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass.
  Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; with this patch the MachineFunction is freed after the AsmPrinter has finished.
  Differential Revision: http://reviews.llvm.org/D23736
  llvm-svn: 279602
* Revert r279564. It introduces undefined behavior (binding a reference to a | Richard Smith | 2016-08-23 | 4 | -0/+6
  dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail.
  llvm-svn: 279580
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses | Matthias Braun | 2016-08-23 | 4 | -6/+0
  Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter.
  This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass.
  Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; with this patch the MachineFunction is freed after the AsmPrinter has finished.
  Differential Revision: http://reviews.llvm.org/D23736
  llvm-svn: 279564
* Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses" | Matthias Braun | 2016-08-23 | 4 | -0/+6
  Reverting while tracking down a use after free.
  This reverts commit r279502.
  llvm-svn: 279503
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses | Matthias Braun | 2016-08-23 | 4 | -6/+0
  This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass.
  Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; with this patch the MachineFunction is freed after the AsmPrinter has finished.
  Differential Revision: http://reviews.llvm.org/D23736
  llvm-svn: 279502
* [NVPTX] Switch nvptx-use-infer-addrspace to true. | Justin Lebar | 2016-08-19 | 2 | -4/+1
  Summary: This switches us to use a different, more powerful algorithm for address space inference. I've tested this locally and it seems to work great. Once we're more confident in it, we can remove the old pass altogether.
  Reviewers: jingyue
  Subscribers: llvm-commits, tra, jholewinski
  Differential Revision: https://reviews.llvm.org/D23694
  llvm-svn: 279317
* [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround | Michael Kuperstein | 2016-08-18 | 2 | -7/+7
  The names of the tablegen defs now match the names of the ISD nodes. This makes the world a slightly saner place, as previously "fround" matched ISD::FP_ROUND and not ISD::FROUND.
  Differential Revision: https://reviews.llvm.org/D23597
  llvm-svn: 279129
* Replace a few more "fall through" comments with LLVM_FALLTHROUGH | Justin Bogner | 2016-08-17 | 1 | -1/+1
  Follow up to r278902. I had missed "fall through", with a space.
  llvm-svn: 278970
* [NVPTX] Use untyped (.b) integer registers in PTX. | Artem Belevich | 2016-08-12 | 1 | -3/+21
  This brings LLVM-generated PTX closer to what nvcc generates and avoids triggering issues in ptxas. For instance, ptxas does not accept .s16 (or .u16) registers as operands for .fp16 instructions.
  Differential Revision: https://reviews.llvm.org/D23460
  llvm-svn: 278568
* Use the range variant of find instead of unpacking begin/end | David Majnemer | 2016-08-11 | 1 | -4/+4
  If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended.
  llvm-svn: 278433
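The cleanup described here replaces the `find(begin, end, x) != end` idiom with a single membership query. LLVM provides `llvm::is_contained` for this; the sketch below is a self-contained stand-in written against the standard library only, with the hypothetical names `isContained` and `exampleLookup`, so it should not be read as LLVM's own implementation.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical stand-in for a range-based membership helper: wraps the
// "find and compare against end()" idiom in one call.
template <typename Range, typename T>
bool isContained(const Range &R, const T &Value) {
  return std::find(std::begin(R), std::end(R), Value) != std::end(R);
}

// Small usage example over a fixed array.
inline bool exampleLookup(int Value) {
  static const int Known[] = {1, 2, 3};
  return isContained(Known, Value);
}
```

The call site then states the intent directly (is the value present?) instead of spelling out iterator plumbing.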
* [NVPTX] remove unnecessary named metadata update that happens to break debug info. | Artem Belevich | 2016-08-02 | 1 | -36/+0
  Also added test case to verify IR changes done by NVPTXGenericToNVVM pass.
  Differential Revision: https://reviews.llvm.org/D22837
  llvm-svn: 277520
* [ConstantFolding] Teach the folder how to fold ConstantVector | David Majnemer | 2016-07-29 | 1 | -7/+7
  A ConstantVector can have ConstantExpr operands and vice versa. However, the folder had no ability to fold ConstantVectors which, in some cases, was an optimization barrier. Instead, rephrase the folder in terms of Constants instead of ConstantExprs and teach callers how to deal with failure.
  llvm-svn: 277099
* MachineFunction: Return reference for getFrameInfo(); NFC | Matthias Braun | 2016-07-28 | 4 | -31/+31
  getFrameInfo() never returns nullptr so we should use a reference instead of a pointer.
  llvm-svn: 277017
* [NVPTX] Enable the load-store vectorizer on nvptx. | Justin Lebar | 2016-07-20 | 2 | -1/+11
  Reviewers: tra
  Subscribers: jholewinski, arsenm, asbirlea
  Differential Revision: https://reviews.llvm.org/D22592
  llvm-svn: 276196