summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
...
* [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).Justin Lebar2017-01-311-3/+8
| | | | | | | | | | x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713
* [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.Justin Lebar2017-01-313-10/+49
| | | | | | | | | | | | | | | | | | | | | | | | Summary: This lets us lower to sqrt.approx and rsqrt.approx under more circumstances. * Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32, when fast-math is enabled. Previously, we only would emit it for calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.) * Now we emit the ftz version of rsqrt.approx when ftz is enabled. Previously, we only emitted rsqrt.approx when ftz was disabled. Reviewers: hfinkel Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D28508 llvm-svn: 293605
* NVPTX: Move InferAddressSpaces to generic codeMatt Arsenault2017-01-314-614/+1
| | | | llvm-svn: 293579
* NVPTX: Trivial cleanups of NVPTXInferAddressSpacesMatt Arsenault2017-01-301-28/+30
| | | | | | | | | - Move DEBUG_TYPE below includes - Change unknown address space constant to be consistent with other passes - Grammar fixes in debug output llvm-svn: 293567
* NVPTX: Refactor NVPTXInferAddressSpaces to check TTIMatt Arsenault2017-01-302-33/+57
| | | | | | Add a new TTI hook for getting the generic address space value. llvm-svn: 293563
* Only print architecture dependent flags for that architecture.Rafael Espindola2017-01-301-1/+1
| | | | | | | | | | Different architectures can have different meaning for flags in the SHF_MASKPROC mask, so we should always check what the architecture use before checking the flag. NFC for now, but will allow fixing the value of an xmos flag. llvm-svn: 293484
* [NVPTX] Add intrinsics to support named barriers.Arpith Chacko Jacob2017-01-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs. The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows: call void @llvm.nvvm.barrier.n(i32 10) waits for all threads in a CTA to arrive at named barrier #10. call void @llvm.nvvm.barrier(i32 15, i32 992) waits for 992 threads in a CTA to arrive at barrier #15. Detailed description of these intrinsics are available in the PTX manual. http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions Reviewers: hfinkel, jlebar Differential Revision: https://reviews.llvm.org/D17657 llvm-svn: 293384
* NVPTX: Make NVPTXInferAddressSpaces preserve CFGMatt Arsenault2017-01-271-0/+4
| | | | llvm-svn: 293308
* NVPTXCodeGen: Add IPO to libdeps, since r293189.NAKAMURA Takumi2017-01-271-1/+1
| | | | llvm-svn: 293256
* Replace addEarlyAsPossiblePasses callback with adjustPassManagerStanislav Mekhanoshin2017-01-262-4/+10
| | | | | | | | | | | | | | This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
* [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.Justin Lebar2017-01-211-47/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Specifically, we upgrade llvm.nvvm.: * brev{32,64} * clz.{i,ll} * popc.{i,ll} * abs.{i,ll} * {min,max}.{i,ll,u,ull} * h2f These either map directly to an existing LLVM target-generic intrinsic or map to a simple LLVM target-generic idiom. In all cases, we check that the code we generate is lowered to PTX as we expect. These builtins don't need to be backfilled in clang: They're not accessible to user code from nvcc. Reviewers: tra Subscribers: majnemer, cfe-commits, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28793 llvm-svn: 292694
* [NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ into out of ↵Justin Lebar2017-01-213-46/+75
| | | | | | | | | | | | | | | | | | DAGToDAG and into TargetLowering. Summary: DADToDAG has access to TargetLowering, but not vice versa, so this is the more general location for these functions. NFC Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28795 llvm-svn: 292693
* [NVPTX] Fix lowering of fp16 ISD::FNEG.Artem Belevich2017-01-191-0/+2
| | | | | | | | | There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452
* [NVPTX] Support global variables of integer type larger than i64.Justin Lebar2017-01-181-1/+14
| | | | | | | | | | Reviewers: tra, majnemer Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28825 llvm-svn: 292316
* [NVPTX] Standardize asm printer on "foo \tbar".Justin Lebar2017-01-182-218/+218
| | | | | | | Some instructions were printed as "foo\tbar", but most are printed as "foo \bar". Standardize on the latter form. llvm-svn: 292306
* [NVPTX] Clean up nested !strconcat calls.Justin Lebar2017-01-182-99/+68
| | | | | | | !strconcat is a variadic function; it will concatenate an arbitrary number of strings. There's no need to nest it. llvm-svn: 292305
* [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.Justin Lebar2017-01-182-70/+16
| | | | | | | | | | | | | | | | | | | | | | | Summary: This change also lets us use max.{s,u}16. There's a vague warning in a test about this maybe being less efficient, but I could not come up with a case where the resulting SASS (sm_35 or sm_60) was different with or without max.{s,u}16. It's true that nvcc seems to emit only max.{s,u}32, but even ptxas 7.0 seems to have no problem generating efficient SASS from max.{s,u}16 (the casts up to i32 and back down to i16 seem to be implicit and nops, happening via register aliasing). In the absence of evidence, better to have fewer special cases, emit more straightforward code, etc. In particular, if a new GPU has 16-bit min/max instructions, we want to be able to use them. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28732 llvm-svn: 292304
* [NVPTX] Lower integer absolute value idiom to abs instruction.Justin Lebar2017-01-181-0/+12
| | | | | | | | | | | | Summary: Previously we lowered it literally, to shifts and xors. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28722 llvm-svn: 292303
* [NVPTX] Improve lowering of llvm.ctpop.Justin Lebar2017-01-181-5/+9
| | | | | | | | | | | | | | | | | | Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.) Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28721 llvm-svn: 292302
* [NVPTX] Add lowering for llvm.bitreverse.Justin Lebar2017-01-182-0/+13
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28720 llvm-svn: 292301
* [NVPTX] Improve lowering of llvm.ctlz.Justin Lebar2017-01-182-9/+29
| | | | | | | | | | | | | | | | | Summary: * Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact zero. * Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctz instructions. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28719 llvm-svn: 292299
* [NVPTX] Let there be One True Way to set NVVMReflect params.Justin Lebar2017-01-152-54/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero: * An LLVM command-line option * Parameters to the NVVMReflect constructor * Metadata on the module itself. This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28700 llvm-svn: 292068
* [NVPTX] Added support for half-precision floating point.Artem Belevich2017-01-1316-98/+449
| | | | | | | | | | | | | | | | Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956
* [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is ↵Artem Belevich2017-01-135-13/+31
| | | | | | | | | | | | | | allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFCDiana Picus2017-01-131-1/+1
| | | | | | | | | | | Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.Mohammed Agabaria2017-01-112-2/+3
| | | | | | | | | | | | updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
* [NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-098-167/+227
| | | | | | other minor fixes (NFC). llvm-svn: 291490
* [NVVMIntrRange] Only set range metadata if none is already presentDavid Majnemer2016-12-221-0/+4
| | | | | | | The range metadata inserted by NVVMIntrRange is pessimistic, range metadata already present could be more precise. llvm-svn: 290294
* [NVPTX] Remove dead #defines from NVPTXUtilities.h.Justin Lebar2016-12-151-3/+0
| | | | llvm-svn: 289747
* [NVPTX] Remove dead code.Justin Lebar2016-12-145-130/+0
| | | | | | | | | | | I've chosen to remove NVPTXInstrInfo::CanTailMerge but not NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead) because while the latter two are reasonably useful utilities, the former cannot be used safely: It relies on successful address space inference to identify writes to shared memory, but addrspace inference is a best-effort thing. llvm-svn: 289740
* [NVPTX] Support .maxnreg annotation.Justin Lebar2016-12-143-0/+9
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
* [NVPTX] Remove string constants from NVPTXBaseInfo.h.Justin Lebar2016-12-143-165/+88
| | | | | | | | | | | | | | | | | | Summary: Previously they were defined as a 2D char array in a header file. This is kind of overkill -- we can let the linker lay out these strings however it pleases. While we're at it, we might as well just inline these constants where they're used, as each of them is used only once. Also move NVPTXUtilities.{h,cpp} into namespace llvm. Reviewers: tra Subscribers: jholewinski, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D27636 llvm-svn: 289728
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-143-6/+6
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* Fix Clang-tidy readability-redundant-string-cstr warningsMalcolm Parsons2016-11-021-2/+1
| | | | | | | | | | Reviewers: beanz, lattner, jlebar Subscribers: jholewinski, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D26235 llvm-svn: 285832
* [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.Justin Lebar2016-10-316-314/+8
| | | | | | | | | | | | | | | Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642
* [NVPTX] Compute 'rem' using the result of 'div', if possible.Justin Lebar2016-10-281-0/+36
| | | | | | | | | | | | | | | | | | | | | Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461
* Target: Change various section classifiers in TargetLoweringObjectFile to ↵Peter Collingbourne2016-10-242-3/+3
| | | | | | | | | | | | | | | | take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006
* Remove unused #includes of TimeValue.h. NFC.Pavel Labath2016-10-241-1/+0
| | | | llvm-svn: 284975
* Do a sweep over move ctors and remove those that are identical to the default.Benjamin Kramer2016-10-201-7/+0
| | | | | | | | | | All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721
* Move the global variables representing each Target behind accessor functionMehdi Amini2016-10-096-13/+19
| | | | | | | | This avoids "static initialization order fiasco" Differential Revision: https://reviews.llvm.org/D25412 llvm-svn: 283702
* Target: Remove unused patterns and transforms. NFC.Peter Collingbourne2016-10-071-15/+0
| | | | llvm-svn: 283515
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-018-8/+8
| | | | llvm-svn: 283004
* Update comment about initializing TLOF with a pointer at the previousEric Christopher2016-09-291-1/+3
| | | | | | line or the other commented out place. llvm-svn: 282673
* [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.Artem Belevich2016-09-287-16/+263
| | | | | | | | These are only available on sm_60+ GPUs. Differential Revision: https://reviews.llvm.org/D24943 llvm-svn: 282607
* [NVPTX] Check if callsite is defined when computing argument allignmentJacques Pienaar2016-09-212-13/+20
| | | | | | | | | | | | Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before. Reviewers: eliben, jpienaar Subscribers: jlebar, vchuravy, cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D9168 llvm-svn: 282045
* Actually remove the Mangler from the AsmPrinter and clean up the places it ↵Eric Christopher2016-09-161-2/+0
| | | | | | was "used" but not used. llvm-svn: 281749
* Move the Mangler from the AsmPrinter down to TLOF and clean up theEric Christopher2016-09-162-6/+2
| | | | | | TLOF API accordingly. llvm-svn: 281708
* Finish renaming remaining analyzeBranch functionsMatt Arsenault2016-09-142-3/+3
| | | | llvm-svn: 281535
* Make analyzeBranch family of instruction names consistentMatt Arsenault2016-09-142-4/+4
| | | | | | | analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxationMatt Arsenault2016-09-142-4/+11
| | | | | | | | | The main change is to return the code size from InsertBranch/RemoveBranch. Patch mostly by Tim Northover llvm-svn: 281505
OpenPOWER on IntegriCloud