summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
* Make getParamAlignment use argument numbersReid Kleckner2017-04-282-3/+3
| | | | | | | | | | | | | | | | | | The method is called "get *Param* Alignment", and is only used for return values exactly once, so it should take argument indices, not attribute indices. Avoids confusing code like: IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError); Alignment = CS->getParamAlignment(ArgIdx + 1); Add getRetAlignment to handle the one case in Value.cpp that wants the return value alignment. This is a potentially breaking change for out-of-tree backends that do their own call lowering. llvm-svn: 301682
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-1/+1
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [APInt] Use lshrInPlace to replace lshr where possibleCraig Topper2017-04-181-1/+1
| | | | | | | | | | This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566
* Distinguish between code pointer size and DataLayout::getPointerSize() in ↵Konstantin Zhuravlyov2017-04-171-1/+1
| | | | | | DWARF info generation llvm-svn: 300463
* [IR] Make getParamAttributes take argument numbers, not ArgNo+1Reid Kleckner2017-04-132-2/+2
| | | | | | | | | | | | Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere. The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail. NFC llvm-svn: 300272
* Fix spelling compliment->complement. Mostly refering to 2s complement. NFCCraig Topper2017-04-111-1/+1
| | | | llvm-svn: 299970
* Allow DataLayout to specify addrspace for allocas.Matt Arsenault2017-04-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | LLVM makes several assumptions about address space 0. However, alloca is presently constrained to always return this address space. There's no real way to avoid using alloca, so without this there is no way to opt out of these assumptions. The problematic assumptions include: - That the pointer size used for the stack is the same size as the code size pointer, which is also the maximum sized pointer. - That 0 is an invalid, non-dereferencable pointer value. These are problems for AMDGPU because alloca is used to implement the private address space, which uses a 32-bit index as the pointer value. Other pointers are 64-bit and behave more like LLVM's notion of generic address space. By changing the address space used for allocas, we can change our generic pointer type to be LLVM's generic pointer type which does have similar properties. llvm-svn: 299888
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-301-1/+1
| | | | | | Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072
* Rename AttributeSet to AttributeListReid Kleckner2017-03-212-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393
* [NVPTX] Remove unnecessary isImageReadoOnly(), isImageWriteOnly(), & ↵Justin Lebar2017-03-081-3/+1
| | | | | | | | | | | | isImageReadWrite calls This is repetition of isImage() function in NVPTXUtilities.cpp. Patch by Briana Grace! Differential Revision: https://reviews.llvm.org/D30706 llvm-svn: 297252
* [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.Artem Belevich2017-03-071-11/+31
| | | | | | Differential Revision: https://reviews.llvm.org/D30672 llvm-svn: 297198
* [NVPTX] Reduce amount of boilerplate code used to select load instruction ↵Artem Belevich2017-03-021-1781/+587
| | | | | | | | | | | | | | opcode. Make opcode selection code for the load instruction a bit easier to read and maintain. This patch also catches number of f16 load/store variants that were not handled before. Differential Revision: https://reviews.llvm.org/D30513 llvm-svn: 296785
* [NVPTX] Added missing LDU/LDG intrinsics for f16.Artem Belevich2017-03-022-2/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D30512 llvm-svn: 296784
* [NVPTX] Added support for .f16x2 instructions.Artem Belevich2017-02-2311-117/+714
| | | | | | | | | | | | | This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032
* Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' ↵Simon Pilgrim2017-02-221-4/+0
| | | | | | checking llvm-svn: 295830
* [NVPTX] Unify vectorization of load/stores of aggregate arguments and return ↵Artem Belevich2017-02-211-710/+420
| | | | | | | | | | | | | | | | | | | | | | | | | | | | values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784
* NVPTX: Extract mem intrinsic expansions into utilitiesMatt Arsenault2017-02-081-213/+11
| | | | llvm-svn: 294490
* [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.Justin Lebar2017-02-031-0/+2
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011
* NVPTX: Fix not preserving volatile when expanding memsetMatt Arsenault2017-02-021-3/+4
| | | | llvm-svn: 293851
* [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).Justin Lebar2017-01-311-3/+8
| | | | | | | | | | x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713
* [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.Justin Lebar2017-01-313-10/+49
| | | | | | | | | | | | | | | | | | | | | | | | Summary: This lets us lower to sqrt.approx and rsqrt.approx under more circumstances. * Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32, when fast-math is enabled. Previously, we only would emit it for calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.) * Now we emit the ftz version of rsqrt.approx when ftz is enabled. Previously, we only emitted rsqrt.approx when ftz was disabled. Reviewers: hfinkel Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D28508 llvm-svn: 293605
* NVPTX: Move InferAddressSpaces to generic codeMatt Arsenault2017-01-314-614/+1
| | | | llvm-svn: 293579
* NVPTX: Trivial cleanups of NVPTXInferAddressSpacesMatt Arsenault2017-01-301-28/+30
| | | | | | | | | - Move DEBUG_TYPE below includes - Change unknown address space constant to be consistent with other passes - Grammar fixes in debug output llvm-svn: 293567
* NVPTX: Refactor NVPTXInferAddressSpaces to check TTIMatt Arsenault2017-01-302-33/+57
| | | | | | Add a new TTI hook for getting the generic address space value. llvm-svn: 293563
* Only print architecture dependent flags for that architecture.Rafael Espindola2017-01-301-1/+1
| | | | | | | | | | Different architectures can have different meaning for flags in the SHF_MASKPROC mask, so we should always check what the architecture use before checking the flag. NFC for now, but will allow fixing the value of an xmos flag. llvm-svn: 293484
* [NVPTX] Add intrinsics to support named barriers.Arpith Chacko Jacob2017-01-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs. The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows: call void @llvm.nvvm.barrier.n(i32 10) waits for all threads in a CTA to arrive at named barrier #10. call void @llvm.nvvm.barrier(i32 15, i32 992) waits for 992 threads in a CTA to arrive at barrier #15. Detailed description of these intrinsics are available in the PTX manual. http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions Reviewers: hfinkel, jlebar Differential Revision: https://reviews.llvm.org/D17657 llvm-svn: 293384
* NVPTX: Make NVPTXInferAddressSpaces preserve CFGMatt Arsenault2017-01-271-0/+4
| | | | llvm-svn: 293308
* NVPTXCodeGen: Add IPO to libdeps, since r293189.NAKAMURA Takumi2017-01-271-1/+1
| | | | llvm-svn: 293256
* Replace addEarlyAsPossiblePasses callback with adjustPassManagerStanislav Mekhanoshin2017-01-262-4/+10
| | | | | | | | | | | | | | This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
* [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.Justin Lebar2017-01-211-47/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Specifically, we upgrade llvm.nvvm.: * brev{32,64} * clz.{i,ll} * popc.{i,ll} * abs.{i,ll} * {min,max}.{i,ll,u,ull} * h2f These either map directly to an existing LLVM target-generic intrinsic or map to a simple LLVM target-generic idiom. In all cases, we check that the code we generate is lowered to PTX as we expect. These builtins don't need to be backfilled in clang: They're not accessible to user code from nvcc. Reviewers: tra Subscribers: majnemer, cfe-commits, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28793 llvm-svn: 292694
* [NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ into out of ↵Justin Lebar2017-01-213-46/+75
| | | | | | | | | | | | | | | | | | DAGToDAG and into TargetLowering. Summary: DADToDAG has access to TargetLowering, but not vice versa, so this is the more general location for these functions. NFC Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28795 llvm-svn: 292693
* [NVPTX] Fix lowering of fp16 ISD::FNEG.Artem Belevich2017-01-191-0/+2
| | | | | | | | | There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452
* [NVPTX] Support global variables of integer type larger than i64.Justin Lebar2017-01-181-1/+14
| | | | | | | | | | Reviewers: tra, majnemer Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28825 llvm-svn: 292316
* [NVPTX] Standardize asm printer on "foo \tbar".Justin Lebar2017-01-182-218/+218
| | | | | | | Some instructions were printed as "foo\tbar", but most are printed as "foo \bar". Standardize on the latter form. llvm-svn: 292306
* [NVPTX] Clean up nested !strconcat calls.Justin Lebar2017-01-182-99/+68
| | | | | | | !strconcat is a variadic function; it will concatenate an arbitrary number of strings. There's no need to nest it. llvm-svn: 292305
* [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.Justin Lebar2017-01-182-70/+16
| | | | | | | | | | | | | | | | | | | | | | | Summary: This change also lets us use max.{s,u}16. There's a vague warning in a test about this maybe being less efficient, but I could not come up with a case where the resulting SASS (sm_35 or sm_60) was different with or without max.{s,u}16. It's true that nvcc seems to emit only max.{s,u}32, but even ptxas 7.0 seems to have no problem generating efficient SASS from max.{s,u}16 (the casts up to i32 and back down to i16 seem to be implicit and nops, happening via register aliasing). In the absence of evidence, better to have fewer special cases, emit more straightforward code, etc. In particular, if a new GPU has 16-bit min/max instructions, we want to be able to use them. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28732 llvm-svn: 292304
* [NVPTX] Lower integer absolute value idiom to abs instruction.Justin Lebar2017-01-181-0/+12
| | | | | | | | | | | | Summary: Previously we lowered it literally, to shifts and xors. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28722 llvm-svn: 292303
* [NVPTX] Improve lowering of llvm.ctpop.Justin Lebar2017-01-181-5/+9
| | | | | | | | | | | | | | | | | | Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.) Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28721 llvm-svn: 292302
* [NVPTX] Add lowering for llvm.bitreverse.Justin Lebar2017-01-182-0/+13
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28720 llvm-svn: 292301
* [NVPTX] Improve lowering of llvm.ctlz.Justin Lebar2017-01-182-9/+29
| | | | | | | | | | | | | | | | | Summary: * Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact zero. * Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctz instructions. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28719 llvm-svn: 292299
* [NVPTX] Let there be One True Way to set NVVMReflect params.Justin Lebar2017-01-152-54/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero: * An LLVM command-line option * Parameters to the NVVMReflect constructor * Metadata on the module itself. This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28700 llvm-svn: 292068
* [NVPTX] Added support for half-precision floating point.Artem Belevich2017-01-1316-98/+449
| | | | | | | | | | | | | | | | Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956
* [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is ↵Artem Belevich2017-01-135-13/+31
| | | | | | | | | | | | | | allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFCDiana Picus2017-01-131-1/+1
| | | | | | | | | | | Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.Mohammed Agabaria2017-01-112-2/+3
| | | | | | | | | | | | updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
* [NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-098-167/+227
| | | | | | other minor fixes (NFC). llvm-svn: 291490
* [NVVMIntrRange] Only set range metadata if none is already presentDavid Majnemer2016-12-221-0/+4
| | | | | | | The range metadata inserted by NVVMIntrRange is pessimistic, range metadata already present could be more precise. llvm-svn: 290294
* [NVPTX] Remove dead #defines from NVPTXUtilities.h.Justin Lebar2016-12-151-3/+0
| | | | llvm-svn: 289747
* [NVPTX] Remove dead code.Justin Lebar2016-12-145-130/+0
| | | | | | | | | | | I've chosen to remove NVPTXInstrInfo::CanTailMerge but not NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead) because while the latter two are reasonably useful utilities, the former cannot be used safely: It relies on successful address space inference to identify writes to shared memory, but addrspace inference is a best-effort thing. llvm-svn: 289740
* [NVPTX] Support .maxnreg annotation.Justin Lebar2016-12-143-0/+9
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
OpenPOWER on IntegriCloud