summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
...
* [NVPTX] allow address space inference for volatile loads/stores.Artem Belevich2017-10-241-0/+16
| | | | | | | | | | If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495
* Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"Matthias Braun2017-10-122-3/+3
| | | | | | | | | | Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637
* TargetMachine: Merge TargetMachine and LLVMTargetMachineMatthias Braun2017-10-122-3/+3
| | | | | | | | | | | | | | | Merge LLVMTargetMachine into TargetMachine. - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine. - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen - This simplifies the code and avoids methods ending up in the wrong interface. Differential Revision: https://reviews.llvm.org/D38489 llvm-svn: 315633
* [NVPTX] Implemented wmma intrinsics and instructions.Artem Belevich2017-10-124-0/+845
| | | | | | | | | | WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601
* Remove dead file.Michael Liao2017-10-021-1479/+0
| | | | llvm-svn: 314720
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.Artem Belevich2017-09-264-0/+93
| | | | | | Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223
* Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & ↵Justin Lebar2017-09-254-92/+0
| | | | | | | | | | | | | | | builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.Artem Belevich2017-09-254-0/+92
| | | | | | Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135
* [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} ↵Artem Belevich2017-09-212-0/+66
| | | | | | | | instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898
* [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.Artem Belevich2017-09-201-1/+73
| | | | | | Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820
* [CUDA] Added rudimentary support for CUDA-9 and sm_70.Artem Belevich2017-09-071-0/+5
| | | | | | | | | | | | | For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734
* [NVPTX] Reduce copypasta.Benjamin Kramer2017-08-201-702/+7
| | | | | | No functionality change intended. llvm-svn: 311295
* Delete Default and JITDefault code modelsRafael Espindola2017-08-032-13/+18
| | | | | | | | | | | | | | | IMHO it is an antipattern to have a enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done. This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified. llvm-svn: 309911
* Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. ↵Peter Collingbourne2017-07-262-11/+10
| | | | | | | | NFCI. This was a use-after-free waiting to happen. llvm-svn: 309159
* [SystemZ, LoopStrengthReduce]Jonas Paulsson2017-07-212-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729
* [NVPTX] Add lowering of i128 params.Artem Belevich2017-07-203-11/+32
| | | | | | | | | | | | | | | | | The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 308675
* Changed EOL back to LF. NFC.Artem Belevich2017-07-202-7832/+7832
| | | | llvm-svn: 308671
* Remove unneeded use of #undef DEBUG_TYPE. NFCSam Clegg2017-07-121-1/+0
| | | | | | | | | | | Where is is needed (at the end of headers that define it), be consistent about its use. Also fix a few header guards that I found in the process. Differential Revision: https://reviews.llvm.org/D34916 llvm-svn: 307840
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-071-10/+25
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* Reverting r307326 because it breaks clang tests.Michael Kuperstein2017-07-063-32/+11
| | | | llvm-svn: 307334
* [NVPTX] Add lowering of i128 params.Michael Kuperstein2017-07-063-11/+32
| | | | | | | | | | | | | | | | | The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
* fix trivial typos in comments; NFCHiroshi Inoue2017-07-041-1/+1
| | | | llvm-svn: 307094
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-282-3/+4
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-068-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* TargetMachine: Indicate whether machine verifier passes.Matthias Braun2017-05-311-0/+3
| | | | | | | | | | | | | This adds a callback to the LLVMTargetMachine that lets target indicate that they do not pass the machine verifier checks in all cases yet. This is intended to be a temporary measure while the targets are fixed allowing us to enable the machine verifier by default with EXPENSIVE_CHECKS enabled! Differential Revision: https://reviews.llvm.org/D33696 llvm-svn: 304320
* TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFCMatthias Braun2017-05-301-2/+2
| | | | | | | | | | | TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247
* [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ↵Simon Pilgrim2017-05-151-3/+3
| | | | | | | | | | | | | | ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082
* [NVPTX] Don't rely on default arguments to ↵Simon Pilgrim2017-05-151-3/+8
| | | | | | | | SelectionDAG::getMemIntrinsicNode. NFC. NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues. llvm-svn: 303047
* [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)Simon Pilgrim2017-05-121-1/+3
| | | | | | | | This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 llvm-svn: 302942
* Add extra operand to CALLSEQ_START to keep frame part set up previouslySerge Pavlov2017-05-092-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527
* [NVPTX] Add support for ISD::ABS loweringSimon Pilgrim2017-05-062-7826/+7826
| | | | | | | | Use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32944 llvm-svn: 302356
* Make getParamAlignment use argument numbersReid Kleckner2017-04-282-3/+3
| | | | | | | | | | | | | | | | | | The method is called "get *Param* Alignment", and is only used for return values exactly once, so it should take argument indices, not attribute indices. Avoids confusing code like: IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError); Alignment = CS->getParamAlignment(ArgIdx + 1); Add getRetAlignment to handle the one case in Value.cpp that wants the return value alignment. This is a potentially breaking change for out-of-tree backends that do their own call lowering. llvm-svn: 301682
* Move size and alignment information of regclass to TargetRegisterInfoKrzysztof Parzyszek2017-04-241-1/+1
| | | | | | | | | | | | | | | 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221
* [APInt] Use lshrInPlace to replace lshr where possibleCraig Topper2017-04-181-1/+1
| | | | | | | | | | This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566
* Distinguish between code pointer size and DataLayout::getPointerSize() in ↵Konstantin Zhuravlyov2017-04-171-1/+1
| | | | | | DWARF info generation llvm-svn: 300463
* [IR] Make getParamAttributes take argument numbers, not ArgNo+1Reid Kleckner2017-04-132-2/+2
| | | | | | | | | | | | Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere. The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail. NFC llvm-svn: 300272
* Fix spelling compliment->complement. Mostly refering to 2s complement. NFCCraig Topper2017-04-111-1/+1
| | | | llvm-svn: 299970
* Allow DataLayout to specify addrspace for allocas.Matt Arsenault2017-04-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | LLVM makes several assumptions about address space 0. However, alloca is presently constrained to always return this address space. There's no real way to avoid using alloca, so without this there is no way to opt out of these assumptions. The problematic assumptions include: - That the pointer size used for the stack is the same size as the code size pointer, which is also the maximum sized pointer. - That 0 is an invalid, non-dereferencable pointer value. These are problems for AMDGPU because alloca is used to implement the private address space, which uses a 32-bit index as the pointer value. Other pointers are 64-bit and behave more like LLVM's notion of generic address space. By changing the address space used for allocas, we can change our generic pointer type to be LLVM's generic pointer type which does have similar properties. llvm-svn: 299888
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-301-1/+1
| | | | | | Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072
* Rename AttributeSet to AttributeListReid Kleckner2017-03-212-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393
* [NVPTX] Remove unnecessary isImageReadoOnly(), isImageWriteOnly(), & ↵Justin Lebar2017-03-081-3/+1
| | | | | | | | | | | | isImageReadWrite calls This is repetition of isImage() function in NVPTXUtilities.cpp. Patch by Briana Grace! Differential Revision: https://reviews.llvm.org/D30706 llvm-svn: 297252
* [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.Artem Belevich2017-03-071-11/+31
| | | | | | Differential Revision: https://reviews.llvm.org/D30672 llvm-svn: 297198
* [NVPTX] Reduce amount of boilerplate code used to select load instruction ↵Artem Belevich2017-03-021-1781/+587
| | | | | | | | | | | | | | opcode. Make opcode selection code for the load instruction a bit easier to read and maintain. This patch also catches number of f16 load/store variants that were not handled before. Differential Revision: https://reviews.llvm.org/D30513 llvm-svn: 296785
* [NVPTX] Added missing LDU/LDG intrinsics for f16.Artem Belevich2017-03-022-2/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D30512 llvm-svn: 296784
* [NVPTX] Added support for .f16x2 instructions.Artem Belevich2017-02-2311-117/+714
| | | | | | | | | | | | | This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032
* Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' ↵Simon Pilgrim2017-02-221-4/+0
| | | | | | checking llvm-svn: 295830
* [NVPTX] Unify vectorization of load/stores of aggregate arguments and return ↵Artem Belevich2017-02-211-710/+420
| | | | | | | | | | | | | | | | | | | | | | | | | | | | values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784
* NVPTX: Extract mem intrinsic expansions into utilitiesMatt Arsenault2017-02-081-213/+11
| | | | llvm-svn: 294490
* [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.Justin Lebar2017-02-031-0/+2
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011
* NVPTX: Fix not preserving volatile when expanding memsetMatt Arsenault2017-02-021-3/+4
| | | | llvm-svn: 293851
OpenPOWER on IntegriCloud