path: root/llvm/lib/Target/NVPTX
Commit message / Author / Age / Files / Lines
...
* Nuke the old JIT.Rafael Espindola2014-08-071-6/+0
| | | | | | | | | I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111
* Have MachineFunction cache a pointer to the subtarget to make lookupsEric Christopher2014-08-054-17/+9
| | | | | | | | | | | shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-046-54/+51
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* Improve some const-correctness to remove a -Wcast-qual warning. No functional changes intended.Aaron Ballman2014-08-013-4/+5
| | | | | | llvm-svn: 214503
* Make sure no loads resulting from load->switch DAGCombine are marked invariantLouis Gerbarg2014-07-311-1/+2
| | | | | | | | | | | | | | Currently, when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load, the new load inherits the isInvariant flag of the left side. This is incorrect, since invariant loads can be reordered in cases where it is illegal to reorder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449
* Fixing a -Wcast-qual warning in GCC. No functional changes.Aaron Ballman2014-07-311-2/+2
| | | | llvm-svn: 214399
* [NVPTX] Silence a GCC warning found by the buildbotsJustin Holewinski2014-07-231-1/+1
| | | | | | | The cast to NVPTXTargetLowering was missing a 'const', but let's just access the right pointer through the subtarget anyway. llvm-svn: 213793
* [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of twoJustin Holewinski2014-07-231-2/+2
| | | | | | llvm-svn: 213784
* [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabledJustin Holewinski2014-07-231-2/+1
| | | | | | | | | | With optimizations disabled, we disable the isel patterns for mul.wide; but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero. llvm-svn: 213773
* NVPTX: support fpext/fptrunc to and from f16.Tim Northover2014-07-181-0/+3
| | | | llvm-svn: 213377
* NVPTX: support direct f16 <-> f64 conversions via intrinsics.Tim Northover2014-07-181-0/+5
| | | | | | | | Clang may well start emitting these soon, and while it may not be directly relevant for OpenCL or GLSL, the instructions were just sitting there waiting to be used. llvm-svn: 213356
* [NVPTX] Improve handling of FP fusionJustin Holewinski2014-07-175-48/+62
| | | | | | | | | We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own. llvm-svn: 213287
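Why fusion has to be controllable can be illustrated on the host side. The sketch below is plain C++, not PTX and not backend code; the function names `mulThenAdd` and `fusedMulAdd` are hypothetical. It contrasts a multiply and add that are rounded separately with a fused multiply-add that rounds once, which is the distinction the explicit `add.rn` is meant to preserve.

```cpp
#include <cmath>

// Multiply then add as two separately rounded operations. The volatile
// temporary forces the product to be rounded to double before the add, so
// the compiler cannot contract the pair into a single fused operation.
double mulThenAdd(double A, double B, double C) {
  volatile double Product = A * B;
  return Product + C;
}

// Fused multiply-add: A * B + C is evaluated with a single final rounding.
double fusedMulAdd(double A, double B, double C) {
  return std::fma(A, B, C);
}
```

With A = B = 1 + 2^-30 and C = -(1 + 2^-29), the unfused version returns 0 because the 2^-60 term of the product is rounded away before the add, while the fused version returns 2^-60. The two forms are therefore not interchangeable when bit-exact results matter.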
* [NVPTX] Add missing .v4 qualifier on vector store instructionJustin Holewinski2014-07-171-1/+1
| | | | llvm-svn: 213276
* [NVPTX] Flag surface/texture query instructions with IsTexSurfQueryJustin Holewinski2014-07-171-0/+6
| | | | | | | Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+. llvm-svn: 213268
* [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetchJustin Holewinski2014-07-179-801/+6542
| | | | | | | | | | | This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions. llvm-svn: 213256
* CodeGen: extend f16 conversions to permit types > float.Tim Northover2014-07-171-3/+3
| | | | | | | | | | | | | | | | | | | This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248
* [NVPTX] Honor alignment on vector loads/storesJustin Holewinski2014-07-161-5/+31
| | | | | | | | | | | | | | | | | | | | | | | | | We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment. Now, for IR like: %1 = load <4 x float>, <4 x float>* %ptr, align 4 we will generate correct, conservative PTX like: ld.f32 ... [%ptr] ld.f32 ... [%ptr+4] ld.f32 ... [%ptr+8] ld.f32 ... [%ptr+12] Or if we have an alignment of 8 (for example), we can generate code like: ld.v2.f32 ... [%ptr] ld.v2.f32 ... [%ptr+8] llvm-svn: 213186
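The splitting rule described in this commit message can be sketched as a small standalone helper. This is an illustration only, not the code in the NVPTX backend: the name `vectorWidthForAlignment` is hypothetical, and the sketch assumes that the only vector access widths available are 2 and 4 elements and that an access must be aligned to its total size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns how many elements each
// load/store may cover (4, 2, or 1) for a vector of NumElts elements of
// EltBytes bytes each, given the alignment stated on the IR instruction.
unsigned vectorWidthForAlignment(unsigned NumElts, unsigned EltBytes,
                                 uint64_t AlignBytes) {
  assert(NumElts > 0 && EltBytes > 0 && AlignBytes > 0);
  const unsigned Widths[] = {4, 2};
  for (unsigned Width : Widths) {
    // The vector must split evenly into accesses of this width, and the
    // stated alignment must be a multiple of the size of each access.
    if (NumElts % Width == 0 &&
        AlignBytes % (static_cast<uint64_t>(Width) * EltBytes) == 0)
      return Width;
  }
  return 1; // Fall back to one scalar access per element.
}
```

Under these assumptions, a `<4 x float>` access with `align 4` gets width 1 (four scalar `ld.f32`), `align 8` gets width 2 (two `ld.v2.f32`), and `align 16` gets width 4.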
* [NVPTX] Rename registers %fl -> %fd and %rl -> %rdJustin Holewinski2014-07-164-8/+8
| | | | | | This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168
* CodeGen: Stick constant pool entries in COMDAT sections for WinCOFFDavid Majnemer2014-07-141-1/+2
| | | | | | | | | | | | | | | | COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006
* NVPTX/LLVMBuild.txt: Add "Scalar" to required_libraries. It is really referenced.NAKAMURA Takumi2014-07-141-1/+1
| | | | | | llvm-svn: 212918
* [codegen,aarch64] Add a target hook to the code generator to controlChandler Carruth2014-07-032-3/+8
| | | | | | | | | | | | | | | | | | | | | vector type legalization strategies in a more fine grained manner, and change the legalization of several v1iN types and v1f32 to be widening rather than scalarization on AArch64. This fixes an assertion failure caused by scalarizing nodes like "v1i32 trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32. This also provides a foundation for other targets to have more granular control over how vector types are legalized. Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow some work to start taking place on top of this patch as it adds some really important hooks to the backend that I'd like to immediately start using. =] http://reviews.llvm.org/D4322 llvm-svn: 212242
* [NVPTX] Use GreatestCommonDivisor64 from MathExtras instead of using our own.Justin Holewinski2014-06-271-14/+4
| | | | | | Thanks Hal! llvm-svn: 211952
* [NVPTX] Add reflect intrinsic (better than matching by function name)Justin Holewinski2014-06-271-22/+47
| | | | | | Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there. llvm-svn: 211948
* [NVPTX] Handle all possible vector types in getSetCCResultType, not just the ones representable as MVTsJustin Holewinski2014-06-271-2/+2
| | | | | | llvm-svn: 211947
* [NVPTX] Add 'b' asm constraintJustin Holewinski2014-06-271-0/+3
| | | | llvm-svn: 211946
* [NVPTX] Simplify some argument lowering logicJustin Holewinski2014-06-271-13/+8
| | | | llvm-svn: 211945
* [NVPTX] Do not process samplers in GenericToNVVMJustin Holewinski2014-06-271-1/+1
| | | | llvm-svn: 211944
* [NVPTX] Error out if initializer is given for variable in an address space that does not support initializationJustin Holewinski2014-06-271-7/+18
| | | | | | llvm-svn: 211943
* [NVPTX] Add support for .managed variables for UVMJustin Holewinski2014-06-271-0/+5
| | | | llvm-svn: 211942
* [NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkageJustin Holewinski2014-06-271-0/+4
| | | | | | llvm-svn: 211941
* [NVPTX] Variables that start with llvm. or nvvm. are reserved and should not be emittedJustin Holewinski2014-06-271-0/+5
| | | | | | llvm-svn: 211940
* [NVPTX] Fix handling of ldg/ldu intrinsics.Justin Holewinski2014-06-274-100/+375
| | | | | | | | | | The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g. %val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0 !0 = metadata !{i32 4} llvm-svn: 211939
* [NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectorsJustin Holewinski2014-06-271-90/+76
| | | | | | llvm-svn: 211938
* [NVPTX] Add missing boolean vector contents flagJustin Holewinski2014-06-271-0/+1
| | | | llvm-svn: 211937
* [NVPTX] Add support for [SHL,SRA,SRL]_PARTSJustin Holewinski2014-06-273-0/+170
| | | | llvm-svn: 211936
* [NVPTX] Implement fma and imad contraction as target DAGCombiner patternsJustin Holewinski2014-06-274-126/+549
| | | | | | This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer llvm-svn: 211935
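The arithmetic that a `mul.wide`-style operation performs can be sketched in portable C++. The helper names `mulWideS32` and `mulWideU32` are hypothetical and this is not backend code; it only shows the source-level shape (extend both narrow operands, then multiply in the wider type) that such a pattern could recognize.

```cpp
#include <cstdint>

// Signed widening multiply: both 32-bit operands are sign-extended to
// 64 bits before multiplying, so the full product is kept.
int64_t mulWideS32(int32_t A, int32_t B) {
  return static_cast<int64_t>(A) * static_cast<int64_t>(B);
}

// Unsigned widening multiply: both 32-bit operands are zero-extended to
// 64 bits before multiplying.
uint64_t mulWideU32(uint32_t A, uint32_t B) {
  return static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
}
```

For example, `mulWideS32(65536, 65536)` yields 4294967296, a value that a 32-bit multiply followed by an extension could not represent.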
* [NVPTX] Add support for efficient rotate instructions on SM 3.2+Justin Holewinski2014-06-272-4/+170
| | | | llvm-svn: 211934
* [NVPTX] Add missing isel patterns for 64-bit atomicsJustin Holewinski2014-06-271-0/+98
| | | | llvm-svn: 211933
* [NVPTX] Add isel patterns for bit-field extract (bfe)Justin Holewinski2014-06-273-0/+238
| | | | llvm-svn: 211932
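As a rough model of what an unsigned bit-field extract computes, the shift-and-mask form can be written in C++ as below. The function name `bitFieldExtractU32` is hypothetical, and the handling of out-of-range positions and lengths is an assumption made for this sketch rather than a statement of the PTX specification.

```cpp
#include <cstdint>

// Simplified model of an unsigned 32-bit bit-field extract: returns Len
// bits of Value starting at bit Pos, zero-extended.
uint32_t bitFieldExtractU32(uint32_t Value, unsigned Pos, unsigned Len) {
  if (Len == 0 || Pos >= 32)
    return 0; // Assumed behaviour for an empty or out-of-range field.
  if (Len >= 32 - Pos)
    return Value >> Pos; // The field runs to the top bit; no mask needed.
  // Here Len is at most 31, so the shift below is well defined.
  return (Value >> Pos) & ((1u << Len) - 1u);
}
```

A shift followed by a mask of contiguous low bits is the kind of source pattern that could be matched to a single extract instruction.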
* [NVPTX] Add support for isspacep instructionJustin Holewinski2014-06-272-0/+40
| | | | llvm-svn: 211931
* [NVPTX] Add support for envreg readsJustin Holewinski2014-06-272-1/+45
| | | | llvm-svn: 211930
* [NVPTX] Add target options for PTX 3.2/4.0 and SM 5.0 (Maxwell)Justin Holewinski2014-06-272-7/+11
| | | | | | Default PTX version is set to PTX 3.2 llvm-svn: 211929
* [NVPTX] Update sub-target feature detectionJustin Holewinski2014-06-271-3/+5
| | | | llvm-svn: 211928
* [NVPTX] Directly control the Machine SSA passes that are invoked for NVPTX.Justin Holewinski2014-06-271-0/+41
| | | | | | | NVPTX is a bit special in the optimizations it requires, so this gives us better control over the backend optimization pipeline. llvm-svn: 211927
* [NVPTX] Emit .weak when linkage is not external, internal, or privateJustin Holewinski2014-06-271-0/+7
| | | | llvm-svn: 211926
* [NVPTX] Just use getTypeAllocSize() when computing return value size for structures and vectorsJustin Holewinski2014-06-271-17/+1
| | | | | | llvm-svn: 211925
* Move NVPTX subtarget dependent variables from the target machineEric Christopher2014-06-275-49/+70
| | | | | | to the subtarget. llvm-svn: 211860
* Use the target lowering we can get off of the DAG rather than offEric Christopher2014-06-271-1/+1
| | | | | | of the cached target machine. llvm-svn: 211858
* Move the constructor for NVPTXFrameLowering into the implementationEric Christopher2014-06-272-5/+6
| | | | | | file in preparation for the subtarget move. llvm-svn: 211847
* Remove unnecessary caching of the TargetMachine on NVPTXFrameLowering.Eric Christopher2014-06-273-14/+17
| | | | | | Adjust the constructor accordingly. llvm-svn: 211846