summaryrefslogtreecommitdiffstats
path: root/clang/test/CodeGenOpenCL
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Add clamp bit to dot builtinsKonstantin Zhuravlyov2018-08-013-21/+76
| | | | | | Differential Revision: https://reviews.llvm.org/D50011 llvm-svn: 338471
* [DebugInfo][OpenCL] Generate correct block literal debug info for OpenCLScott Linder2018-07-301-2/+10
| | | | | | | | | OpenCL block literal structs have different fields which are now correctly identified in the debug info. Differential Revision: https://reviews.llvm.org/D49930 llvm-svn: 338299
* CodeGen: specify alignment + inbounds for automatic variable initializationJF Bastien2018-07-131-5/+5
| | | | | | | | | | Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds. Subscribers: dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D49209 llvm-svn: 337041
* [AMDGPU] fixes for lds f32 builtinsDaniil Fukalov2018-05-211-6/+6
| | | | | | | | | | | | 1. added restrictions to memory scope, order and volatile parameters 2. added custom processing for these builtins - currently is not used code, needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm tree) 3. builtins renamed as requested Differential Revision: https://reviews.llvm.org/D43281 llvm-svn: 332848
* [OpenCL] make test independent of optimizerSanjay Patel2018-05-161-7/+8
| | | | | | | | | There shouldn't be any tests that run the entire optimizer here, but the last test in this file is definitely going to break with a change in LLVM IR canonicalization. Change that part to check the unoptimized IR because that's the real intent of this file. llvm-svn: 332473
* [OpenCL] Fix typos in emitted enqueue kernel function namesYaxun Liu2018-05-091-9/+9
| | | | | | | | | | Two typos: vaarg => vararg get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple Differential Revision: https://reviews.llvm.org/D46601 llvm-svn: 331895
* [OpenCL] Add constant address space to __func__ in AST.Anastasia Stulova2018-05-091-3/+9
| | | | | | | | | | | | Added string literal helper function to obtain the type attributed by a constant address space. Also fixed predefind __func__ expr to use the helper to constract the string literal correctly. Differential Revision: https://reviews.llvm.org/D46049 llvm-svn: 331877
* Add Microsoft Mangling for OpenCL Half TypeErich Keane2018-05-011-0/+1
| | | | | | | | | | | | Half-type mangling is accomplished following the method introduced by Erich Keane for mangling _Float16. Updated the half.cl LIT test to cover this particular case. Patch By: vbridgers Differential Revision: https://reviews.llvm.org/D46131 llvm-svn: 331263
* AMDGPU: Add Vega12 and Vega20Matt Arsenault2018-04-303-0/+73
| | | | | | | | Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331216
* [OpenCL] Add separate read_only and write_only pipe IR typesSven van Haastregt2018-04-273-38/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | SPIR-V encodes the read_only and write_only access qualifiers of pipes, so separate LLVM IR types are required to target SPIR-V. Other backends may also find this useful. These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which replace `opencl.pipe_t`. This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...) which took a read_only pipe with separate versions for read_only and write_only pipes, namely: * __get_pipe_num_packets_ro(...) * __get_pipe_num_packets_wo(...) * __get_pipe_max_packets_ro(...) * __get_pipe_max_packets_wo(...) These separate versions exist to avoid needing a bitcast to one of the two qualified pipe types. Patch by Stuart Brady. Differential Revision: https://reviews.llvm.org/D46015 llvm-svn: 331026
* Fix some tests that were failing on WindowsHans Wennborg2018-04-201-2/+2
| | | | llvm-svn: 330441
* [OpenCL] Add 'denorms-are-zero' function attributeAlexey Sotkin2018-04-201-13/+19
| | | | | | | | | | | | | | | | | | Summary: Generate attribute 'denorms-are-zero'='true' if '-cl-denorms-are-zero' compile option was specified and 'denorms-are-zero'='false' otherwise. Patch by krisb Reviewers: Anastasia, yaxunl Reviewed By: yaxunl Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D45808 llvm-svn: 330404
* Fix typos in clangAlexander Kornienko2018-04-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399
* AMDGPU: Update datalayout for stack alignmentMatt Arsenault2018-03-271-1/+1
| | | | llvm-svn: 328657
* [AMDGPU] Fix codegen for inline assemblyYaxun Liu2018-03-231-0/+8
| | | | | | | | Need to override convertConstraint to recognise amdgpu specific register names. Differential Revision: https://reviews.llvm.org/D44533 llvm-svn: 328359
* [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG)Tony Tye2018-03-231-25/+25
| | | | | | | | Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44696 llvm-svn: 328350
* [AMDGPU] Remove use of OpenCL triple environment and replace with function ↵Tony Tye2018-03-231-26/+35
| | | | | | | | | | | attribute for AMDGPU (CLANG) - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use a function attribute to communicate to the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D43735 llvm-svn: 328347
* Recommit r326946 after reducing CallArgList memory footprintYaxun Liu2018-03-152-3/+37
| | | | llvm-svn: 327634
* Revert r326946. It caused stack overflows by significantly increasing the ↵Richard Smith2018-03-102-37/+3
| | | | | | size of a CallArgList. llvm-svn: 327195
* CodeGen: Fix address space of indirect function argumentYaxun Liu2018-03-072-3/+37
| | | | | | | | | | | | | | | | | | | | | The indirect function argument is in alloca address space in LLVM IR. However, during Clang codegen for C++, the address space of indirect function argument should match its address space in the source code, i.e., default addr space, even for indirect argument. This is because destructor of the indirect argument may be called in the caller function, and address of the indirect argument may be taken, in either case the indirect function argument is expected to be in default addr space, not the alloca address space. Therefore, the indirect function argument should be mapped to the temp var casted to default address space. The caller will cast it to alloca addr space when passing it to the callee. In the callee, the argument is also casted to the default address space and used. CallArg is refactored to facilitate this fix. Differential Revision: https://reviews.llvm.org/D34367 llvm-svn: 326946
* [OpenCL] Remove block invoke function from emitted block literal structYaxun Liu2018-03-073-91/+67
| | | | | | | | | | | | | | | | | | | | | OpenCL runtime tracks the invoke function emitted for any block expression. Due to restrictions on blocks in OpenCL (v2.0 s6.12.5), it is always possible to know the block invoke function when emitting call of block expression or __enqueue_kernel builtin functions. Since __enqueu_kernel already has an argument for the invoke function, it is redundant to have invoke function member in the llvm block literal structure. This patch removes invoke function from the llvm block literal structure. It also removes the bitcast of block invoke function to the generic block literal type which is useless for OpenCL. This will save some space for the kernel argument, and also eliminate some store instructions. Differential Revision: https://reviews.llvm.org/D43783 llvm-svn: 326937
* Bring r325915 back.Rafael Espindola2018-02-234-15/+15
| | | | | | | | | | | | | | | The tests that failed on a windows host have been fixed. Original message: Start setting dso_local for COFF. With this there are still some GVs where we don't set dso_local because setGVProperties is never called. I intend to fix that in followup commits. This is just the bare minimum to teach shouldAssumeDSOLocal what it should do for COFF. llvm-svn: 325940
* [OpenCL] Add '-cl-uniform-work-group-size' compile optionAlexey Sotkin2018-02-223-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option, which requires that the global work-size be a multiple of the work-group size specified to clEnqueueNDRangeKernel and allows optimizations that are made possible by this restriction. The patch introduces the support of this option. To keep information about whether an OpenCL kernel has uniform work group size or not, clang generates 'uniform-work-group-size' function attribute for every kernel: - "uniform-work-group-size"="true" for OpenCL 1.2 and lower, - "uniform-work-group-size"="true" for OpenCL 2.0 and higher if '-cl-uniform-work-group-size' option was specified, - "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no '-cl-uniform-work-group-size' options was specified. If the function is not an OpenCL kernel, 'uniform-work-group-size' attribute isn't generated. Patch by: krisb Reviewers: yaxunl, Anastasia, b-sumner Reviewed By: yaxunl, Anastasia Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits Differential Revision: https://reviews.llvm.org/D43570 llvm-svn: 325771
* Clean up AMDGCN testsYaxun Liu2018-02-156-83/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D43340 llvm-svn: 325279
* [OpenCL] Fix __enqueue_block for block with capturesYaxun Liu2018-02-152-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | The following test case causes issue with codegen of __enqueue_block void (^block)(void) = ^{ callee(id, out); }; enqueue_kernel(queue, 0, ndrange, block); Clang first does codegen for block expression in the first line and deletes its block info. Clang then tries to do codegen for the same block expression again for the second line, and fails because the block info is gone. The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to record llvm block invoke function and llvm block literal emitted for each AST block expression, and use the recorded information for generating the wrapper kernel. The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen of blocks. Another minor issue is that some clean up AST expression is generated for block with captures, which can be stripped by IgnoreImplicit. Differential Revision: https://reviews.llvm.org/D43240 llvm-svn: 325264
* [AMDGPU] Change constant addr space to 4Yaxun Liu2018-02-1311-55/+65
| | | | | | Differential Revision: https://reviews.llvm.org/D43171 llvm-svn: 325031
* AMDGPU: Update for datalayout changeMatt Arsenault2018-02-091-1/+1
| | | | llvm-svn: 324748
* Fix crash on array initializer with non-0 alloca addrspaceMatt Arsenault2018-02-082-3/+37
| | | | llvm-svn: 324641
* Recommit rL323890: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functionsDaniil Fukalov2018-02-041-0/+17
| | | | | | Fixed asserts in tests. llvm-svn: 324201
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-0211-266/+280
| | | | | | | | This requires corresponding llvm change. Differential Revision: https://reviews.llvm.org/D40956 llvm-svn: 324102
* Revert "[AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions"Daniil Fukalov2018-01-311-20/+0
| | | | | | | | This reverts https://reviews.llvm.org/rL323890 This reverts commit 251524ebd8c346a936f0e74b09d609d49fbaae4a. llvm-svn: 323896
* [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functionsDaniil Fukalov2018-01-311-0/+20
| | | | | | | | Reviewed by arsenm Differential Revision: https://reviews.llvm.org/D42578 llvm-svn: 323890
* Change memcpy/memove/memset to have dest and source alignment attributes ↵Daniel Neilson2018-01-194-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | (Step 1). Summary: Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset intrinsics. This change updates the Clang tests for this change. The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change removes the alignment argument in favour of placing the alignment attribute on the source and destination pointers of the memory intrinsic call. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) At this time the source and destination alignments must be the same (Step 1). Step 2 of the change, to be landed shortly, will relax that contraint and allow the source and destination to have different alignments. llvm-svn: 322964
* CodeGen: Fix invalid bitcasts for memcpyYaxun Liu2017-12-071-14/+26
| | | | | | | | | | | | CreateCoercedLoad/CreateCoercedStore assumes pointer argument of memcpy is in addr space 0, which is not correct and causes invalid bitcasts for triple amdgcn---amdgiz. It is fixed by using alloca addr space instead. Differential Revision: https://reviews.llvm.org/D40806 llvm-svn: 320000
* [OpenCL] Fix code generation of function-scope constant samplers.Alexey Bader2017-11-151-0/+18
| | | | | | | | | | | | | | | | | Summary: Constant samplers are handled as static variables and clang's code generation library, which leads to llvm::unreachable. We bypass emitting sampler variable as static since it's translated to a function call later. Reviewers: yaxunl, Anastasia Reviewed By: yaxunl, Anastasia Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D34342 llvm-svn: 318290
* OpenCL: Assume inline asm is convergentMatt Arsenault2017-11-131-0/+7
| | | | | | Already done for CUDA. llvm-svn: 318098
* CodeGen: Fix missing debug loc due to allocaYaxun Liu2017-10-241-0/+18
| | | | | | | | | | | | Builder save/restores insertion pointer when emitting addr space cast for alloca, but does not save/restore debug loc, which causes verifier failure for certain call instructions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39069 llvm-svn: 316484
* CodeGen: Fix invalid bitcast in partial initialization of automatic arrary ↵Yaxun Liu2017-10-231-0/+8
| | | | | | | | variable Differential Revision: https://reviews.llvm.org/D39184 llvm-svn: 316353
* [AMDGPU] Fix bug in enqueued block codegen due to an extra lineYaxun Liu2017-10-191-0/+9
| | | | llvm-svn: 316165
* CodeGen: Fix invalid bitcasts for atomic builtinsYaxun Liu2017-10-171-39/+39
| | | | | | | | | | | | | Currently clang assumes the temporary variables emitted during codegen of atomic builtins have address space 0, which is not true for target triple amdgcn---amdgiz and causes invalid bitcasts. This patch fixes that. Differential Revision: https://reviews.llvm.org/D38966 llvm-svn: 316000
* [OpenCL] Emit enqueued block as kernelYaxun Liu2017-10-143-26/+200
| | | | | | | | | | | | | | | | | In OpenCL the kernel function and non-kernel function has different calling conventions. For certain targets they have different argument ABIs. Also kernels have special function attributes and metadata for runtime to launch them. The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such, the block invoke function should be emitted as kernel with proper calling convention and argument ABI. This patch emits enqueued block as kernel. If a block is both called directly and passed to enqueue_kernel, separate functions will be generated. Differential Revision: https://reviews.llvm.org/D38134 llvm-svn: 315804
* Fix regression of test/CodeGenOpenCL/address-spaces.cl on ppcYaxun Liu2017-10-131-1/+2
| | | | llvm-svn: 315678
* [OpenCL] Add LangAS::opencl_private to represent private address space in ASTYaxun Liu2017-10-132-5/+84
| | | | | | | | | | | | | | | | | | | | | | | | Currently Clang uses default address space (0) to represent private address space for OpenCL in AST. There are two issues with this: Multiple address spaces including private address space cannot be diagnosed. There is no mangling for default address space. For example, if private int* is emitted as i32 addrspace(5)* in IR. It is supposed to be mangled as PUAS5i but it is mangled as Pi instead. This patch attempts to represent OpenCL private address space explicitly in AST. It adds a new enum LangAS::opencl_private and adds it to the variable types which are implicitly private: automatic variables without address space qualifier function parameter pointee type without address space qualifier (OpenCL 1.2 and below) Differential Revision: https://reviews.llvm.org/D35082 llvm-svn: 315668
* AMDGPU: Add read_exec_lo/hi builtinsMatt Arsenault2017-10-091-0/+14
| | | | llvm-svn: 315238
* AMDGPU: Fix missing declaration for __builtin_amdgcn_dispatch_ptrMatt Arsenault2017-10-091-0/+7
| | | | llvm-svn: 315219
* OpenCL: Assume functions are convergentMatt Arsenault2017-10-062-53/+69
| | | | | | | | | This was done for CUDA functions in r261779, and for the same reason this also needs to be done for OpenCL. An arbitrary function could have a barrier() call in it, which in turn requires the calling function to be convergent. llvm-svn: 315094
* [OpenCL] Clean up and add missing fields for block structYaxun Liu2017-10-042-32/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently block is translated to a structure equivalent to struct Block { void *isa; int flags; int reserved; void *invoke; void *descriptor; }; Except invoke, which is the pointer to the block invoke function, all other fields are useless for OpenCL, which clutter the IR and also waste memory since the block struct is passed to the block invoke function as argument. On the other hand, the size and alignment of the block struct is not stored in the struct, which causes difficulty to implement __enqueue_kernel as library function, since the library function needs to know the size and alignment of the argument which needs to be passed to the kernel. This patch removes the useless fields from the block struct and adds size and align fields. The equivalent block struct will become struct Block { int size; int align; generic void *invoke; /* custom fields */ }; It also changes the pointer to the invoke function to be a generic pointer since the address space of a function may not be private on certain targets. Differential Revision: https://reviews.llvm.org/D37822 llvm-svn: 314932
* [OpenCL] Fixed CL version in failing test.Anastasia Stulova2017-09-271-1/+1
| | | | llvm-svn: 314317
* [OpenCL] Handle address space conversion while setting type alignment.Anastasia Stulova2017-09-271-2/+15
| | | | | | | | | Added missing addrspacecast case in alignment computation logic of pointer type emission in IR generation. Differential Revision: https://reviews.llvm.org/D37804 llvm-svn: 314304
* Add more tests for OpenCL atomic builtin functionsYaxun Liu2017-09-132-1/+45
| | | | | | | | Add tests for different address spaces and insert some blank lines to make them more readable. Differential Revision: https://reviews.llvm.org/D37742 llvm-svn: 313172
OpenPOWER on IntegriCloud