summaryrefslogtreecommitdiffstats
path: root/clang/test/CodeGenOpenCL
Commit message (Collapse)AuthorAgeFilesLines
...
* Recommit r326946 after reducing CallArgList memory footprintYaxun Liu2018-03-152-3/+37
| | | | llvm-svn: 327634
* Revert r326946. It caused stack overflows by significantly increasing the ↵Richard Smith2018-03-102-37/+3
| | | | | | size of a CallArgList. llvm-svn: 327195
* CodeGen: Fix address space of indirect function argumentYaxun Liu2018-03-072-3/+37
| | | | | | | | | | | | | | | | | | | | | The indirect function argument is in alloca address space in LLVM IR. However, during Clang codegen for C++, the address space of indirect function argument should match its address space in the source code, i.e., default addr space, even for indirect argument. This is because destructor of the indirect argument may be called in the caller function, and address of the indirect argument may be taken, in either case the indirect function argument is expected to be in default addr space, not the alloca address space. Therefore, the indirect function argument should be mapped to the temp var casted to default address space. The caller will cast it to alloca addr space when passing it to the callee. In the callee, the argument is also casted to the default address space and used. CallArg is refactored to facilitate this fix. Differential Revision: https://reviews.llvm.org/D34367 llvm-svn: 326946
* [OpenCL] Remove block invoke function from emitted block literal structYaxun Liu2018-03-073-91/+67
| | | | | | | | | | | | | | | | | | | | | OpenCL runtime tracks the invoke function emitted for any block expression. Due to restrictions on blocks in OpenCL (v2.0 s6.12.5), it is always possible to know the block invoke function when emitting call of block expression or __enqueue_kernel builtin functions. Since __enqueu_kernel already has an argument for the invoke function, it is redundant to have invoke function member in the llvm block literal structure. This patch removes invoke function from the llvm block literal structure. It also removes the bitcast of block invoke function to the generic block literal type which is useless for OpenCL. This will save some space for the kernel argument, and also eliminate some store instructions. Differential Revision: https://reviews.llvm.org/D43783 llvm-svn: 326937
* Bring r325915 back.Rafael Espindola2018-02-234-15/+15
| | | | | | | | | | | | | | | The tests that failed on a windows host have been fixed. Original message: Start setting dso_local for COFF. With this there are still some GVs where we don't set dso_local because setGVProperties is never called. I intend to fix that in followup commits. This is just the bare minimum to teach shouldAssumeDSOLocal what it should do for COFF. llvm-svn: 325940
* [OpenCL] Add '-cl-uniform-work-group-size' compile optionAlexey Sotkin2018-02-223-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option, which requires that the global work-size be a multiple of the work-group size specified to clEnqueueNDRangeKernel and allows optimizations that are made possible by this restriction. The patch introduces the support of this option. To keep information about whether an OpenCL kernel has uniform work group size or not, clang generates 'uniform-work-group-size' function attribute for every kernel: - "uniform-work-group-size"="true" for OpenCL 1.2 and lower, - "uniform-work-group-size"="true" for OpenCL 2.0 and higher if '-cl-uniform-work-group-size' option was specified, - "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no '-cl-uniform-work-group-size' options was specified. If the function is not an OpenCL kernel, 'uniform-work-group-size' attribute isn't generated. Patch by: krisb Reviewers: yaxunl, Anastasia, b-sumner Reviewed By: yaxunl, Anastasia Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits Differential Revision: https://reviews.llvm.org/D43570 llvm-svn: 325771
* Clean up AMDGCN testsYaxun Liu2018-02-156-83/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D43340 llvm-svn: 325279
* [OpenCL] Fix __enqueue_block for block with capturesYaxun Liu2018-02-152-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | The following test case causes issue with codegen of __enqueue_block void (^block)(void) = ^{ callee(id, out); }; enqueue_kernel(queue, 0, ndrange, block); Clang first does codegen for block expression in the first line and deletes its block info. Clang then tries to do codegen for the same block expression again for the second line, and fails because the block info is gone. The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to record llvm block invoke function and llvm block literal emitted for each AST block expression, and use the recorded information for generating the wrapper kernel. The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen of blocks. Another minor issue is that some clean up AST expression is generated for block with captures, which can be stripped by IgnoreImplicit. Differential Revision: https://reviews.llvm.org/D43240 llvm-svn: 325264
* [AMDGPU] Change constant addr space to 4Yaxun Liu2018-02-1311-55/+65
| | | | | | Differential Revision: https://reviews.llvm.org/D43171 llvm-svn: 325031
* AMDGPU: Update for datalayout changeMatt Arsenault2018-02-091-1/+1
| | | | llvm-svn: 324748
* Fix crash on array initializer with non-0 alloca addrspaceMatt Arsenault2018-02-082-3/+37
| | | | llvm-svn: 324641
* Recommit rL323890: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functionsDaniil Fukalov2018-02-041-0/+17
| | | | | | Fixed asserts in tests. llvm-svn: 324201
* [AMDGPU] Switch to the new addr space mapping by defaultYaxun Liu2018-02-0211-266/+280
| | | | | | | | This requires corresponding llvm change. Differential Revision: https://reviews.llvm.org/D40956 llvm-svn: 324102
* Revert "[AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions"Daniil Fukalov2018-01-311-20/+0
| | | | | | | | This reverts https://reviews.llvm.org/rL323890 This reverts commit 251524ebd8c346a936f0e74b09d609d49fbaae4a. llvm-svn: 323896
* [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functionsDaniil Fukalov2018-01-311-0/+20
| | | | | | | | Reviewed by arsenm Differential Revision: https://reviews.llvm.org/D42578 llvm-svn: 323890
* Change memcpy/memove/memset to have dest and source alignment attributes ↵Daniel Neilson2018-01-194-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | (Step 1). Summary: Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset intrinsics. This change updates the Clang tests for this change. The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change removes the alignment argument in favour of placing the alignment attribute on the source and destination pointers of the memory intrinsic call. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) At this time the source and destination alignments must be the same (Step 1). Step 2 of the change, to be landed shortly, will relax that contraint and allow the source and destination to have different alignments. llvm-svn: 322964
* CodeGen: Fix invalid bitcasts for memcpyYaxun Liu2017-12-071-14/+26
| | | | | | | | | | | | CreateCoercedLoad/CreateCoercedStore assumes pointer argument of memcpy is in addr space 0, which is not correct and causes invalid bitcasts for triple amdgcn---amdgiz. It is fixed by using alloca addr space instead. Differential Revision: https://reviews.llvm.org/D40806 llvm-svn: 320000
* [OpenCL] Fix code generation of function-scope constant samplers.Alexey Bader2017-11-151-0/+18
| | | | | | | | | | | | | | | | | Summary: Constant samplers are handled as static variables and clang's code generation library, which leads to llvm::unreachable. We bypass emitting sampler variable as static since it's translated to a function call later. Reviewers: yaxunl, Anastasia Reviewed By: yaxunl, Anastasia Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D34342 llvm-svn: 318290
* OpenCL: Assume inline asm is convergentMatt Arsenault2017-11-131-0/+7
| | | | | | Already done for CUDA. llvm-svn: 318098
* CodeGen: Fix missing debug loc due to allocaYaxun Liu2017-10-241-0/+18
| | | | | | | | | | | | Builder save/restores insertion pointer when emitting addr space cast for alloca, but does not save/restore debug loc, which causes verifier failure for certain call instructions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39069 llvm-svn: 316484
* CodeGen: Fix invalid bitcast in partial initialization of automatic arrary ↵Yaxun Liu2017-10-231-0/+8
| | | | | | | | variable Differential Revision: https://reviews.llvm.org/D39184 llvm-svn: 316353
* [AMDGPU] Fix bug in enqueued block codegen due to an extra lineYaxun Liu2017-10-191-0/+9
| | | | llvm-svn: 316165
* CodeGen: Fix invalid bitcasts for atomic builtinsYaxun Liu2017-10-171-39/+39
| | | | | | | | | | | | | Currently clang assumes the temporary variables emitted during codegen of atomic builtins have address space 0, which is not true for target triple amdgcn---amdgiz and causes invalid bitcasts. This patch fixes that. Differential Revision: https://reviews.llvm.org/D38966 llvm-svn: 316000
* [OpenCL] Emit enqueued block as kernelYaxun Liu2017-10-143-26/+200
| | | | | | | | | | | | | | | | | In OpenCL the kernel function and non-kernel function has different calling conventions. For certain targets they have different argument ABIs. Also kernels have special function attributes and metadata for runtime to launch them. The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such, the block invoke function should be emitted as kernel with proper calling convention and argument ABI. This patch emits enqueued block as kernel. If a block is both called directly and passed to enqueue_kernel, separate functions will be generated. Differential Revision: https://reviews.llvm.org/D38134 llvm-svn: 315804
* Fix regression of test/CodeGenOpenCL/address-spaces.cl on ppcYaxun Liu2017-10-131-1/+2
| | | | llvm-svn: 315678
* [OpenCL] Add LangAS::opencl_private to represent private address space in ASTYaxun Liu2017-10-132-5/+84
| | | | | | | | | | | | | | | | | | | | | | | | Currently Clang uses default address space (0) to represent private address space for OpenCL in AST. There are two issues with this: Multiple address spaces including private address space cannot be diagnosed. There is no mangling for default address space. For example, if private int* is emitted as i32 addrspace(5)* in IR. It is supposed to be mangled as PUAS5i but it is mangled as Pi instead. This patch attempts to represent OpenCL private address space explicitly in AST. It adds a new enum LangAS::opencl_private and adds it to the variable types which are implicitly private: automatic variables without address space qualifier function parameter pointee type without address space qualifier (OpenCL 1.2 and below) Differential Revision: https://reviews.llvm.org/D35082 llvm-svn: 315668
* AMDGPU: Add read_exec_lo/hi builtinsMatt Arsenault2017-10-091-0/+14
| | | | llvm-svn: 315238
* AMDGPU: Fix missing declaration for __builtin_amdgcn_dispatch_ptrMatt Arsenault2017-10-091-0/+7
| | | | llvm-svn: 315219
* OpenCL: Assume functions are convergentMatt Arsenault2017-10-062-53/+69
| | | | | | | | | This was done for CUDA functions in r261779, and for the same reason this also needs to be done for OpenCL. An arbitrary function could have a barrier() call in it, which in turn requires the calling function to be convergent. llvm-svn: 315094
* [OpenCL] Clean up and add missing fields for block structYaxun Liu2017-10-042-32/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently block is translated to a structure equivalent to struct Block { void *isa; int flags; int reserved; void *invoke; void *descriptor; }; Except invoke, which is the pointer to the block invoke function, all other fields are useless for OpenCL, which clutter the IR and also waste memory since the block struct is passed to the block invoke function as argument. On the other hand, the size and alignment of the block struct is not stored in the struct, which causes difficulty to implement __enqueue_kernel as library function, since the library function needs to know the size and alignment of the argument which needs to be passed to the kernel. This patch removes the useless fields from the block struct and adds size and align fields. The equivalent block struct will become struct Block { int size; int align; generic void *invoke; /* custom fields */ }; It also changes the pointer to the invoke function to be a generic pointer since the address space of a function may not be private on certain targets. Differential Revision: https://reviews.llvm.org/D37822 llvm-svn: 314932
* [OpenCL] Fixed CL version in failing test.Anastasia Stulova2017-09-271-1/+1
| | | | llvm-svn: 314317
* [OpenCL] Handle address space conversion while setting type alignment.Anastasia Stulova2017-09-271-2/+15
| | | | | | | | | Added missing addrspacecast case in alignment computation logic of pointer type emission in IR generation. Differential Revision: https://reviews.llvm.org/D37804 llvm-svn: 314304
* Add more tests for OpenCL atomic builtin functionsYaxun Liu2017-09-132-1/+45
| | | | | | | | Add tests for different address spaces and insert some blank lines to make them more readable. Differential Revision: https://reviews.llvm.org/D37742 llvm-svn: 313172
* [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to globalYaxun Liu2017-09-131-8/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D37703 llvm-svn: 313171
* [OpenCL] Add half load and store builtinsJan Vesely2017-09-071-0/+39
| | | | | | | | This enables load/stores of half type, without half being a legal type. Differential Revision: https://reviews.llvm.org/D37231 llvm-svn: 312742
* [OpenCL] Do not use vararg in emitted functions for enqueue_kernelYaxun Liu2017-09-031-18/+72
| | | | | | | | | | Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel, this patch creates a temporary array of size_t, stores the size arguments in the temporary array and passes it to the emitted functions for enqueue_kernel. Differential Revision: https://reviews.llvm.org/D36678 llvm-svn: 312441
* Adapt testcases to LLVM change r312144 in DIGlobalVariableExpressionAdrian Prantl2017-08-301-20/+20
| | | | llvm-svn: 312148
* Parse and print DIExpressions inline to ease IR and MIR testingReid Kleckner2017-08-231-21/+18
| | | | | | | | | | | | | | | | | | | Summary: Most DIExpressions are empty or very simple. When they are complex, they tend to be unique, so checking them inline is reasonable. This also avoids the need for CodeGen passes to append to the llvm.dbg.mir named md node. See also PR22780, for making DIExpression not be an MDNode. Reviewers: aprantl, dexonsmith, dblaikie Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D37075 llvm-svn: 311594
* Attempt to fix failure in CodeGenOpenCL/atomic-ops.cl againYaxun Liu2017-08-151-20/+20
| | | | llvm-svn: 310937
* Attempt to fix failure in CodeGenOpenCL/atomic-ops.clYaxun Liu2017-08-151-5/+5
| | | | llvm-svn: 310932
* Remove -finclude-default-header in OpenCL atomic testsYaxun Liu2017-08-152-5/+43
| | | | | | Differential Revision: https://reviews.llvm.org/D36676 llvm-svn: 310927
* [OpenCL] Support variable memory scope in atomic builtinsYaxun Liu2017-08-152-1/+79
| | | | | | Differential Revision: https://reviews.llvm.org/D36580 llvm-svn: 310924
* [OpenCL] Allow targets to select address space per typeSven van Haastregt2017-08-151-2/+12
| | | | | | | | | | | | | Generalize getOpenCLImageAddrSpace into getOpenCLTypeAddrSpace, such that targets can select the address space per type. No functional changes intended. Initial patch by Simon Perretta. Differential Revision: https://reviews.llvm.org/D33989 llvm-svn: 310911
* AMDGPU: Use direct struct returns and argumentsMatt Arsenault2017-08-093-29/+553
| | | | | | | | | | | | | | | | This is an improvement over always using byval for structs. This will use registers until ~16 are used, and then switch back to byval. This needs more work, since I'm not sure it ever really makes sense to use byval. If the register limit is exceeded, the arguments still end up passed on the stack, but with a different ABI. It also may make sense to base this on number of registers used for non-struct arguments, rather than just arguments that appear first in the argument list. llvm-svn: 310527
* Add OpenCL 2.0 atomic builtin functions as Clang builtinYaxun Liu2017-08-042-0/+213
| | | | | | | | | | | | | | | | | | | | | OpenCL 2.0 atomic builtin functions have a scope argument which is ideally represented as synchronization scope argument in LLVM atomic instructions. Clang supports translating Clang atomic builtin functions to LLVM atomic instructions. However it currently does not support synchronization scope of LLVM atomic instructions. Without this, users have to use LLVM assembly code to implement OpenCL atomic builtin functions. This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin functions, which supports generating LLVM atomic instructions with synchronization scope operand. Currently only constant memory scope argument is supported. Support of non-constant memory scope argument will be added later. Differential Revision: https://reviews.llvm.org/D28691 llvm-svn: 310082
* [OpenCL] Add missing subgroup builtinsJoey Gouly2017-08-011-0/+5
| | | | | | | This adds get_kernel_max_sub_group_size_for_ndrange and get_kernel_sub_group_count_for_ndrange. llvm-svn: 309678
* [OpenCL] Enable subgroup extension in testsJoey Gouly2017-07-311-1/+1
| | | | | | | This fixes the test, so that it can be run on different hosts that may have different OpenCL extensions enabled. llvm-svn: 309571
* [OpenCL] Add extension Sema check for subgroup builtinsJoey Gouly2017-07-312-0/+4
| | | | | | Check the subgroup extension is enabled, before doing other Sema checks. llvm-svn: 309567
* [OpenCL] Fix access qualifiers metadata for kernel arguments with typedefAlexey Sotkin2017-07-261-0/+22
| | | | | | | | Subscribers: cfe-commits, yaxunl, Anastasia Differential Revision: https://reviews.llvm.org/D35420 llvm-svn: 309155
* [OpenCL] Added extended tests on metadata generation for half data type and ↵Egor Churaev2017-07-181-9/+28
| | | | | | | | | | | | | | arrays. Reviewers: Anastasia Reviewed By: Anastasia Subscribers: bader, cfe-commits, yaxunl Differential Revision: https://reviews.llvm.org/D35000 llvm-svn: 308266
OpenPOWER on IntegriCloud