path: root/parallel-libs/streamexecutor/examples
Commit message | Author | Age | Files | Lines
* [SE] Remove StreamExecutor | Jason Henline | 2016-10-25 | 3 | -240/+0
  Summary: The project has been renamed to Acxxel, so this old directory needs to be deleted.
  Reviewers: jlebar, jprice
  Subscribers: beanz, mgorny, parallel_libs-commits, modocache
  Differential Revision: https://reviews.llvm.org/D25964
  llvm-svn: 285115
* [SE] Pack global dev handle addresses | Jason Henline | 2016-09-13 | 1 | -2/+2
  Summary: We were packing global device memory handles in `PackedKernelArgumentArray`, but as I was implementing the CUDA platform, I realized that CUDA wants the address of the handle, not the handle itself. So this patch switches to packing the address of the handle.
  Reviewers: jlebar
  Subscribers: jprice, jlebar, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24528
  llvm-svn: 281424
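The distinction between a handle and its address can be illustrated with a minimal sketch. `FakeDeviceMemory` and `packArguments` are hypothetical names, not the StreamExecutor API; the sketch assumes only that the launch interface expects an array of pointers to the argument values, as CUDA's driver-level kernel launch does with its `kernelParams` array.

```cpp
#include <vector>

// Hypothetical stand-in for a device memory object that stores an opaque
// handle. This is NOT the real GlobalDeviceMemory class.
struct FakeDeviceMemory {
  void *Handle;
};

// Packs one pointer per kernel argument. For the device memory argument the
// packed pointer is the address of the handle (&Mem.Handle), not the handle
// value itself (Mem.Handle).
inline std::vector<const void *> packArguments(const FakeDeviceMemory &Mem,
                                               const float &Scale) {
  return {&Mem.Handle, &Scale};
}
```

Dereferencing the first packed entry yields the handle, which is what a launch interface of this shape reads when it copies arguments to the device.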
* [SE] Platforms return Device values | Jason Henline | 2016-09-13 | 2 | -14/+14
  Summary: Platforms were returning Device pointers, but a Device is now basically just a pointer to an underlying PlatformDevice, so we will now just pass it around as a value.
  Reviewers: jlebar
  Subscribers: jprice, jlebar, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24537
  llvm-svn: 281422
* [SE] Host platform implementation | Jason Henline | 2016-09-13 | 3 | -1/+97
  Summary: This implementation does not currently support multiple concurrent streams, and it won't allow kernels to be launched with grids larger than one block or blocks larger than one thread. These limitations could be removed in the future by launching new threads on the host, but that is not done in this implementation.
  Reviewers: jlebar
  Subscribers: beanz, mgorny, jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24473
  llvm-svn: 281377
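A rough sketch of what a single-block, single-thread host launch could look like follows. `Dims`, `launchOnHost`, and `saxpy` are hypothetical names chosen for illustration and are not taken from the patch; the sketch assumes the kernel is an ordinary callable that runs once on the calling thread.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical launch dimensions; not the StreamExecutor types.
struct Dims {
  unsigned X = 1, Y = 1, Z = 1;
  unsigned long long count() const { return 1ull * X * Y * Z; }
};

// Runs the kernel once on the calling thread. Rejects any launch that asks
// for more than one block or more than one thread per block, mirroring the
// limitation described in the commit message.
template <typename KernelFn, typename... Args>
bool launchOnHost(Dims Blocks, Dims Threads, KernelFn Kernel,
                  Args &&...KernelArgs) {
  if (Blocks.count() != 1 || Threads.count() != 1)
    return false;
  Kernel(std::forward<Args>(KernelArgs)...);
  return true;
}

// Example kernel: with a single host thread, the kernel loops over all
// elements itself.
inline void saxpy(float A, const float *X, float *Y, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```

Lifting the restriction would mean spawning one host thread per requested device thread, which this sketch, like the commit, does not attempt.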
* [SE] RegisteredHostMemory for async device copies | Jason Henline | 2016-09-12 | 1 | -3/+8
  Summary: Improve the error-prone interface that allows users to pass host pointers that haven't been registered to asynchronous copy methods. In CUDA, this is an extremely easy error to make, and instead of failing at runtime, it succeeds and gives the right answers by turning the async copy into a sync copy. So, you silently get a huge performance degradation if you misuse the old interface. This new interface should prevent that.
  Reviewers: jlebar
  Subscribers: jprice, beanz, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24353
  llvm-svn: 281225
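One way to make this misuse a compile-time error is to accept only a wrapper type that a registration function can produce. The sketch below illustrates that idea under stated assumptions: `RegisteredHostSpan`, `registerHostSpan`, and `asyncCopyToDevice` are hypothetical names, the "device" is plain host memory, and the copy is performed synchronously as a stand-in. It is not the interface added by the patch.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical wrapper that marks a host range as registered. Its constructor
// is private, so the only way to obtain one is through registerHostSpan.
template <typename T> class RegisteredHostSpan {
public:
  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  RegisteredHostSpan(T *D, std::size_t N) : Data(D), Count(N) {}

  template <typename U>
  friend RegisteredHostSpan<U> registerHostSpan(U *, std::size_t);

  T *Data;
  std::size_t Count;
};

// A real implementation would pin the pages with the platform here; this
// stand-in only wraps the pointer.
template <typename U>
RegisteredHostSpan<U> registerHostSpan(U *D, std::size_t N) {
  return RegisteredHostSpan<U>(D, N);
}

// Accepts only registered memory as the host-side source. Passing a raw
// pointer as Src does not compile. The body is a synchronous stand-in.
template <typename T>
void asyncCopyToDevice(const RegisteredHostSpan<T> &Src, T *DeviceDst) {
  std::copy(Src.data(), Src.data() + Src.size(), DeviceDst);
}
```

With this shape, calling `asyncCopyToDevice(HostPtr, Dst)` with an unregistered `float *` is rejected by the compiler instead of silently degrading to a synchronous copy at runtime.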
* [SE] Doc tweaks | Jason Henline | 2016-09-02 | 2 | -6/+6
  Summary:
  * Sections on main page.
  * Use std algorithm for equality check in example.
  * Add tree view on left side.
  * Add extra CSS sheet to restrict content width.
  * Add mild background color.
  * Restrict alphabetic indexes to 1 column.
  * Round corners of content boxes.
  * Rename example to CUDASaxpy.cpp.
  * Add CUDASaxpy.cpp to "Examples" section.
  Reviewers: jprice
  Subscribers: parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24198
  llvm-svn: 280511
* [SE] GlobalDeviceMemory owns its handle | Jason Henline | 2016-09-02 | 1 | -4/+0
  Summary: Final step in getting GlobalDeviceMemory to own its handle.
  * Make GlobalDeviceMemory movable, but no longer copyable.
  * Make Device::freeDeviceMemory function private and make GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase can free its memory in its destructor.
  * Make GlobalDeviceMemory constructor private and make Device a friend so it can construct GlobalDeviceMemory.
  * Remove SharedDeviceMemoryBase class because it is never used.
  * Remove explicit memory freeing from example code.
  This change just consumes any errors generated during device memory freeing. The real error handling will be added in a future patch.
  Reviewers: jlebar
  Subscribers: jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24195
  llvm-svn: 280509
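The ownership arrangement described in this summary (move-only owner, private constructor, mutual friendship, release in the destructor) can be sketched as follows. `ToyDevice` and `OwnedBuffer` are hypothetical stand-ins backed by host allocations; they are not the real `Device` and `GlobalDeviceMemory` classes, and the live-allocation counter exists only to make the behavior observable.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

class ToyDevice;

// Hypothetical move-only owner of an allocation handle.
class OwnedBuffer {
public:
  OwnedBuffer(OwnedBuffer &&Other) noexcept
      : Dev(Other.Dev), Handle(std::exchange(Other.Handle, nullptr)) {}
  OwnedBuffer &operator=(OwnedBuffer &&Other) noexcept {
    std::swap(Dev, Other.Dev);
    std::swap(Handle, Other.Handle);
    return *this;
  }
  OwnedBuffer(const OwnedBuffer &) = delete;
  OwnedBuffer &operator=(const OwnedBuffer &) = delete;
  ~OwnedBuffer(); // Defined after ToyDevice is complete.

private:
  friend class ToyDevice; // Only the device may construct a buffer.
  OwnedBuffer(ToyDevice *D, void *H) : Dev(D), Handle(H) {}

  ToyDevice *Dev;
  void *Handle;
};

class ToyDevice {
public:
  OwnedBuffer allocate(std::size_t Bytes) {
    ++Live;
    return OwnedBuffer(this, ::operator new(Bytes));
  }
  int liveAllocations() const { return Live; }

private:
  friend class OwnedBuffer; // Only the buffer may release its memory.
  void release(void *Handle) {
    ::operator delete(Handle);
    --Live;
  }
  int Live = 0;
};

// A moved-from buffer holds a null handle and releases nothing.
inline OwnedBuffer::~OwnedBuffer() {
  if (Handle)
    Dev->release(Handle);
}

static_assert(!std::is_copy_constructible<OwnedBuffer>::value,
              "OwnedBuffer must not be copyable");
```

Because the destructor performs the release, caller code needs no explicit free call, which matches the removal of explicit freeing from the example in this commit.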
* [SE] Make Kernel movable | Jason Henline | 2016-09-02 | 1 | -3/+2
  Summary: Kernel is basically just a smart pointer to the underlying implementation, so making it movable prevents having to store a std::unique_ptr to it.
  Reviewers: jlebar
  Subscribers: jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24150
  llvm-svn: 280437
* [SE] Make Stream movable | Jason Henline | 2016-09-01 | 1 | -3/+3
  Summary: The example code makes it clear that this is a much better design decision.
  Reviewers: jlebar
  Subscribers: jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24142
  llvm-svn: 280397
* [StreamExecutor] getOrDie and dieIfError utils | Jason Henline | 2016-08-31 | 1 | -24/+3
  Reviewers: jlebar
  Subscribers: jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24107
  llvm-svn: 280312
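This commit carries no summary, so the following is only a guess at the general shape such helpers take, based on their names. The `Result` type is a hypothetical stand-in invented for this sketch; the actual utilities presumably operate on the project's own error types, which are not reproduced here.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical result type: a value plus an error message that is empty on
// success. Assumed for illustration only.
template <typename T> struct Result {
  T Value;
  std::string Error;
  bool ok() const { return Error.empty(); }
};

// Returns the contained value, or prints the error and exits the process.
template <typename T> T getOrDie(Result<T> R) {
  if (!R.ok()) {
    std::fprintf(stderr, "fatal: %s\n", R.Error.c_str());
    std::exit(EXIT_FAILURE);
  }
  return std::move(R.Value);
}

// Exits the process if the error message is non-empty; otherwise does nothing.
inline void dieIfError(const std::string &Error) {
  if (!Error.empty()) {
    std::fprintf(stderr, "fatal: %s\n", Error.c_str());
    std::exit(EXIT_FAILURE);
  }
}
```

Helpers of this kind let example code replace repeated check-and-exit blocks with a single call, which would be consistent with the 24 lines removed and 3 added in this commit.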
* [StreamExecutor] Add Doxygen main page | Jason Henline | 2016-08-31 | 2 | -0/+165
  Reviewers: jlebar
  Subscribers: jprice, parallel_libs-commits
  Differential Revision: https://reviews.llvm.org/D24066
  llvm-svn: 280277