path: root/parallel-libs
Commit message (Author, Age, Files, Lines)
* [Acxxel] Remove -Wno-missing-braces in build (Jason Henline, 2016-12-19, 3 files, -6/+6)
| | | | | | | | | | | | | | | | Summary: I originally added the -Wno-missing-braces flag because I thought it was erroneously flagging std::array initializations. Now I realize the extra braces really are desired for these initializations, so I'm turning the warning flag back on. Reviewers: jlebar Subscribers: mgorny, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D27941 llvm-svn: 290137
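As a small illustration of why the extra braces are wanted: `std::array` is an aggregate that wraps a built-in array, so the fully braced initializer has one pair of braces for the `std::array` and one for the array inside it. The function names below are hypothetical and are not taken from the Acxxel sources.

```cpp
#include <array>
#include <cassert>

// Fully braced form: outer braces for std::array, inner braces for the
// built-in array it wraps. The brace-elided form `= {1, 2, 3}` is also
// legal C++, but it is the spelling -Wmissing-braces can flag.
inline std::array<int, 3> makeFullyBraced() {
  std::array<int, 3> Values = {{1, 2, 3}};
  return Values;
}

// Helper (hypothetical) that adds up the three elements.
inline int sumOf(const std::array<int, 3> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

Both spellings produce the same array; only the fully braced one stays quiet with the warning enabled.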
* [Acxxel] Remove setActiveDeviceForThread (Jason Henline, 2016-10-28, 7 files, -249/+232)
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: After experimenting with CUDA, I realized that we really only need to set the active context right before creating an object such as a stream or a device memory allocation. When we go on to use these objects later, it is fine if the context that created them is no longer active; operations with those objects will succeed anyway. Since it turns out that we don't have to check the active context for every operation, it makes sense to hide this active context from users (by removing the "ActiveDeviceForThread" setter and getter) and to change the Acxxel API to explicitly pass in the device ID to create objects. This change improves the Acxxel API and greatly simplifies the CUDA and OpenCL implementations because they no longer require thread_local data. Reviewers: jlebar, jprice Subscribers: mgorny, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D26050 llvm-svn: 285372
* [SE] Remove StreamExecutor (Jason Henline, 2016-10-25, 51 files, -7668/+0)
| | | | | | | | | | | | | | Summary: The project has been renamed to Acxxel, so this old directory needs to be deleted. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, parallel_libs-commits, modocache Differential Revision: https://reviews.llvm.org/D25964 llvm-svn: 285115
* Initial check-in of Acxxel (StreamExecutor renamed) (Jason Henline, 2016-10-25, 21 files, -0/+6679)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Acxxel is basically a simplified redesign of StreamExecutor. Here are the major points where Acxxel differs from the current StreamExecutor design: * Acxxel doesn't support the kernel and kernel loader types designed for emission by the compiler to support type-safe kernel launches. For CUDA, kernels in Acxxel can be seamlessly launched using the standard CUDA triple-chevron kernel launch syntax that is available with clang and nvcc. For CUDA and OpenCL, kernel arguments can be passed in the old-fashioned way, as one array of pointers to arguments and another array of argument sizes. Although OpenCL doesn't get a type-safe kernel launch method, it does still get the benefit of all the memory management wrappers. In the future, clang may add support for triple-chevron OpenCL kernel launches, or some other type-safe OpenCL kernel launch method. * Acxxel does not depend on any other code in LLVM, so it builds completely independently from LLVM. The goal will be to check in Acxxel and remove StreamExecutor, or perhaps to remove the old StreamExecutor and rename Acxxel to StreamExecutor, so I think Acxxel should be thought of as a new version of StreamExecutor, not as a separate project. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, modocache, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D25701 llvm-svn: 285111
* [SE] Change CoreTests target name (Jason Henline, 2016-09-27, 1 file, -1/+1)
| | | | | | | | | | | | | | Summary: Call it StreamExecutorCoreTests in order to prevent collision with targets from other modules. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24949 llvm-svn: 282491
* [SE] Fix config bug with CUDA tests (Jason Henline, 2016-09-15, 2 files, -1/+1)
| | | | | | | | | | | | | | | | | Summary: It turns out CMake errors out if a processed directory contains source files that are not used. This was causing an error with the CUDATest.cpp file when configuring StreamExecutor with the CUDA platform disabled. Moving CUDATest.cpp to its own directory fixes this problem. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24618 llvm-svn: 281654
* [SE] Support CUDA dynamic shared memory (Jason Henline, 2016-09-15, 3 files, -7/+254)
| | | | | | | | | | | | | | Summary: Add proper handling for shared memory arguments in the CUDA platform. Also add in unit tests for CUDA. Reviewers: jlebar Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24596 llvm-svn: 281635
* [SE] Let users specify CUDA path (Jason Henline, 2016-09-15, 4 files, -51/+63)
| | | | | | | | | | | | Summary: Add logic to allow users to specify the CUDA path at configuration time. Reviewers: jlebar Subscribers: beanz, mgorny, jlebar, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24580 llvm-svn: 281626
* [SE] Add CUDA platform (Jason Henline, 2016-09-14, 13 files, -17/+596)
| | | | | | | | | | | | | | | | | | | | Summary: Basic CUDA platform implementation and cmake infrastructure to control whether it's used. A few important TODOs will be handled in later patches: * Log some error messages that can't easily be returned as Errors. * Cache modules and kernels to prevent reloading them if someone tries to reload a kernel that's already loaded. * Tolerate shared memory arguments for kernel launches. Reviewers: jlebar Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24538 llvm-svn: 281524
* [SE] Pack global dev handle addresses (Jason Henline, 2016-09-13, 4 files, -32/+14)
| | | | | | | | | | | | | | | | | Summary: We were packing global device memory handles in `PackedKernelArgumentArray`, but as I was implementing the CUDA platform, I realized that CUDA wants the address of the handle, not the handle itself. So this patch switches to packing the address of the handle. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24528 llvm-svn: 281424
* Device doc says device is small (Jason Henline, 2016-09-13, 1 file, -0/+4)
| | | | llvm-svn: 281423
* [SE] Platforms return Device values (Jason Henline, 2016-09-13, 4 files, -24/+19)
| | | | | | | | | | | | | | | Summary: Platforms were returning Device pointers, but a Device is now basically just a pointer to an underlying PlatformDevice, so we will now just pass it around as a value. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24537 llvm-svn: 281422
* [SE] KernelSpec return best PTX (Jason Henline, 2016-09-13, 3 files, -12/+15)
| | | | | | | | | | | | | | Summary: Before, the kernel spec would only return PTX for exactly the requested compute capability. With this patch it will now return the PTX with the largest compute capability that does not exceed the requested compute capability. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24531 llvm-svn: 281417
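The selection rule described in this commit can be sketched with an ordered map. This is only an illustration: `PTXTable`, `ComputeCapability`, `add`, and `getBest` are hypothetical names, not the actual KernelSpec interface.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a (major, minor) compute capability.
using ComputeCapability = std::pair<int, int>;

// Hypothetical table mapping compute capabilities to PTX text.
class PTXTable {
public:
  void add(ComputeCapability CC, std::string PTX) {
    Table[CC] = std::move(PTX);
  }

  // Returns the PTX registered for the largest compute capability that does
  // not exceed Requested, or nullptr if every entry is too new.
  const std::string *getBest(ComputeCapability Requested) const {
    auto It = Table.upper_bound(Requested); // first entry > Requested
    if (It == Table.begin())
      return nullptr;
    return &std::prev(It)->second;
  }

private:
  std::map<ComputeCapability, std::string> Table;
};
```

With entries for 2.0 and 3.5, a request for 3.0 falls back to the 2.0 PTX rather than failing, which is the behavior change the summary describes.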
* [SE] Use real HostPlatformDevice for testing (Jason Henline, 2016-09-13, 5 files, -151/+17)
| | | | | | | | | | | | | | Summary: Replace uses of SimpleHostPlatformDevice in tests with HostPlatformDevice. Reviewers: jlebar Subscribers: jlebar, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24519 llvm-svn: 281384
* [SE] Host platform implementation (Jason Henline, 2016-09-13, 8 files, -5/+329)
| | | | | | | | | | | | | | | | | Summary: This implementation does not currently support multiple concurrent streams, and it won't allow kernels to be launched with grids larger than one block or blocks larger than one thread. These limitations could be removed in the future by launching new threads on the host, but that is not done in this implementation. Reviewers: jlebar Subscribers: beanz, mgorny, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24473 llvm-svn: 281377
* [SE] Add .clang-tidy (Jason Henline, 2016-09-13, 19 files, -153/+120)
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The .clang-tidy file is copied from the top-level LLVM source directory. Also fix warnings generated by clang-tidy: * Moved SimpleHostPlatformDevice.h so its header include guard could have the right format. * Changed signatures of methods taking llvm::Twine by value to take it by const ref instead. * Add "noexcept" to some move constructors and assignment operators. * Removed a bunch of places where single-statement loops and conditionals were surrounded with braces. (This was not found by the current clang-tidy, but with a local patch that I hope to upstream soon.) Reviewers: jlebar, jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24468 llvm-svn: 281374
* [SE] Stop using llvm-config --cxxflags (Jason Henline, 2016-09-13, 1 file, -11/+4)
| | | | | | | | | | | | | | | | | | | | | | | Summary: Build configuration was adding $(llvm-config --cxxflags) to the StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even for debug builds, and was making debugging difficult. The llvm-config call was originally introduced to handle the -fno-rtti flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM. This patch converts to using LLVM_ENABLE_RTTI and only adding the `-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS. I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will probably have to be done to support MSVC. Reviewers: jlebar Subscribers: beanz, jprice, parallel_libs-commits, mgorny Differential Revision: https://reviews.llvm.org/D24474 llvm-svn: 281347
* [SE] Clean up device and host memory slices (Jason Henline, 2016-09-12, 4 files, -22/+35)
| | | | | | | | | | | | | | | | | Summary: * Add LLVM_ATTRIBUTE_UNUSED_RESULT to slicing methods in order to emphasize that the slicing is not done in place. * Change device memory slice function name from `drop_front` to `slice` in order to match the naming convention of `llvm::ArrayRef` and host memory slice. * Change the parameter names of host memory slice functions to `DropCount` and `TakeCount` to match device memory slice declarations. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24464 llvm-svn: 281239
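A rough host-only sketch of slicing that is not done in place follows. The `Slice` class is hypothetical, and the standard `[[nodiscard]]` attribute (C++17) is used here as a stand-in for `LLVM_ATTRIBUTE_UNUSED_RESULT`; only the `DropCount` and `TakeCount` parameter names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over a contiguous range of elements.
template <typename ElemT> class Slice {
public:
  Slice(ElemT *Data, std::size_t Count) : Data(Data), Count(Count) {}

  std::size_t size() const { return Count; }

  ElemT &operator[](std::size_t Index) const {
    assert(Index < Count && "index out of range");
    return Data[Index];
  }

  // Returns a NEW view that skips DropCount elements and keeps TakeCount.
  // The receiver is left unchanged, so ignoring the result is almost
  // certainly a bug; the attribute asks the compiler to warn about that.
  [[nodiscard]] Slice slice(std::size_t DropCount,
                            std::size_t TakeCount) const {
    assert(DropCount + TakeCount <= Count && "slice out of range");
    return Slice(Data + DropCount, TakeCount);
  }

  // Returns a new view that skips DropCount elements and keeps the rest.
  [[nodiscard]] Slice slice(std::size_t DropCount) const {
    assert(DropCount <= Count && "slice out of range");
    return Slice(Data + DropCount, Count - DropCount);
  }

private:
  ElemT *Data;
  std::size_t Count;
};
```

Writing `S.slice(1, 3);` as a bare statement would draw a warning, whereas `auto Sub = S.slice(1, 3);` yields a three-element view and leaves `S` at its original size.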
* [SE] RegisteredHostMemory for async device copies (Jason Henline, 2016-09-12, 11 files, -302/+393)
| | | | | | | | | | | | | | | | | | | Summary: Improve the error-prone interface that allows users to pass host pointers that haven't been registered to asynchronous copy methods. In CUDA, this is an extremely easy error to make, and instead of failing at runtime, it succeeds and gives the right answers by turning the async copy into a sync copy. So, you silently get a huge performance degradation if you misuse the old interface. This new interface should prevent that. Reviewers: jlebar Subscribers: jprice, beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24353 llvm-svn: 281225
* [SE] Remove Utils directory (Jason Henline, 2016-09-09, 11 files, -19/+12)
| | | | | | | | | | | | | | | | Summary: There is no purpose in splitting out the Error class from the rest of the StreamExecutor code. This organization was just a vestige of an old failed design. Plus, this change fixes a bug in the build where the utilities library was not being statically linked in with libstreamexecutor. Reviewers: jlebar, jprice Subscribers: beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24434 llvm-svn: 281118
* [StreamExecutor] Make SE work with an in-tree LLVM build. (Justin Lebar, 2016-09-09, 12 files, -56/+45)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: With these changes, we can put parallel-libs within llvm/projects and build as normal. This is kind of the minimal change I could figure out how to make while still making us compatible with llvm's build system. Some things I'm not thrilled about include: * The creation of a CoreTests directory (the macros really seemed to want this) * Pulling SimpleHostPlatformDevice.h into CoreTests. It seems to me this should live inside unittests/include, or maybe tests/include, but I didn't want to make that change in this patch. One important piece of work that remains to be done is to make $ ninja check-streamexecutor run all the tests. Right now the only way I've figured out to run the tests is $ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests $ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests Reviewers: jhen Subscribers: beanz, parallel_libs-commits, jprice Differential Revision: https://reviews.llvm.org/D24368 llvm-svn: 281091
* Add streamexecutor-config (Jason Henline, 2016-09-08, 4 files, -1/+215)
| | | | | | | | | | | | | | Summary: Similar to llvm-config, gets command-line flags that are needed to build applications linking against StreamExecutor. Reviewers: jprice, jlebar Subscribers: beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24302 llvm-svn: 280955
* [SE] Add getName method to Device class (Jason Henline, 2016-09-07, 2 files, -0/+7)
| | | | | | | | | | Reviewers: jhen Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24240 llvm-svn: 280872
* [SE] Rename PlatformInterfaces to PlatformDevice (Jason Henline, 2016-09-06, 11 files, -27/+20)
| | | | | | | | | | | | | | Summary: The only interface that we ever plan to have in this file is PlatformDevice, so it makes sense to rename the file to reflect that. Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24269 llvm-svn: 280737
* [SE] Remove Platform*Handle classes (Jason Henline, 2016-09-06, 10 files, -95/+142)
| | | | | | | | | | | | | | | | Summary: As pointed out by jprice, these classes don't serve a purpose. Instead, we stay consistent with the way memory is managed and let the Stream and Kernel classes directly hold opaque handles to device Stream and Kernel instances, respectively. Reviewers: jprice, jlebar Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24213 llvm-svn: 280719
* [SE] Add getByteCount methods for device memory (Jason Henline, 2016-09-03, 2 files, -13/+22)
| | | | | | | | | | | | | | Summary: Simple utility methods will prevent users from making mistakes when converting element counts to byte counts. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24197 llvm-svn: 280563
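The kind of mistake this guards against is passing an element count where a byte count is expected. A minimal sketch, assuming a hypothetical `DeviceArrayInfo` class that only tracks an element count (the real device memory classes are not reproduced here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical typed description of a device array. It holds no memory;
// it only records how many elements of type ElemT the array contains.
template <typename ElemT> class DeviceArrayInfo {
public:
  explicit DeviceArrayInfo(std::size_t ElementCount)
      : ElementCount(ElementCount) {}

  // Number of ElemT elements in the array.
  std::size_t getElementCount() const { return ElementCount; }

  // Number of bytes occupied by those elements. Callers use this instead
  // of multiplying by sizeof(ElemT) by hand at every copy or allocation.
  std::size_t getByteCount() const { return ElementCount * sizeof(ElemT); }

private:
  std::size_t ElementCount;
};
```

For ten `std::uint32_t` elements, `getElementCount()` is 10 and `getByteCount()` is 40.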
* [SE] Remove broken doc ref (Jason Henline, 2016-09-02, 1 file, -3/+0)
| | | | llvm-svn: 280512
* [SE] Doc tweaks (Jason Henline, 2016-09-02, 5 files, -14/+41)
| | | | | | | | | | | | | | | | | | | | | Summary: * Sections on main page. * Use std algorithm for equality check in example. * Add tree view on left side. * Add extra CSS sheet to restrict content width. * Add mild background color. * Restrict alphabetic indexes to 1 column. * Round corners of content boxes. * Rename example to CUDASaxpy.cpp. * Add CUDASaxpy.cpp to "Examples" section. Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24198 llvm-svn: 280511
* [SE] GlobalDeviceMemory owns its handle (Jason Henline, 2016-09-02, 6 files, -143/+123)
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Final step in getting GlobalDeviceMemory to own its handle. * Make GlobalDeviceMemory movable, but no longer copyable. * Make Device::freeDeviceMemory function private and make GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase can free its memory in its destructor. * Make GlobalDeviceMemory constructor private and make Device a friend so it can construct GlobalDeviceMemory. * Remove SharedDeviceMemoryBase class because it is never used. * Remove explicit memory freeing from example code. This change just consumes any errors generated during device memory freeing. The real error handling will be added in a future patch. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24195 llvm-svn: 280509
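The ownership pattern described here (movable, not copyable, handle released in the destructor) can be sketched with plain host memory. `OwnedBuffer` is a hypothetical class that uses `std::malloc` and `std::free` in place of a device allocator; the private constructor and friend relationship from the commit are left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical owner of an opaque handle, standing in for device memory.
class OwnedBuffer {
public:
  explicit OwnedBuffer(std::size_t ByteCount)
      : Handle(std::malloc(ByteCount)) {}

  // Copying is disabled so that two objects can never free the same handle.
  OwnedBuffer(const OwnedBuffer &) = delete;
  OwnedBuffer &operator=(const OwnedBuffer &) = delete;

  // Moving transfers the handle and leaves the source empty.
  OwnedBuffer(OwnedBuffer &&Other) noexcept : Handle(Other.Handle) {
    Other.Handle = nullptr;
  }

  OwnedBuffer &operator=(OwnedBuffer &&Other) noexcept {
    if (this != &Other) {
      std::free(Handle);
      Handle = Other.Handle;
      Other.Handle = nullptr;
    }
    return *this;
  }

  // The destructor releases the handle, so callers never free explicitly.
  // std::free accepts nullptr, which covers moved-from objects.
  ~OwnedBuffer() { std::free(Handle); }

  void *getHandle() const { return Handle; }

private:
  void *Handle;
};

static_assert(!std::is_copy_constructible<OwnedBuffer>::value,
              "OwnedBuffer must not be copyable");
```

Because the destructor frees the handle, example code using such a type needs no explicit free calls, which matches the last bullet of the summary.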
* [SE] Add "install" actions to cmake build (Jason Henline, 2016-09-02, 2 files, -0/+4)
| | | | | | | The "install" build target will now copy the StreamExecutor library and headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX. llvm-svn: 280506
* [SE] Don't pack raw device mem args (Jason Henline, 2016-09-02, 2 files, -116/+40)
| | | | | | | | | | | | | | | | | Summary: Step 4 of getting GlobalDeviceMemory to own its handle. Take out code to pack untyped device memory types as kernel arguments. When GlobalDeviceMemory owns its handle, users will never touch untyped device memory types, so they will never pass them as kernel args. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24177 llvm-svn: 280496
* [StreamExecutor] Pass device memory by ref (Jason Henline, 2016-09-02, 3 files, -31/+34)
| | | | | | | | | | | | | | Summary: Step 3 of getting GlobalDeviceMemory to own its handle. Since GlobalDeviceMemory will no longer be copy-constructible, we must pass instances by reference rather than by value. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24172 llvm-svn: 280439
* [SE] Make Kernel movable (Jason Henline, 2016-09-02, 3 files, -72/+12)
| | | | | | | | | | | | | | | Summary: Kernel is basically just a smart pointer to the underlying implementation, so making it movable prevents having to store a std::unique_ptr to it. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24150 llvm-svn: 280437
* [StreamExecutor] Read dev array directly in test (Jason Henline, 2016-09-01, 3 files, -63/+97)
| | | | | | | | | | | | | | | | Summary: Step 2 of getting GlobalDeviceMemory to own its handle. Use the SimpleHostPlatformDevice allocate methods to create device arrays for tests, and check for successful copies by dereferencing the device array handle directly because we know it is really a host pointer. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24148 llvm-svn: 280428
* [StreamExecutor] Dev handles in platform interface (Jason Henline, 2016-09-01, 6 files, -153/+171)
| | | | | | | | | | | | | | | | | Summary: This is the first in a series of patches that will convert GlobalDeviceMemory to own its device memory handle. The first step is to remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and use raw handles there instead. This is useful because GlobalDeviceMemoryBase is going to lose its importance in this process. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24114 llvm-svn: 280401
* [SE] Make Stream movable (Jason Henline, 2016-09-01, 5 files, -14/+17)
| | | | | | | | | | | | | | Summary: The example code makes it clear that this is a much better design decision. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24142 llvm-svn: 280397
* [SE] Docs use JAVADOC_AUTOBRIEF (Jason Henline, 2016-09-01, 1 file, -1/+1)
| | | | | | | That way we don't have to explicitly annotate each brief description as \brief. llvm-svn: 280384
* [StreamExecutor] getOrDie and dieIfError utils (Jason Henline, 2016-08-31, 4 files, -37/+52)
| | | | | | | | | | Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24107 llvm-svn: 280312
* Exclude examples, unittests from doc gen (Jason Henline, 2016-08-31, 1 file, -1/+1)
| | | | | | | Public documentation shouldn't be generated for unit test code and code that is only meant to be used as snippets in other documentation. llvm-svn: 280278
* [StreamExecutor] Add Doxygen main page (Jason Henline, 2016-08-31, 7 files, -5/+243)
| | | | | | | | | | Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24066 llvm-svn: 280277
* [StreamExecutor] Add Stream::blockHostUntilDone (Jason Henline, 2016-08-31, 1 file, -1/+10)
| | | | | | | | | | | | Summary: Add the type-safe wrapper to the platform-specific implementation. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24063 llvm-svn: 280182
* [StreamExecutor] Simplify Kernel classes (Jason Henline, 2016-08-30, 7 files, -212/+87)
| | | | | | | | | | | | | | Summary: Make the Kernel class follow the pattern of the other classes. It now has a type-safe user wrapper and a typeless, platform-specific handle. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24043 llvm-svn: 280176
* [StreamExecutor] Fix KernelSpec Doxygen (Jason Henline, 2016-08-26, 1 file, -5/+5)
| | | | | | | | | | | | | | | Summary: There was a typo where \endcode was spelled as \encode and it was keeping the whole file document from rendering. I also added in some \c annotations for inline code stuff to make it look nicer. Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23941 llvm-svn: 279855
* [StreamExecutor] Add Platform and PlatformManager (Jason Henline, 2016-08-25, 5 files, -0/+154)
| | | | | | | | | | | | Summary: Abstractions for a StreamExecutor platform Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23857 llvm-svn: 279779
* [StreamExecutor] Rename Executor to Device (Jason Henline, 2016-08-24, 14 files, -580/+575)
| | | | | | | | | | | | Summary: This more clearly describes what the class is. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23851 llvm-svn: 279669
* [StreamExecutor] Fix allocateDeviceMemory (Jason Henline, 2016-08-24, 2 files, -2/+37)
| | | | | | | | | | | | | | | | | | | Summary: The return value from PlatformExecutor::allocateDeviceMemory needs to be converted from Expected<GlobalDeviceMemoryBase> to Expected<GlobalDeviceMemory<T>> in Executor::allocateDeviceMemory. A similar bug is also fixed for Executor::allocateHostMemory. Thanks to jprice for identifying this bug. Reviewers: jprice, jlebar Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23849 llvm-svn: 279658
* [StreamExecutor] Clean up device copy comments (Jason Henline, 2016-08-24, 4 files, -159/+2361)
| | | | | | | | | | | | | | | | | Summary: Consolidate Executor::synchronousCopy* and Stream::thenCopy* methods into Doxygen method groups and combine all their comments into one section. Also add a "doc" target to the build files to use Doxygen to build the documentation. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23845 llvm-svn: 279654
* [StreamExecutor] Executor add synchronous methods (Jason Henline, 2016-08-24, 9 files, -117/+1475)
| | | | | | | | | | | | | | Summary: Add Executor methods that block the host until completion. Since these methods are host-synchronous, they don't require Stream arguments. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23577 llvm-svn: 279640
* [StreamExecutor] Rename StreamExecutor to Executor (Jason Henline, 2016-08-16, 11 files, -83/+77)
| | | | | | | | | | | | Summary: No functional changes, just renaming this class for better readability. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23574 llvm-svn: 278833
* [StreamExecutor] Add basic Stream operations (Jason Henline, 2016-08-16, 16 files, -49/+732)
| | | | | | | | | | | | Summary: Add the Stream class and a few of the operations it supports. Reviewers: jlebar, tra Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23333 llvm-svn: 278829