path: root/polly/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* [CodeGeneration] Do not set insert position redundantly | Tobias Grosser | 2016-08-08 | 1 | -1/+0
    There is no need to reset the position of the builder, as we can just continue to insert code at the current position of the IRBuilder, which happens to be precisely the location we reset the builder to.
    llvm-svn: 278014
* [IslNodeBuilder] Directly use the insert location of our Builder | Tobias Grosser | 2016-08-08 | 1 | -1/+14
    ... instead of adding instructions at the end of the basic block the builder is currently at. This makes it easier to reason about where IR is generated, as with the IRBuilder there is just a single location that specifies where IR is generated.
    llvm-svn: 278013
* [CodeGen] Use MapVector instead of DenseMap. | Michael Kruse | 2016-08-05 | 1 | -2/+2
    The map is iterated over when generating the values escaping the SCoP. The nondeterministic iteration order of DenseMap causes the output IR to change at every compilation, adding noise to comparisons. Replace DenseMap by a MapVector to ensure the same iteration order at every compilation.
    llvm-svn: 277832
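The idea behind this change can be illustrated with a minimal sketch of an insertion-ordered map. This is not LLVM's MapVector and not Polly's code: the class name `InsertionOrderedMap` and the helper `keysInOrder` are hypothetical, and the sketch only assumes the general design of pairing a lookup table with a vector that records insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an insertion-ordered map (not llvm::MapVector).
// Lookups go through a hash table, iteration goes through a vector, so the
// iteration order depends only on the order of insertion.
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> Index; // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;     // entries in insertion order

public:
  // Return the value for Key, default-constructing it on first use.
  V &operator[](const K &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, V());
    }
    return Entries[It->second].second;
  }

  auto begin() const { return Entries.begin(); }
  auto end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }
};

// Collect the keys in iteration order, e.g. to emit code for each entry.
std::vector<std::string>
keysInOrder(const InsertionOrderedMap<std::string, int> &M) {
  std::vector<std::string> Keys;
  for (const auto &Entry : M)
    Keys.push_back(Entry.first);
  return Keys;
}
```

Iterating such a container visits the keys in the order they were first inserted, independent of hashing or pointer values, which is the property the commit relies on to keep the generated IR stable between compilations.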
* GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly | Tobias Grosser | 2016-08-05 | 1 | -1/+7
    Before this commit we generated the array type in reverse order and we also added the outermost dimension size to the new array declaration. This is incorrect, as Polly already assumes an additional unsized outermost dimension, so we had an off-by-one error in the linearization of access expressions.
    llvm-svn: 277802
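To see why the order of the dimension sizes and the unsized outermost dimension matter, consider a generic row-major linearization. The function `linearize` below is a hypothetical illustration, not the code Polly generates; it assumes that only the sizes of the inner dimensions are needed to compute an offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major offset of A[i0][i1]...[in] for an array whose outermost
// dimension is unsized. InnerSizes holds the sizes of dimensions 1..n
// (outermost to innermost); the size of dimension 0 is never needed.
long linearize(const std::vector<long> &Indices,
               const std::vector<long> &InnerSizes) {
  assert(!Indices.empty() && Indices.size() == InnerSizes.size() + 1);
  long Offset = Indices[0];
  for (std::size_t D = 1; D < Indices.size(); ++D)
    Offset = Offset * InnerSizes[D - 1] + Indices[D];
  return Offset;
}
```

For an array declared as `A[][4][5]`, the element `A[1][2][3]` sits at offset `(1 * 4 + 2) * 5 + 3 = 33`. Passing the sizes in reverse order, or shifting them by one position because an extra outermost size was included, yields a different offset for the same indices.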
* GPGPU: Add cuda annotations to specify maximal number of threads per block | Tobias Grosser | 2016-08-05 | 1 | -3/+40
    These annotations ensure that the NVIDIA PTX assembler limits the number of registers used such that we can be certain the resulting kernel can be executed for the number of threads in a thread block that we are planning to use.
    llvm-svn: 277799
* GPGPU: Support scalars that are mapped to shared memory | Tobias Grosser | 2016-08-04 | 2 | -10/+11
    llvm-svn: 277726
* GPGPU: Disable verbose debug output | Tobias Grosser | 2016-08-04 | 1 | -0/+1
    llvm-svn: 277724
* Remove leftover debug output | Tobias Grosser | 2016-08-04 | 1 | -1/+0
    llvm-svn: 277723
* GPGPU: Add private memory support | Tobias Grosser | 2016-08-04 | 1 | -14/+25
    llvm-svn: 277722
* GPGPU: Add support for shared memory | Tobias Grosser | 2016-08-04 | 1 | -5/+90
    llvm-svn: 277721
* GPGPU: Handle scalar array references | Tobias Grosser | 2016-08-04 | 2 | -13/+38
    Pass the content of scalar array references to the alloca on the kernel side and do not pass them additionally as normal LLVM scalar values.
    llvm-svn: 277699
* BlockGenerator: Assert that we do not get alloca of array access | Tobias Grosser | 2016-08-04 | 1 | -0/+4
    llvm-svn: 277698
* GPGPU: Pass subtree values correctly to the kernel | Tobias Grosser | 2016-08-04 | 1 | -6/+22
    llvm-svn: 277697
* GPGPU: Mark kernel functions as polly.skip | Tobias Grosser | 2016-08-03 | 1 | -0/+3
    Otherwise, we would try to re-optimize them with Polly-ACC and possibly even generate kernels that try to offload themselves, which does not work as the GPURuntime is not available on the accelerator and also does not make any sense.
    llvm-svn: 277589
* Fix a couple of spelling mistakes | Tobias Grosser | 2016-08-03 | 2 | -4/+4
    llvm-svn: 277569
* Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions | Roman Gareev | 2016-07-30 | 4 | -13/+34
    Extend the jscop interface to allow the user to export arrays. It is required that already existing arrays of the list of arrays correspond to arrays of the SCoP. Each array that is appended to the list will be newly created. Furthermore, we allow the user to modify access expressions to reference any array in case it has the same element type.
    Reviewed-by: Tobias Grosser <tobias@grosser.es>
    Differential Revision: https://reviews.llvm.org/D22828
    llvm-svn: 277263
* GPGPU: Pass context parameters to GPU kernel | Tobias Grosser | 2016-07-28 | 1 | -0/+18
    llvm-svn: 276963
* GPGPU: Pass host iterators to kernel | Tobias Grosser | 2016-07-28 | 1 | -0/+18
    llvm-svn: 276962
* GPGPU: use current 'Index' to find slot in parameter array | Tobias Grosser | 2016-07-28 | 1 | -2/+2
    Before this change we used the array index, which would result in us accessing the parameter array out-of-bounds. This bug was visible for test cases where not all arrays in a scop are passed to a given kernel.
    llvm-svn: 276961
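The class of bug described here can be sketched outside of Polly. In the hypothetical function `packKernelArrays` below, a kernel uses only some of the arrays of a scop, so the parameter array is smaller than the list of all arrays. The names and data layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a kernel's parameter array with the ids of the arrays it uses.
// ArrayIds lists all arrays of the scop, UsedByKernel marks the subset
// that is passed to this particular kernel.
std::vector<int> packKernelArrays(const std::vector<int> &ArrayIds,
                                  const std::vector<bool> &UsedByKernel) {
  assert(ArrayIds.size() == UsedByKernel.size());

  std::size_t NumUsed = 0;
  for (bool Used : UsedByKernel)
    NumUsed += Used ? 1 : 0;

  std::vector<int> Parameters(NumUsed);
  std::size_t Index = 0; // next free slot in the parameter array
  for (std::size_t I = 0; I < ArrayIds.size(); ++I) {
    if (!UsedByKernel[I])
      continue;
    // Writing to Parameters[I] instead would run past the end of the
    // parameter array as soon as an earlier array had been skipped.
    Parameters[Index++] = ArrayIds[I];
  }
  return Parameters;
}
```

With three arrays of which only the first and third are used, the third array has position 2 in the scop but belongs in slot 1 of a two-element parameter array.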
* GPGPU: Generate kernel parameter allocation with right size | Tobias Grosser | 2016-07-28 | 1 | -1/+2
    Before this change we miscounted the number of function parameters.
    llvm-svn: 276960
* GPGPU: Add basic support for kernel launches | Tobias Grosser | 2016-07-27 | 1 | -0/+171
    llvm-svn: 276863
* GPGPU: Load GPU kernels | Tobias Grosser | 2016-07-25 | 1 | -3/+60
    We embed the PTX code into the host IR as a global variable and compile it at run-time into a GPU kernel.
    llvm-svn: 276645
* GPGPU: Emit data-transfer code | Tobias Grosser | 2016-07-25 | 1 | -25/+139
    Also factor out getArraySize() to avoid code duplication and reorder some function arguments to indicate the direction in which data is transferred.
    llvm-svn: 276636
* GPGPU: Complete code to allocate and free device arrays | Tobias Grosser | 2016-07-25 | 1 | -4/+45
    At the beginning of each SCoP, we allocate device arrays for all arrays used on the GPU and we free such arrays after the SCoP has been executed.
    llvm-svn: 276635
* GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface. | Tobias Grosser | 2016-07-25 | 1 | -0/+116
    There is no need to expose the selected device at the moment. We also pass back pointers as return values, as this simplifies the interface.
    llvm-svn: 276623
* IslNodeBuilder: Make finalize() virtual | Tobias Grosser | 2016-07-25 | 2 | -3/+2
    This allows the finalization routine of the IslNodeBuilder to be overridden by derived classes. While here, we also drop the unnecessary 'Scop' postfix and the unnecessary 'Scop' parameter.
    llvm-svn: 276622
* GPGPU: Optimize kernel IR before generating assembly code | Tobias Grosser | 2016-07-24 | 1 | -0/+9
    We optimize the kernel _after_ dumping the IR we generate, to make the IR we dump easier to read and independent of possible changes in the general-purpose LLVM optimizers.
    llvm-svn: 276551
* GPGPU: Verify kernel IR before generating assembly | Tobias Grosser | 2016-07-24 | 1 | -0/+5
    llvm-svn: 276550
* GPGPU: Generate PTX assembly code for the kernel modules | Tobias Grosser | 2016-07-22 | 1 | -0/+123
    Run the NVPTX backend over the GPUModule IR and write the resulting assembly code to a string. To work correctly, it is important to invalidate analysis results that still reference the IR in the kernel module. Hence, this change clears all references to dominators, loop info, and scalar evolution. Finally, the NVPTX backend has trouble generating code for various special floating point types (not surprising), but also for uncommon integer types. This commit does not resolve these issues, but pulls out problematic test cases into separate files to XFAIL them individually and resolve them in future (not immediate) changes one by one.
    llvm-svn: 276396
* GPGPU: generate code for ScopStatements | Tobias Grosser | 2016-07-21 | 1 | -15/+202
    This change introduces the actual compute code in the GPU kernels. To ensure all values referenced from the statements in the GPU kernel are indeed available we scan all ScopStmts in the GPU kernel for references to llvm::Values that are not yet covered by already modeled outer loop iterators, parameters, or array base pointers and also pass these additional llvm::Values to the GPU kernel. For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which is referenced by the newly generated access functions within the GPU kernel and which is used to help with code generation.
    llvm-svn: 276270
* IslNodeBuilder: expose addReferencesFromStmt [NFC] | Tobias Grosser | 2016-07-21 | 1 | -11/+1
    This will be used by Polly GPGPU to determine the values that need to be passed to GPU kernels.
    llvm-svn: 276269
* IslExprBuilder: allow to specify an external isl_id to ScopArrayInfo mapping | Tobias Grosser | 2016-07-21 | 1 | -1/+12
    This is useful for external users using IslExprBuilder, in case they cannot embed ScopArrayInfo data into their isl_ids, because the isl_ids either already carry other information or the isl_ids have been created and their user pointers cannot be updated any more.
    llvm-svn: 276268
* BlockGenerator: remove dead instructions in normal statements | Tobias Grosser | 2016-07-21 | 1 | -0/+22
    This ensures that no trivially dead code is generated. This is not only cleaner, but also avoids trouble in case code is generated in a separate function and some of this dead code contains references to values that are not available. This issue may happen in case the memory access functions have been updated and old getelementptr instructions remain in the code. With normal Polly, a test case is difficult to draft, but the upcoming GPU code generation can possibly trigger such problems. We will later extend this dead-code elimination to region and vector statements.
    llvm-svn: 276263
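The effect of such a cleanup can be sketched on a toy straight-line instruction list. The `Inst` struct and `removeDeadInstructions` function below are hypothetical and do not use LLVM's API. The sketch swaps in a single backward sweep, which for straight-line code removes the same instructions as repeatedly deleting trivially dead ones (no uses, no side effects).

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: defines value Id, reads the values listed in Operands.
struct Inst {
  int Id;
  std::vector<int> Operands;
  bool HasSideEffects; // e.g. a store; must never be removed
};

// Drop instructions whose result is never used and that have no side
// effects. Walking backwards means every user is visited before the
// instruction that defines its operands.
std::vector<Inst> removeDeadInstructions(const std::vector<Inst> &Block) {
  std::set<int> Used; // ids read by instructions that are kept
  std::vector<Inst> Kept;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (!It->HasSideEffects && Used.count(It->Id) == 0)
      continue; // dead: nobody reads it and it has no side effects
    Used.insert(It->Operands.begin(), It->Operands.end());
    Kept.push_back(*It);
  }
  std::reverse(Kept.begin(), Kept.end()); // restore program order
  return Kept;
}
```

In this model, a stale address computation left behind after an access function was rewritten has no remaining users, so it is dropped together with anything that only fed into it, while stores and the values they depend on are kept.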
* GPGPU: Bail out of scops with hoisted invariant loads | Tobias Grosser | 2016-07-19 | 1 | -0/+4
    This is currently not supported and will only be added later. Also update the test cases to ensure no invariant code hoisting is applied.
    llvm-svn: 275987
* GPGPU: Emit in-kernel synchronization statements | Tobias Grosser | 2016-07-19 | 1 | -0/+49
    We use this opportunity to further classify the different user statements that can arise and add TODOs for the ones not yet implemented.
    llvm-svn: 275957
* GPGPU: generate control flow within the kernel | Tobias Grosser | 2016-07-19 | 1 | -0/+6
    llvm-svn: 275956
* GPGPU: add scop parameters to kernel arguments | Tobias Grosser | 2016-07-19 | 1 | -1/+14
    llvm-svn: 275955
* GPGPU: add host iterators to kernel arguments | Tobias Grosser | 2016-07-19 | 1 | -1/+14
    llvm-svn: 275954
* GPGPU: add intrinsic functions to obtain a kernel's thread and block ids | Tobias Grosser | 2016-07-19 | 1 | -0/+50
    llvm-svn: 275953
* GPGPU: create kernel function skeleton | Tobias Grosser | 2016-07-19 | 1 | -7/+153
    Create for each kernel a separate LLVM-IR module containing a single function marked as kernel function and taking one pointer for each array referenced by this kernel. Add debugging output to verify the kernels are generated correctly.
    llvm-svn: 275952
* GPGPU: collect array references | Tobias Grosser | 2016-07-18 | 1 | -0/+2
    Initialize the list of references to a GPU array to ensure that the arrays that need to be passed to kernel calls are computed correctly. Furthermore, the very same information is also necessary to compute synchronization correctly. As the functionality to compute these references is already available, what is left for us to do is only to connect the necessary functionality to compute array reference information.
    llvm-svn: 275798
* GPGPU: Pull implementation out of class definition | Tobias Grosser | 2016-07-18 | 1 | -4/+7
    This will allow us to see the full class definition even after we add non-trivial implementations of the different member functions.
    llvm-svn: 275797
* GPGPU: Create host control flow | Tobias Grosser | 2016-07-18 | 1 | -0/+82
    Create LLVM-IR for all host-side control flow of a given GPU AST. We implement this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The IslNodeBuilder will take care of generating all general-purpose AST nodes, but we provide our own createUser implementation to handle the different GPU-specific user statements. For now, we just skip any user statement and only generate a host-code skeleton, but in subsequent commits we will add handling of normal ScopStmts performing computations, kernel calls, as well as host-device data transfers. We will also introduce run-time check generation and LICM in subsequent commits.
    llvm-svn: 275783
* GPGPU: Format statements scheduled on the host ourselves | Tobias Grosser | 2016-07-15 | 1 | -0/+14
    Otherwise ppcg would try to call into pet functionality that is not available, which obviously will cause trouble. As we can easily print these statements ourselves, we just do so.
    llvm-svn: 275579
* GPGPU: Use schedule whole components for scheduler | Tobias Grosser | 2016-07-15 | 1 | -9/+1
    This option increases the scalability of the scheduler and allows us to remove the 'gisting' workaround we introduced in r275565 to handle a more complicated test case. Another benefit of using this option is also that the generated code looks a lot more streamlined. Thanks to Sven Verdoolaege for reminding me of this option.
    llvm-svn: 275573
* GPGPU: Drop domain constraints from flow dependences | Tobias Grosser | 2016-07-15 | 1 | -0/+9
    This works around a shortcoming of the isl scheduler, which even for some smaller test cases does not terminate in case domain constraints are part of the flow dependences.
    llvm-svn: 275565
* GPGPU: Add memory reference tag ids to tagged accesses | Tobias Grosser | 2016-07-15 | 1 | -0/+2
    It seems we forgot to actually add the memory access ids to the tagged accesses, but instead just tagged the accesses with empty isl_ids. This issue was found by inspection and without code generation it is difficult to test just by itself. We fix it for now without test case and expect our code generation tests to cover this later on.
    llvm-svn: 275557
* GPGPU: Make sure scops with more than one array work | Tobias Grosser | 2016-07-15 | 1 | -0/+1
    We use this opportunity to add a test case containing a scalar parameter.
    llvm-svn: 275547
* GPGPU: Free options to avoid memory leak | Tobias Grosser | 2016-07-15 | 1 | -0/+14
    ppcg does not free the option structs for us. To avoid a memory leak we do this ourselves.
    llvm-svn: 275546
* GPGPU: Shorten ppcg include paths to avoid conflict with cuda.h | Tobias Grosser | 2016-07-15 | 1 | -5/+5
    Instead of directly linking to ppcg's main source directory, we link to the parent directory. This allows us to access ppcg's include files with 'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header. Also drop an include directory that is currently not used.
    llvm-svn: 275536