| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
There is no need to reset the position of the builder, as we can just continue
to insert code at the current position of the IRBuilder, which happens to
be precisely the location we reset the builder to.
llvm-svn: 278014
|
|
|
|
|
|
|
|
|
| |
... instead of adding instructions at the end of the basic block the builder
is currently at. This makes it easier to reason about where IR is generated,
as with the IRBuilder there is just a single location that specifies where
IR is generated.
llvm-svn: 278013
|
|
|
|
|
|
|
|
|
|
|
| |
The map is iterated over when generating the values escaping the SCoP. The
non-deterministic iteration order of DenseMap causes the output IR to change at
every compilation, adding noise to comparisons.
Replace DenseMap with a MapVector to ensure the same iteration order at every
compilation.
llvm-svn: 277832
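
The idea behind an insertion-ordered map can be sketched as follows. This is a simplified, hypothetical stand-in (`InsertionOrderedMap` and `keysInOrder` are illustrative names, not LLVM API) and not the actual llvm::MapVector implementation: a hash map stores only positions, while the entries live in a vector, so iteration follows insertion order rather than hash order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified insertion-ordered map (illustration only).
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> Index; // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;     // entries in insertion order

public:
  // Returns the value for Key, default-constructing it on first use.
  V &operator[](const K &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, V());
    }
    assert(It->second < Entries.size() && "index out of sync with entries");
    return Entries[It->second].second;
  }

  // Iteration walks the vector, so it depends only on insertion order.
  auto begin() const { return Entries.begin(); }
  auto end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }
};

// Collect the keys in iteration order.
inline std::vector<std::string>
keysInOrder(const InsertionOrderedMap<std::string, int> &M) {
  std::vector<std::string> Keys;
  for (const auto &Entry : M)
    Keys.push_back(Entry.first);
  return Keys;
}
```

Inserting "c", "a", "b" and then updating "a" keeps the iteration order "c", "a", "b" in every run, whereas a plain hash map gives no such guarantee.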
|
|
|
|
|
|
|
|
|
|
| |
Before this commit we generated the array type in reverse order and also
added the outermost dimension size to the new array declaration. This is
incorrect, as Polly assumes an additional unsized outermost dimension, so we
ended up with an off-by-one error in the linearization of access
expressions.
llvm-svn: 277802
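
To illustrate why the outermost size must not be part of the declaration, here is a minimal sketch of linearization with an unsized outermost dimension. The function name `linearize` and its signature are assumptions made for this example, not Polly's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linearize a multi-dimensional index. InnerSizes holds the sizes of all
// dimensions except the outermost one, which is left unsized.
inline std::size_t linearize(const std::vector<std::size_t> &Indices,
                             const std::vector<std::size_t> &InnerSizes) {
  assert(!Indices.empty() && Indices.size() == InnerSizes.size() + 1 &&
         "expected one index per dimension and one size per inner dimension");
  std::size_t Offset = Indices[0];
  for (std::size_t D = 1; D < Indices.size(); ++D)
    Offset = Offset * InnerSizes[D - 1] + Indices[D];
  return Offset;
}
```

For an array declared as `A[][5][7]`, the access `A[2][3][4]` maps to `(2 * 5 + 3) * 7 + 4`. If the outermost size were also listed, every size would be paired with the wrong index, which is the kind of off-by-one shift described above.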
|
|
|
|
|
|
|
|
| |
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.
llvm-svn: 277799
|
|
|
|
| |
llvm-svn: 277726
|
|
|
|
| |
llvm-svn: 277724
|
|
|
|
| |
llvm-svn: 277723
|
|
|
|
| |
llvm-svn: 277722
|
|
|
|
| |
llvm-svn: 277721
|
|
|
|
|
|
|
| |
Pass the content of scalar array references to the alloca on the kernel side
and do not additionally pass them as normal LLVM scalar values.
llvm-svn: 277699
|
|
|
|
| |
llvm-svn: 277698
|
|
|
|
| |
llvm-svn: 277697
|
|
|
|
|
|
|
|
|
| |
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.
llvm-svn: 277589
|
|
|
|
| |
llvm-svn: 277569
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. Arrays that
already exist in the list of arrays are required to correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array, provided it has the same element type.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D22828
llvm-svn: 277263
|
|
|
|
| |
llvm-svn: 276963
|
|
|
|
| |
llvm-svn: 276962
|
|
|
|
|
|
|
|
| |
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.
llvm-svn: 276961
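
The difference between an array's index in the scop and its position in a kernel's parameter list can be sketched as follows. The helper `parameterSlots` is hypothetical and only illustrates the indexing problem; it is not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each array of the scop, return its slot in the kernel's parameter
// list, or -1 if the kernel does not use the array.
inline std::vector<int> parameterSlots(const std::vector<bool> &UsedByKernel) {
  std::vector<int> Slots(UsedByKernel.size(), -1);
  int NextSlot = 0;
  for (std::size_t ArrayIdx = 0; ArrayIdx < UsedByKernel.size(); ++ArrayIdx)
    if (UsedByKernel[ArrayIdx])
      Slots[ArrayIdx] = NextSlot++;
  // A kernel never has more array parameters than the scop has arrays.
  assert(static_cast<std::size_t>(NextSlot) <= UsedByKernel.size());
  return Slots;
}
```

With five arrays of which only the first, third, and fifth are used, the kernel has three array parameters, so using the array index 4 as a parameter index would read past the end of the parameter array.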
|
|
|
|
|
|
| |
Before this change we miscounted the number of function parameters.
llvm-svn: 276960
|
|
|
|
| |
llvm-svn: 276863
|
|
|
|
|
|
|
| |
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.
llvm-svn: 276645
|
|
|
|
|
|
|
| |
Also factor out getArraySize() to avoid code duplication and reorder some
function arguments to indicate the direction into which data is transferred.
llvm-svn: 276636
|
|
|
|
|
|
|
| |
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.
llvm-svn: 276635
|
|
|
|
|
|
|
|
|
| |
interface.
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.
llvm-svn: 276623
|
|
|
|
|
|
|
|
| |
This allows the finalization routine of the IslNodeBuilder to be overridden
by derived classes. While here, we also drop the unnecessary 'Scop' suffix
and the unnecessary 'Scop' parameter.
llvm-svn: 276622
|
|
|
|
|
|
|
|
| |
We optimize the kernel _after_ dumping the IR we generate to make the IR we
dump easier to read and independent of possible changes in the general
purpose LLVM optimizers.
llvm-svn: 276551
|
|
|
|
| |
llvm-svn: 276550
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code in a string.
To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.
Finally, the NVPTX backend has trouble generating code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them one by one in future
(not immediate) changes.
llvm-svn: 276396
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.
For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.
llvm-svn: 276270
|
|
|
|
|
|
|
| |
This will be used by Polly GPGPU to determine the values that need to be
passed to GPU kernels.
llvm-svn: 276269
|
|
|
|
|
|
|
|
|
| |
This is useful for external users using IslExprBuilder, in case they cannot
embed ScopArrayInfo data into their isl_ids, because the isl_ids either already
carry other information or the isl_ids have been created and their user pointers
cannot be updated any more.
llvm-svn: 276268
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids trouble in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may arise when the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.
llvm-svn: 276263
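
As a rough illustration of the idea, the following sketch marks instructions as live starting from those with side effects and treats everything else as dead. It is a mark-from-roots variant over a toy instruction list; `Inst` and `findLive` are hypothetical names, and this is not the code Polly uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: operands are indices of earlier instructions.
struct Inst {
  std::vector<std::size_t> Operands;
  bool HasSideEffects; // e.g. a store
};

// Returns a flag per instruction: true if it is needed, false if dead.
inline std::vector<bool> findLive(const std::vector<Inst> &Insts) {
  std::vector<bool> Live(Insts.size(), false);
  std::vector<std::size_t> Worklist;

  // Roots: instructions with side effects must be kept.
  for (std::size_t I = 0; I < Insts.size(); ++I)
    if (Insts[I].HasSideEffects) {
      Live[I] = true;
      Worklist.push_back(I);
    }

  // Everything a live instruction uses is live as well.
  while (!Worklist.empty()) {
    std::size_t I = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Op : Insts[I].Operands) {
      assert(Op < Insts.size() && "operand refers to unknown instruction");
      if (!Live[Op]) {
        Live[Op] = true;
        Worklist.push_back(Op);
      }
    }
  }
  return Live;
}
```

In a sequence consisting of an address computation, a load, a store, and a leftover address computation that nothing uses, only the leftover is reported as dead and can be dropped before it references unavailable values.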
|
|
|
|
|
|
|
| |
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.
llvm-svn: 275987
|
|
|
|
|
|
|
| |
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.
llvm-svn: 275957
|
|
|
|
| |
llvm-svn: 275956
|
|
|
|
| |
llvm-svn: 275955
|
|
|
|
| |
llvm-svn: 275954
|
|
|
|
| |
llvm-svn: 275953
|
|
|
|
|
|
|
|
|
| |
For each kernel, create a separate LLVM-IR module containing a single function
marked as kernel function and taking one pointer for each array referenced
by this kernel. Add debugging output to verify the kernels are generated
correctly.
llvm-svn: 275952
|
|
|
|
|
|
|
|
|
|
|
| |
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly. Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, all that is
left to do is to hook it up so that the array reference information is
actually computed.
llvm-svn: 275798
|
|
|
|
|
|
|
| |
This will allow us to see the full class definition even after we add
non-trivial implementations of the different member functions.
llvm-svn: 275797
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The
IslNodeBuilder will take care of generating all general-purpose AST nodes, but
we provide our own createUser implementation to handle the different GPU
specific user statements. For now, we just skip any user statement and only
generate a host-code skeleton, but in subsequent commits we will add handling of
normal ScopStmts performing computations, kernel calls, as well as host-device
data transfers. We will also introduce run-time check generation and LICM in
subsequent commits.
llvm-svn: 275783
|
|
|
|
|
|
|
|
| |
Otherwise ppcg would try to call into pet functionality that is not available,
which obviously will cause trouble. As we can easily print these statements
ourselves, we just do so.
llvm-svn: 275579
|
|
|
|
|
|
|
|
|
|
|
| |
This option increases the scalability of the scheduler and allows us to remove
the 'gisting' workaround we introduced in r275565 to handle a more complicated
test case. Another benefit of using this option is also that the generated
code looks a lot more streamlined.
Thanks to Sven Verdoolaege for reminding me of this option.
llvm-svn: 275573
|
|
|
|
|
|
|
|
| |
This works around a shortcoming of the isl scheduler, which even for some
smaller test cases does not terminate in case domain constraints are part
of the flow dependences.
llvm-svn: 275565
|
|
|
|
|
|
|
|
|
|
| |
It seems we forgot to actually add the memory access ids to the tagged accesses,
but instead just tagged the accesses with empty isl_ids. This issue was found
by inspection and without code generation it is difficult to test just by
itself. We fix it for now without test case and expect our code generation
tests to cover this later on.
llvm-svn: 275557
|
|
|
|
|
|
| |
We use this opportunity to add a test case containing a scalar parameter.
llvm-svn: 275547
|
|
|
|
|
|
|
| |
ppcg does not free the option structs for us. To avoid a memory leak we do this
ourselves.
llvm-svn: 275546
|
|
|
|
|
|
|
|
|
|
| |
Instead of directly linking to ppcg's main source directory, we link to the
parent directory. This allows us to access ppcg's include files with
'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header.
Also drop an include directory that is currently not used.
llvm-svn: 275536
|