| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
We can do this now that the linker script and the writer agree on
which sections should be combined.
llvm-svn: 295341
| |
via setting the environment variable KMP_INITIAL_THREAD_BIND=1.
Differential Revision: https://reviews.llvm.org/D29665
llvm-svn: 295339
| |
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
| |
This patch implements codegen for the reduction clause on
any teams construct for elementary data types. It builds
on parallel reductions on the GPU. Subsequently,
the team master writes to a unique location in a global
memory scratchpad. The last team to do so loads and
reduces this array to calculate the final result.
This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879
llvm-svn: 295335
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that use these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295333
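The hierarchical, shuffle-based reduction within a warp described above can be illustrated with a host-side C++ model. This is only a sketch under stated assumptions, not the code emitted by the patch or the runtime's implementation: `warpReduceSum`, `WarpSize`, and the use of a plain vector to stand in for per-lane registers are all hypothetical, and the reduction operation is fixed to integer addition for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t WarpSize = 32; // Assumed warp width.

// Host-side model of a shuffle-down tree reduction across one warp.
// Lanes[i] stands in for the private register value held by lane i.
int warpReduceSum(std::vector<int> Lanes) {
  assert(Lanes.size() == WarpSize && "expects exactly one value per lane");
  for (std::size_t Offset = WarpSize / 2; Offset > 0; Offset /= 2) {
    // Snapshot the registers: on a GPU all lanes exchange values in lockstep.
    const std::vector<int> Prev = Lanes;
    for (std::size_t Lane = 0; Lane < WarpSize; ++Lane) {
      // Lanes without an in-range partner no longer feed lane 0, so this
      // model simply leaves their values alone.
      if (Lane + Offset < WarpSize)
        Lanes[Lane] = Prev[Lane] + Prev[Lane + Offset];
    }
  }
  return Lanes[0]; // After log2(WarpSize) steps lane 0 holds the total.
}
```

Each step halves the number of lanes that still matter, so no lane ever needs to read another thread's stack; values only move between registers.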
| |
SHF_LINK_ORDER sections add special ordering requirements.
Such sections reference other sections. Previously we would crash
if the section they referenced was discarded by the script.
This patch fixes that by discarding all dependent sections in that case.
It supports chained dependencies; a testcase is provided.
Differential revision: https://reviews.llvm.org/D30033
llvm-svn: 295332
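The chained discard described in this commit can be sketched as a small worklist traversal. This is an illustrative sketch only, not lld's code: the `Section` struct, its `Dependents` list, and `discardWithDependents` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an input section; not lld's actual class.
struct Section {
  std::string Name;
  bool Live = true;
  // Sections that reference this one (for example via SHF_LINK_ORDER).
  std::vector<Section *> Dependents;
};

// Discards S and, transitively, every section that depends on it.
void discardWithDependents(Section &S) {
  std::vector<Section *> Worklist{&S};
  while (!Worklist.empty()) {
    Section *Cur = Worklist.back();
    Worklist.pop_back();
    if (!Cur->Live)
      continue; // Already discarded; this also guards against cycles.
    Cur->Live = false;
    for (Section *Dep : Cur->Dependents) {
      assert(Dep && "dependent section must not be null");
      Worklist.push_back(Dep);
    }
  }
}
```

With a chain A <- B <- C, discarding A marks B and C as discarded as well, while unrelated sections stay live.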
| |
Regression test neon-diagnostics.s needed changing because it now
produces a more specific diagnostic about the immediate ranges. One
change in the expected error message is not obvious, but there are multiple
candidates and it happens to pick the immediate diagnostic.
Differential Revision: https://reviews.llvm.org/D29939
llvm-svn: 295331
| |
Fixes a number of tests in the testsuite on Windows.
llvm-svn: 295330
| |
On Windows, we were using `Sleep` which is not alertable. This means
that if the thread was used for a user APC or WinProc handling and
thread::sleep was used, we could potentially deadlock. Use `SleepEx`
with an alertable sleep, resuming until the time has expired if we are
awoken early.
llvm-svn: 295329
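The "resume until the time has expired" behaviour can be sketched portably as a loop around a sleep that may return early. This is a minimal sketch, not the libc++ change itself: `sleepFully`, `Now`, and `AlertableSleep` are hypothetical names, with `AlertableSleep` standing in for an alertable call such as `SleepEx` with its alertable flag set.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Millis = std::chrono::milliseconds;

// Sleeps for Total, going back to sleep after every early wake-up.
// Now returns the current time on some monotonic clock; AlertableSleep
// sleeps for at most the requested duration and may return early.
void sleepFully(Millis Total, const std::function<Millis()> &Now,
                const std::function<void(Millis)> &AlertableSleep) {
  assert(Total.count() >= 0 && "cannot sleep for a negative duration");
  const Millis Deadline = Now() + Total;
  // Re-read the clock after each wake-up and sleep only for what is left.
  for (Millis Current = Now(); Current < Deadline; Current = Now())
    AlertableSleep(Deadline - Current);
}
```

Injecting the clock and the sleep primitive keeps the loop testable with a fake clock; a real implementation would call the platform APIs directly.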
| |
BuiltinType::Kind::OCLNDRange was removed.
llvm-svn: 295328
| |
for now.
llvm-svn: 295327
| |
llvm-svn: 295326
| |
Reviewed as https://reviews.llvm.org/D29780.
llvm-svn: 295325
| |
Unfortunately, the common way of writing linker scripts seems to be
to get the output of ld.bfd --verbose and edit it a bit.
Also unfortunately, the bfd default script contains things like
.rela.dyn : { *(... .rela.data ...) }
but bfd actually ignores that for -emit-relocs, so we have to do the
same.
llvm-svn: 295324
| |
llvm-svn: 295323
| |
The code to handle the input SHT_REL/SHT_RELA sections was getting
confused with the linker generated relocation sections.
llvm-svn: 295322
| |
llvm-svn: 295321
| |
ExprConstant.cpp:6344:20: warning: comparison of integers of different
signs: 'const size_t' (aka 'const unsigned long') and 'typename
iterator_traits<Expr *const *>::difference_type' (aka 'long')
[-Wsign-compare]
llvm-svn: 295320
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that use these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295319
| |
In D28836, we added a way to tag heap objects and thus provide object types in reports. This patch exposes this information through the debugging API.
Differential Revision: https://reviews.llvm.org/D30023
llvm-svn: 295318
| |
llvm-svn: 295317
| |
load combine"
This change causes some of AMDGPU and PowerPC tests to fail.
llvm-svn: 295316
| |
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
| |
Added a description of a new feature that allows specifying
vendor extensions in a flexible way using a compiler pragma instead
of modifying source code directly (committed in clang@r289979).
Review: D29829
llvm-svn: 295313
| |
Summary:
This patch implements block comment decoration alignment.
source:
```
/* line 1
* line 2
*/
```
result before:
```
/* line 1
* line 2
*/
```
result after:
```
/* line 1
 * line 2
 */
```
Reviewers: djasper, bkramer, klimek
Reviewed By: klimek
Subscribers: mprobst, cfe-commits, klimek
Differential Revision: https://reviews.llvm.org/D29943
llvm-svn: 295312
| |
Removed ndrange_t as a Clang builtin type and added it
as a struct type in the OpenCL header.
Use the type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.
Review: D28058
Patch by Dmitry Borisenkov!
llvm-svn: 295311
| |
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.
llvm-svn: 295310
| |
Just use VADDD if available, bail out if not.
llvm-svn: 295309
| |
llvm-svn: 295308
| |
llvm-svn: 295307
| |
Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating
point values in the soft-fp float mode.
llvm-svn: 295306
| |
llvm-svn: 295305
| |
Summary:
This patch adds onTypeFormatting to clangd.
The trigger character is '}' and it works by scanning for the matching '{' and formatting the range in-between.
There are problems with ';' as a trigger character. For example, with the cursor just before the `|`:
```
int main() {
  int i;|
}
```
becomes:
```
int main() { int i;| }
```
which is not likely what the user intended.
Also, formatting at a semicolon in a scope that is not properly closed puts the following tokens on the same unwrapped line, which doesn't reformat nicely.
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D29990
llvm-svn: 295304
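The "scan for the matching '{'" step can be sketched as a backwards scan with a depth counter. This is an assumption-laden illustration rather than clangd's implementation: `findMatchingOpenBrace` is a hypothetical helper, and it deliberately ignores braces inside string literals and comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the offset of the '{' that matches the '}' at CloseOffset, or
// std::string::npos if there is none. Braces inside string literals and
// comments are not treated specially in this sketch.
std::size_t findMatchingOpenBrace(const std::string &Code,
                                  std::size_t CloseOffset) {
  assert(CloseOffset < Code.size() && Code[CloseOffset] == '}');
  std::size_t Depth = 0;
  // Walk backwards from the closing brace, tracking nesting depth.
  for (std::size_t I = CloseOffset + 1; I-- > 0;) {
    if (Code[I] == '}')
      ++Depth;
    else if (Code[I] == '{' && --Depth == 0)
      return I;
  }
  return std::string::npos;
}
```

The range between the returned offset and `CloseOffset` is what a formatter would then be asked to reformat.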
| |
llvm-svn: 295303
| |
Also add mappings for single and double precision FP, and use them for G_FADD
and G_LOAD.
llvm-svn: 295302
| |
Modules/preambles/PCH files can contain diagnostics, which, when used,
are added to the current ASTUnit. For that to work, they are translated
to use the current FileManager's FileIDs. When the entry is not the
main file, all local source locations will be checked by a linear
search. This becomes a problem when there are lots of diagnostics (say,
25000) and lots of local source locations (say, 440000): translation ends up
taking seconds when using such a preamble.
The fix is to cache the last FileID, because many subsequent diagnostics
refer to the same file. This reduces the time spent in
ASTUnit::TranslateStoredDiagnostics from seconds to a few milliseconds
for files with many slocs/diagnostics.
This fixes PR31353.
Differential Revision: https://reviews.llvm.org/D29755
llvm-svn: 295301
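The idea of caching the last result in front of a linear search can be sketched as follows. This is a simplified model, not the ASTUnit code: `CachedFileLookup`, `translate`, and the use of file names and integer indices in place of real FileIDs and source locations are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lookup that maps a file name to its index in a local table.
class CachedFileLookup {
public:
  explicit CachedFileLookup(std::vector<std::string> Names)
      : Files(std::move(Names)) {}

  // Returns the index of Name, or -1 if it is unknown.
  int translate(const std::string &Name) {
    // Fast path: consecutive diagnostics usually refer to the same file.
    if (LastIndex >= 0) {
      assert(static_cast<std::size_t>(LastIndex) < Files.size());
      if (Files[LastIndex] == Name)
        return LastIndex;
    }
    // Slow path: linear search, remembering the hit for the next query.
    ++LinearSearches;
    for (std::size_t I = 0; I < Files.size(); ++I) {
      if (Files[I] == Name) {
        LastIndex = static_cast<int>(I);
        return LastIndex;
      }
    }
    return -1;
  }

  // Number of times the slow path ran; useful for checking the cache works.
  unsigned linearSearches() const { return LinearSearches; }

private:
  std::vector<std::string> Files;
  int LastIndex = -1;
  unsigned LinearSearches = 0;
};
```

A one-entry cache is enough here because the win comes from runs of diagnostics in the same file, not from remembering many files at once.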
| |
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.
llvm-svn: 295300
| |
llvm-svn: 295299
| |
This fixes the case where a section has more than one metadata
section. Previously GC would collect one of those sections
because the implementation stored only the last one as
dependent.
Differential revision: https://reviews.llvm.org/D29981
llvm-svn: 295298
| |
llvm-svn: 295297
| |
It's needed if libcxx is built without disabling threads.
llvm-svn: 295296
| |
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
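The endianness concern for the soft-fp convention can be illustrated by splitting a double into the two 32-bit words that would travel in a register pair. This is a sketch under assumptions, not the GlobalISel lowering: `splitDouble` is a hypothetical helper, it assumes a 64-bit IEEE-754 `double`, and it assumes the convention that the lower-numbered register holds the low word on little-endian targets and the high word on big-endian targets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Splits Value into two 32-bit words; .first models the lower-numbered
// general purpose register of the pair, .second the higher-numbered one.
std::pair<std::uint32_t, std::uint32_t> splitDouble(double Value,
                                                    bool BigEndian) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch expects a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits)); // Well-defined bit copy.
  const auto Lo = static_cast<std::uint32_t>(Bits);
  const auto Hi = static_cast<std::uint32_t>(Bits >> 32);
  // Sanity check: the two words together carry every bit of the value.
  assert(((static_cast<std::uint64_t>(Hi) << 32) | Lo) == Bits);
  // Assumed ordering: low word first on little-endian, swapped otherwise.
  return BigEndian ? std::make_pair(Hi, Lo) : std::make_pair(Lo, Hi);
}
```

For `1.0`, whose IEEE-754 encoding is `0x3FF0000000000000`, the little-endian split puts `0` in the first register and `0x3FF00000` in the second; the big-endian split swaps them.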
| |
intrinsics like it does 128/256-bit.
llvm-svn: 295294
| |
Fix modules build bot.
llvm-svn: 295293
| |
llvm-svn: 295292
| |
unmasked builtins.
These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend.
llvm-svn: 295291
| |
intrinsics with select instructions. For 512-bit add new unmasked intrinsics.
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.
llvm-svn: 295290
| |
llvm-svn: 295289
| |
llvm-svn: 295288
| |
This patch removes NeedsCopyOrPltAddr and instead adds two variables,
NeedsCopy and NeedsPltAddr. This uses one more bit in Symbol class,
but the actual size doesn't increase because we had unused bits.
This should improve code readability.
llvm-svn: 295287
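Why one more bit need not grow the class can be shown with bit-fields. The structs below are hypothetical and much smaller than lld's real Symbol class; only the flag names `NeedsCopyOrPltAddr`, `NeedsCopy`, and `NeedsPltAddr` come from the commit message, and the `Binding` field is invented for illustration.

```cpp
#include <cstdint>

// Before: a single flag covering both cases, with unused bits left over.
struct SymbolFlagsBefore {
  std::uint8_t Binding : 2; // Hypothetical neighbouring field.
  std::uint8_t NeedsCopyOrPltAddr : 1;
};

// After: two separate flags packed into the same storage unit.
struct SymbolFlagsAfter {
  std::uint8_t Binding : 2; // Hypothetical neighbouring field.
  std::uint8_t NeedsCopy : 1;
  std::uint8_t NeedsPltAddr : 1;
};

// On mainstream ABIs both structs fit in one byte, so splitting the flag
// consumes a previously unused bit instead of growing the object.
static_assert(sizeof(SymbolFlagsBefore) == sizeof(SymbolFlagsAfter),
              "splitting the flag should not change the size");
```

Bit-field layout is implementation-defined, so the size check documents an expectation about common compilers rather than a guarantee of the language.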
|