| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 295372
|
| |
|
|
|
|
| |
All the cool targets are doing it...
llvm-svn: 295371
|
| |
|
|
|
|
|
|
|
|
|
| |
TimerGroup was showing up on a leak in valigrind, and
used some pretty complex code to implement a singleton.
This patch replaces the implementation with a vastly simpler
one.
Differential Revision: https://reviews.llvm.org/D28367
llvm-svn: 295370
|
| |
|
|
| |
llvm-svn: 295369
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29964
llvm-svn: 295368
|
| |
|
|
| |
llvm-svn: 295366
|
| |
|
|
|
|
|
| |
I cannot reproduce the issue locally, but for some reason some bots
want to instantiate this from the header.
llvm-svn: 295365
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We found a nondeterministic behavior when doing online profile merging
for multi-process applications. The application forks a sub-process and
sub-process sets to get SIGKILL when the parent process exits,
The first process gets the lock, and dumps the profile. The second one
will mmap the file, do the merge and write out the file. Note that before
the merged write, we truncate the profile.
Depending on the timing, the child process might be terminated
abnormally when the parent exits first. If this happens:
(1) before the truncation, we will get the profile for the main process
(2) after the truncation, and before write-out the profile, we will get
0 size profile.
(3) after the merged write, we get merged profile.
This patch temporarily suspend the SIGKILL for PR_SET_PDEATHSIG
before profile-write and restore it after the write.
This patch only applies to Linux system.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xur, llvm-commits
Differential Revision: https://reviews.llvm.org/D29954
llvm-svn: 295364
|
| |
|
|
| |
llvm-svn: 295363
|
| |
|
|
|
|
|
| |
There is only a single disjunction. However, we bound the number of 'disjuncts'
in this disjunction. Name the variable accordingly.
llvm-svn: 295362
|
| |
|
|
| |
llvm-svn: 295361
|
| |
|
|
|
|
|
|
|
| |
Before this change wrapping range metadata resulted in exponential growth of
the context, which made context construction of large scops very slow. Instead,
we now just do not model the range information precisely, in case the number
of disjuncts in the context has already reached a certain limit.
llvm-svn: 295360
|
| |
|
|
| |
llvm-svn: 295359
|
| |
|
|
| |
llvm-svn: 295358
|
| |
|
|
|
|
|
|
|
|
| |
functions (PR26302)"
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.
llvm-svn: 295357
|
| |
|
|
|
|
|
|
|
|
| |
Defining nodes should not alias with one another, while clobbering
nodes can. When pushing defs on stacks, push clobbers first, link
non-clobbering defs, then push the defs.
The data flow in a statement is now: uses -> clobbers -> defs.
llvm-svn: 295356
|
| |
|
|
| |
llvm-svn: 295355
|
| |
|
|
| |
llvm-svn: 295354
|
| |
|
|
| |
llvm-svn: 295353
|
| |
|
|
|
|
| |
error: this ‘else’ clause does not guard... [-Werror=misleading-indentation]
llvm-svn: 295352
|
| |
|
|
|
|
| |
Remove the duplicate from DFG and make some members of PRI private.
llvm-svn: 295351
|
| |
|
|
| |
llvm-svn: 295350
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit r230230 introduced the use of range metadata to derive bounds for
parameters, instead of just looking at the type of the parameter. As part of
this commit support for wrapping ranges was added, where the lower bound of a
parameter is larger than the upper bound:
{ 255 < p || p < 0 }
However, at the same time, for wrapping ranges support for adding bounds given
by the size of the containing type has acidentally been dropped. As a result,
the range of the parameters was not guaranteed to be bounded any more. This
change makes sure we always add the bounds given by the size of the type and
then additionally add bounds based on signed wrapping, if available. For a
parameter p with a type size of 32 bit, the valid range is then:
{ -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }
llvm-svn: 295349
|
| |
|
|
|
|
| |
can be accessed after the static destroyed.
llvm-svn: 295348
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
supports SSE
The existing code always saves the xmm registers for 64-bit targets even if the
target doesn't support SSE (which is common for kernels). Thus, the compiler
inserts movaps instructions which lead to CPU exceptions when an interrupt
handler is invoked.
This commit fixes this bug by returning a register set without xmm registers
from getCalleeSavedRegs and getCallPreservedMask for such targets.
Patch by Philipp Oppermann.
Differential Revision: https://reviews.llvm.org/D29959
llvm-svn: 295347
|
| |
|
|
| |
llvm-svn: 295346
|
| |
|
|
|
|
|
|
|
|
|
|
| |
While refactoring the code in r293046 I made a very basic error -
relying on destructor side-effects of a copyable object. Fix that and
make the object non-copyable.
This fixes the tests on the platforms that need this workaround, but
unfortunately we don't have a way to make a more platform-agnostic test
right now.
llvm-svn: 295345
|
| |
|
|
|
|
|
|
|
| |
Added test kmp_task_reduction_nest.cpp which has an example of
possible compiler codegen.
Differential Revision: https://reviews.llvm.org/D29600
llvm-svn: 295343
|
| |
|
|
|
|
|
|
| |
TSan now has the ability to report races on "external" object, i.e. any library class/object that has read-shared write-exclusive threading semantics. The detection and reporting work almost out of the box, but TSan can now provide the type of the object (as a string). This patch implements this into LLDB.
Differential Revision: https://reviews.llvm.org/D30024
llvm-svn: 295342
|
| |
|
|
|
|
|
| |
We can do this now that the linker script and the writer agree on
which sections should be combined.
llvm-svn: 295341
|
| |
|
|
|
|
|
|
| |
via setting envirable KMP_INITIAL_THREAD_BIND=1.
Differential Revision: https://reviews.llvm.org/D29665
llvm-svn: 295339
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any teams construct for elementary data types. It builds
on parallel reductions on the GPU. Subsequently,
the team master writes to a unique location in a global
memory scratchpad. The last team to do so loads and
reduces this array to calculate the final result.
This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879
llvm-svn: 295335
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295333
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
SHF_LINK_ORDER sections adds special ordering requirements.
Such sections references other sections. Previously we would crash
if section that other were referenced to was discarded by script.
Patch fixes that by discarding all dependent sections in that case.
It supports chained dependencies, testcase is provided.
Differential revision: https://reviews.llvm.org/D30033
llvm-svn: 295332
|
| |
|
|
|
|
|
|
|
|
|
| |
Regression test neon-diagnostics.s needed changing because it now
produces a more specific diagnostic about the immediate ranges. One
change in the expected error message is not obvious, but there multiple
candidate and it happens to pick the immediate diagnostic.
Differential Revision: https://reviews.llvm.org/D29939
llvm-svn: 295331
|
| |
|
|
|
|
| |
Fixes a number of tests in the testsuite on Windows.
llvm-svn: 295330
|
| |
|
|
|
|
|
|
|
|
| |
On Windows, we were using `Sleep` which is not alertable. This means
that if the thread was used for a user APC or WinProc handling and
thread::sleep was used, we could potentially dead lock. Use `SleepEx`
with an alertable sleep, resuming until the time has expired if we are
awoken early.
llvm-svn: 295329
|
| |
|
|
|
|
| |
BuiltinType::Kind::OCLNDRange was removed.
llvm-svn: 295328
|
| |
|
|
|
|
| |
for now.
llvm-svn: 295327
|
| |
|
|
| |
llvm-svn: 295326
|
| |
|
|
|
|
| |
Reviewed as https://reviews.llvm.org/D29780.
llvm-svn: 295325
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, the common way of writing linker scripts seems to be
to get the output of ld.bfd --verbose and edit it a bit.
Also unfortunately, the bfd default script contains things like
.rela.dyn : { *(... .rela.data ...) }
but bfd actually ignores that for -emit-relocs, so we have to do the
same.
llvm-svn: 295324
|
| |
|
|
| |
llvm-svn: 295323
|
| |
|
|
|
|
|
| |
The code to handle the input SHT_REL/SHT_RELA sections was getting
confused with the linker generated relocation sections.
llvm-svn: 295322
|
| |
|
|
| |
llvm-svn: 295321
|
| |
|
|
|
|
|
|
|
| |
ExprConstant.cpp:6344:20: warning: comparison of integers of different
signs: 'const size_t' (aka 'const unsigned long') and 'typename
iterator_traits<Expr *const *>::difference_type' (aka 'long')
[-Wsign-compare]
llvm-svn: 295320
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295319
|
| |
|
|
|
|
|
|
| |
In D28836, we added a way to tag heap objects and thus provide object types into report. This patch exposes this information into the debugging API.
Differential Revision: https://reviews.llvm.org/D30023
llvm-svn: 295318
|
| |
|
|
| |
llvm-svn: 295317
|