| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Like r367463, but for xray.
llvm-svn: 367546
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change builds upon D54989, which removes memory allocation from the
critical path of the profiling implementation. This also changes the API
for the profile collection service, to take ownership of the memory and
associated data structures per-thread.
The consolidation of the memory allocation allows us to do two things:
- Limits the amount of memory used by the profiling implementation,
associating preallocated buffers instead of allocating memory
on-demand.
- Consolidate the memory initialisation and cleanup by relying on the
buffer queue's reference counting implementation.
We find a number of places which also display some problematic
behaviour, including:
- Off-by-factor bug in the allocator implementation.
- Unrolling semantics in cases of "memory exhausted" situations, when
managing the state of the function call trie.
We also add a few test cases which verify our understanding of the
behaviour of the system, with important edge-cases (especially for
memory-exhausted cases) in the segmented array and profile collector
unit tests.
Depends on D54989.
Reviewers: mboerger
Subscribers: dschuff, mgorny, dmgreen, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55249
llvm-svn: 348568
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r348455, with some additional changes:
- Work-around deficiency of gcc-4.8 by duplicating the implementation of
`AppendEmplace` in `Append`, but instead of using brace-init for the
copy construction, use a placement new explicitly calling the copy
constructor.
llvm-svn: 348563
|
|
|
|
|
|
|
| |
This reverts commits r348438, r348445, and r348449 due to breakages with
gcc-4.8 builds.
llvm-svn: 348455
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuation of D54989.
Additional changes:
- Use `.AppendEmplace(...)` instead of `.Append(Type{...})` to appease
GCC 4.8 with confusion on when an initializer_list is used as
opposed to a temporary aggregate initialized object.
llvm-svn: 348438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
.. and also the follow-ups r348336 r348338.
It broke stand-alone compiler-rt builds with GCC 4.8:
In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’
/work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’
new (AlignedOffset) T{std::forward<Args>(args)...};
^
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from
‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]
’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
> Summary:
> This change makes the allocator and function call trie implementations
> move-aware and remove the FunctionCallTrie's reliance on a
> heap-allocated set of allocators.
>
> The change makes it possible to always have storage associated with
> Allocator instances, not necessarily having heap-allocated memory
> obtainable from these allocator instances. We also use thread-local
> uninitialised storage.
>
> We've also re-worked the segmented array implementation to have more
> precondition and post-condition checks when built in debug mode. This
> enables us to better implement some of the operations with surrounding
> documentation as well. The `trim` algorithm now has more documentation
> on the implementation, reducing the requirement to handle special
> conditions, and being more rigorous on the computations involved.
>
> In this change we also introduce an initialisation guard, through which
> we prevent an initialisation operation from racing with a cleanup
> operation.
>
> We also ensure that the ThreadTries array is not destroyed while copies
> into the elements are still being performed by other threads submitting
> profiles.
>
> Note that this change still has an issue with accessing thread-local
> storage from signal handlers that are instrumented with XRay. We also
> learn that with the testing of this patch, that there will be cases
> where calls to mmap(...) (through internal_mmap(...)) might be called in
> signal handlers, but are not async-signal-safe. Subsequent patches will
> address this, by re-using the `BufferQueue` type used in the FDR mode
> implementation for pre-allocated memory segments per active, tracing
> thread.
>
> We still want to land this change despite the known issues, with fixes
> forthcoming.
>
> Reviewers: mboerger, jfb
>
> Subscribers: jfb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348346
|
|
|
|
|
|
| |
Follow-up to D54989.
llvm-svn: 348338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes the allocator and function call trie implementations
move-aware and remove the FunctionCallTrie's reliance on a
heap-allocated set of allocators.
The change makes it possible to always have storage associated with
Allocator instances, not necessarily having heap-allocated memory
obtainable from these allocator instances. We also use thread-local
uninitialised storage.
We've also re-worked the segmented array implementation to have more
precondition and post-condition checks when built in debug mode. This
enables us to better implement some of the operations with surrounding
documentation as well. The `trim` algorithm now has more documentation
on the implementation, reducing the requirement to handle special
conditions, and being more rigorous on the computations involved.
In this change we also introduce an initialisation guard, through which
we prevent an initialisation operation from racing with a cleanup
operation.
We also ensure that the ThreadTries array is not destroyed while copies
into the elements are still being performed by other threads submitting
profiles.
Note that this change still has an issue with accessing thread-local
storage from signal handlers that are instrumented with XRay. We also
learn that with the testing of this patch, that there will be cases
where calls to mmap(...) (through internal_mmap(...)) might be called in
signal handlers, but are not async-signal-safe. Subsequent patches will
address this, by re-using the `BufferQueue` type used in the FDR mode
implementation for pre-allocated memory segments per active, tracing
thread.
We still want to land this change despite the known issues, with fixes
forthcoming.
Reviewers: mboerger, jfb
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348335
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Prior to this change, we can run into situations where the TSC we're
getting when exiting a function is less than the TSC we got when
entering it. This would sometimes cause the counter for cumulative call
times overflow, which was erroneously also being stored as a signed
64-bit integer.
This change addresses both these issues while adding provisions for
tracking CPU migrations. We do this because moving from one CPU to
another doesn't guarantee that the timestamp counter for some
architectures aren't guaranteed to be synchronised. For the moment, we
leave the provisions there until we can update the data format to
include the counting of CPU migrations we can catch.
We update the necessary tests as well, ensuring that our expectations
for the cycle accounting to be met in case of counter wraparound.
Reviewers: mboerger
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54088
llvm-svn: 346116
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Some cases where `postCurrentThreadFCT()` are not guarded by our
recursion guard. We've observed that sometimes these can lead to
deadlocks when some functions (like memcpy()) gets outlined and the
version of memcpy is XRay-instrumented, which can be materialised by the
compiler in the implementation of lower-level components used by the
profiling runtime.
This change ensures that all calls to `postCurrentThreadFCT` are guarded
by our thread-recursion guard, to prevent deadlocks.
Reviewers: mboerger, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53805
llvm-svn: 345489
|
|
|
|
|
|
|
|
|
|
| |
This abstracts away the file descriptor related logic which makes it
easier to port XRay to platform that don't use file descriptors or
file system for writing the log data, such as Fuchsia.
Differential Revision: https://reviews.llvm.org/D52161
llvm-svn: 344578
|
|
|
|
|
|
|
|
|
| |
This API has been deprecated three months ago and shouldn't be used
anymore, all clients should migrate to the new string based API.
Differential Revision: https://reviews.llvm.org/D51606
llvm-svn: 342318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In this change we apply `XRAY_NEVER_INSTRUMENT` to more functions in the
profiling implementation to ensure that these never get instrumented if
the compiler used to build the library is capable of doing XRay
instrumentation.
We also consolidate all the allocators into a single header
(xray_allocator.h) which sidestep the use of the internal allocator
implementation in sanitizer_common.
This addresses more cases mentioned in llvm.org/PR38577.
Reviewers: mboerger, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51776
llvm-svn: 341647
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change removes further cases where the profiling mode
implementation relied on dynamic memory allocation. We're using
thread-local aligned (uninitialized) memory instead, which we initialize
appropriately with placement new.
Addresses llvm.org/PR38577.
Reviewers: eizan, kpw
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D51278
llvm-svn: 340814
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change provides access to the file header even in the in-memory
buffer processing. This allows in-memory processing of the buffers to
also check the version, and the format, of the profile data.
Reviewers: eizan, kpw
Reviewed By: eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50037
llvm-svn: 338347
|
|
|
|
|
|
|
|
|
| |
This change makes it so that the profiling mode implementation will only
write files when there are buffers to write. Before this change, we'd
always open a file even if there were no profiles collected when
flushing.
llvm-svn: 337443
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a follow-on to D49217 which simplifies and optimises the
implementation of the segmented array. In this patch we co-locate the
book-keeping for segments in the `__xray::Array<T>` with the data it's
managing. We take the chance in this patch to actually rename `Chunk` to
`Segment` to better align with the high-level description of the
segmented array.
With measurements using benchmarks landed in D48879, we've identified
that calls to `pthread_getspecific` started dominating the cycles, which
led us to revert the change made in D49217 to use C++ thread_local
initialisation instead (it reduces the cost by a huge margin, since we
save one PLT-based call to pthread functions in the hot path). In
particular, this is in `__xray::getThreadLocalData()`.
We also took the opportunity to remove the least-common-multiple based
calculation and instead pack as much data into segments of the array.
This greatly simplifies the API of the container which hides as much of
the implementation details as possible. For instance, we calculate the
number of elements we need for the each segment internally in the Array
instead of making it part of the type.
With the changes here, we're able to get a measurable improvement on the
performance of profiling mode on top of what D48879 already provides.
Depends on D48879.
Reviewers: kpw, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49363
llvm-svn: 337343
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change simplifies the XRay Allocator implementation to self-manage
an mmap'ed memory segment instead of using the internal allocator
implementation in sanitizer_common.
We've found through benchmarks and profiling these benchmarks in D48879
that using the internal allocator in sanitizer_common introduces a
bottleneck on allocating memory through a central spinlock. This change
allows thread-local allocators to eliminate contention on the
centralized allocator.
To get the most benefit from this approach, we also use a managed
allocator for the chunk elements used by the segmented array
implementation. This gives us the chance to amortize the cost of
allocating memory when creating these internal segmented array data
structures.
We also took the opportunity to remove the preallocation argument from
the allocator API, simplifying the usage of the allocator throughout the
profiling implementation.
In this change we also tweak some of the flag values to reduce the
amount of maximum memory we use/need for each thread, when requesting
memory through mmap.
Depends on D48956.
Reviewers: kpw, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49217
llvm-svn: 337342
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change adds support for writing out profiles at program exit.
Depends on D48653.
Reviewers: kpw, eizan
Reviewed By: kpw
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48956
llvm-svn: 336969
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We found a bug while working on a benchmark for the profiling mode which
manifests as a segmentation fault in the profiling handler's
implementation. This change adds unit tests which replicate the
issues in isolation.
We've tracked this down as a bug in the implementation of the Freelist
in the `xray::Array` type. This happens when we trim the array by a
number of elements, where we've been incorrectly assigning pointers for
the links in the freelist of chunk nodes. We've taken the chance to add
more debug-only assertions to the code path and allow us to verify these
assumptions in debug builds.
In the process, we also took the opportunity to use iterators to
implement both `front()` and `back()` which exposes a bug in the
iterator decrement operation. In particular, when we decrement past a
chunk size boundary, we end up moving too far back and reaching the
`SentinelChunk` prematurely.
This change unblocks us to allow for contributing the non-crashing
version of the benchmarks in the test-suite as well.
Reviewers: kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D48653
llvm-svn: 336644
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is part of the larger XRay Profiling Mode effort.
This patch implements the profile writing mechanism, to allow profiles
collected through the profiler mode to be persisted to files.
Follow-on patches would allow us to load these profiles and start
converting/analysing them through the `llvm-xray` tool.
Depends on D44620.
Reviewers: echristo, kpw, pelikan
Reviewed By: kpw
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45998
llvm-svn: 334472
|
|
Summary:
This is part of the larger XRay Profiling Mode effort.
This patch implements the wiring required to enable us to actually
select the `xray-profiling` mode, and install the handlers to start
measuring the time and frequency of the function calls in call stacks.
The current way to get the profile information is by working with the
XRay API to `__xray_process_buffers(...)`.
In subsequent changes we'll implement profile saving to files, similar
to how the FDR and basic modes operate, as well as means for converting
this format into those that can be loaded/visualised as flame graphs. We
will also be extending the accounting tool in LLVM to support
stack-based function call accounting.
We also continue with the implementation to support building small
histograms of latencies for the `FunctionCallTrie::Node` type, to allow
us to actually approximate the distribution of latencies per function.
Depends on D45758 and D46998.
Reviewers: eizan, kpw, pelikan
Reviewed By: kpw
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D44620
llvm-svn: 334469
|