Diffstat (limited to 'libcxx/www/atomic_design.html')
-rw-r--r-- | libcxx/www/atomic_design.html | 416 |
1 files changed, 8 insertions, 408 deletions
diff --git a/libcxx/www/atomic_design.html b/libcxx/www/atomic_design.html index 0750b733c1f..36e73244d91 100644 --- a/libcxx/www/atomic_design.html +++ b/libcxx/www/atomic_design.html @@ -36,422 +36,22 @@ <!--*********************************************************************--> <p> -The <tt><atomic></tt> header is one of the most closely coupled headers to -the compiler. Ideally when you invoke any function from -<tt><atomic></tt>, it should result in highly optimized assembly being -inserted directly into your application ... assembly that is not otherwise -representable by higher level C or C++ expressions. The design of the libc++ -<tt><atomic></tt> header started with this goal in mind. A secondary, but -still very important goal is that the compiler should have to do minimal work to -faciliate the implementaiton of <tt><atomic></tt>. Without this second -goal, then practically speaking, the libc++ <tt><atomic></tt> header would -be doomed to be a barely supported, second class citizen on almost every -platform. +There are currently 3 designs under consideration. They differ in where most +of the implmentation work is done. The functionality exposed to the customer +should be identical (and conforming) for all three designs. </p> -<p>Goals:</p> - -<blockquote><ul> -<li>Optimal code generation for atomic operations</li> -<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li> -<li>Conformance to the C++0X draft standard</li> -</ul></blockquote> - -<p> -The purpose of this document is to inform compiler writers what they need to do -to enable a high performance libc++ <tt><atomic></tt> with minimal effort. 
-</p> - -<h2>The minimal work that must be done for a conforming <tt><atomic></tt></h2> - -<p> -The only "atomic" operations that must actually be lock free in -<tt><atomic></tt> are represented by the following compiler intrinsics: -</p> - -<blockquote><pre> -__atomic_flag__ -__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) -{ - unique_lock<mutex> _(some_mutex); - __atomic_flag__ result = *obj; - *obj = desr; - return result; -} - -void -__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) -{ - unique_lock<mutex> _(some_mutex); - *obj = desr; -} -</pre></blockquote> - -<p> -Where: -</p> - -<blockquote><ul> +<ol type="A"> <li> -If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then -the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to -<tt>int</tt>). +<a href="atomic_design_a.html">Minimal work for the library</a> </li> <li> -If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then -the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>. +<a href="atomic_design_b.html">Something in between</a> </li> <li> -<p> -To communicate that the above intrinsics are available, the compiler must -arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name -appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>. -</p> -<p> -For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>: -</p> -<blockquote><pre> -__has_feature(__atomic_flag) == 1 -__has_feature(__atomic_exchange_seq_cst_j) == 1 -__has_feature(__atomic_store_seq_cst_j) == 1 - -typedef unsigned int __atomic_flag__; - -unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int) -{ - // ... -} - -void __atomic_store_seq_cst(unsigned int volatile*, unsigned int) -{ - // ... -} -</pre></blockquote> +<a href="atomic_design_c.html">Minimal work for the front end</a> </li> -</ul></blockquote> - -<p> -That's it! 
Compiler writers do the above and you've got a fully conforming -(though sub-par performance) <tt><atomic></tt> header! -</p> - -<h2>Recommended work for a higher performance <tt><atomic></tt></h2> - -<p> -It would be good if the above intrinsics worked with all integral types plus -<tt>void*</tt>. Because this may not be possible to do in a lock-free manner for -all integral types on all platforms, a compiler must communicate each type that -an intrinsic works with. For example if <tt>__atomic_exchange_seq_cst</tt> works -for all types except for <tt>long long</tt> and <tt>unsigned long long</tt> -then: -</p> - -<blockquote><pre> -__has_feature(__atomic_exchange_seq_cst_b) == 1 // bool -__has_feature(__atomic_exchange_seq_cst_c) == 1 // char -__has_feature(__atomic_exchange_seq_cst_a) == 1 // signed char -__has_feature(__atomic_exchange_seq_cst_h) == 1 // unsigned char -__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t -__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t -__has_feature(__atomic_exchange_seq_cst_w) == 1 // wchar_t -__has_feature(__atomic_exchange_seq_cst_s) == 1 // short -__has_feature(__atomic_exchange_seq_cst_t) == 1 // unsigned short -__has_feature(__atomic_exchange_seq_cst_i) == 1 // int -__has_feature(__atomic_exchange_seq_cst_j) == 1 // unsigned int -__has_feature(__atomic_exchange_seq_cst_l) == 1 // long -__has_feature(__atomic_exchange_seq_cst_m) == 1 // unsigned long -__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void* -</pre></blockquote> - -<p> -Note that only the <tt>__has_feature</tt> flag is decorated with the argument -type. The name of the compiler intrinsic is not decorated, but instead works -like a C++ overloaded function. -</p> - -<p> -Additionally there are other intrinsics besides -<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They -are optional. 
But if the compiler can generate faster code than provided by the -library, then clients will benefit from the compiler writer's expertise and -knowledge of the targeted platform. -</p> - -<p> -Below is the complete list of <i>sequentially consistent</i> intrinsics, and -their library implementations. Template syntax is used to indicate the desired -overloading for integral and void* types. The template does not represent a -requirement that the intrinsic operate on <em>any</em> type! -</p> - -<blockquote><pre> -T is one of: bool, char, signed char, unsigned char, short, unsigned short, - int, unsigned int, long, unsigned long, - long long, unsigned long long, char16_t, char32_t, wchar_t, void* - -template <class T> -T -__atomic_load_seq_cst(T const volatile* obj) -{ - unique_lock<mutex> _(some_mutex); - return *obj; -} - -template <class T> -void -__atomic_store_seq_cst(T volatile* obj, T desr) -{ - unique_lock<mutex> _(some_mutex); - *obj = desr; -} - -template <class T> -T -__atomic_exchange_seq_cst(T volatile* obj, T desr) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj = desr; - return r; -} - -template <class T> -bool -__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) -{ - unique_lock<mutex> _(some_mutex); - if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) - { - std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); - return true; - } - std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); - return false; -} - -template <class T> -bool -__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) -{ - unique_lock<mutex> _(some_mutex); - if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) - { - std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); - return true; - } - std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); - return false; -} - -T is one of: char, signed char, unsigned char, short, unsigned short, - int, unsigned int, long, unsigned long, - long long, unsigned long long, char16_t, 
char32_t, wchar_t - -template <class T> -T -__atomic_fetch_add_seq_cst(T volatile* obj, T operand) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj += operand; - return r; -} - -template <class T> -T -__atomic_fetch_sub_seq_cst(T volatile* obj, T operand) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj -= operand; - return r; -} - -template <class T> -T -__atomic_fetch_and_seq_cst(T volatile* obj, T operand) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj &= operand; - return r; -} - -template <class T> -T -__atomic_fetch_or_seq_cst(T volatile* obj, T operand) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj |= operand; - return r; -} - -template <class T> -T -__atomic_fetch_xor_seq_cst(T volatile* obj, T operand) -{ - unique_lock<mutex> _(some_mutex); - T r = *obj; - *obj ^= operand; - return r; -} - -void* -__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand) -{ - unique_lock<mutex> _(some_mutex); - void* r = *obj; - (char*&)(*obj) += operand; - return r; -} - -void* -__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand) -{ - unique_lock<mutex> _(some_mutex); - void* r = *obj; - (char*&)(*obj) -= operand; - return r; -} - -void __atomic_thread_fence_seq_cst() -{ - unique_lock<mutex> _(some_mutex); -} - -void __atomic_signal_fence_seq_cst() -{ - unique_lock<mutex> _(some_mutex); -} -</pre></blockquote> - -<p> -One should consult the (currently draft) -<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a> -for the details of the definitions for these operations. For example -<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail -spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is -not. 
-</p> - -<p> -If on your platform the lock-free definition of -<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as -<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the -<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a -performance cost. The library will prefer your implementation of -<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own -definition for implementing -<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library -will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call -<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an -intrinsic for the strong version but not the weak. -</p> - -<h2>Taking advantage of weaker memory synchronization</h2> - -<p> -So far all of the intrinsics presented require a <em>sequentially -consistent</em> memory ordering. That is, no loads or stores can move across -the operation (just as if the library had locked that internal mutex). But -<tt><atomic></tt> supports weaker memory ordering operations. In all, -there are six memory orderings (listed here from strongest to weakest): -</p> - -<blockquote><pre> -memory_order_seq_cst -memory_order_acq_rel -memory_order_release -memory_order_acquire -memory_order_consume -memory_order_relaxed -</pre></blockquote> - -<p> -(See the -<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a> -for the detailed definitions of each of these orderings). -</p> - -<p> -On some platforms, the compiler vendor can offer some or even all of the above -intrinsics at one or more weaker levels of memory synchronization. This might -lead for example to not issuing an <tt>mfense</tt> instruction on the x86. -</p> - -<p> -If the compiler does not offer any given operation, at any given memory ordering -level, the library will automatically attempt to call the next highest memory -ordering operation. 
This continues up to <tt>seq_cst</tt>, and if that doesn't -exist, then the library takes over and does the job with a <tt>mutex</tt>. This -is a compile-time search & selection operation. At run time, the -application will only see the few inlined assembly instructions for the selected -intrinsic. -</p> - -<p> -Each intrinsic is appended with the 7-letter name of the memory ordering it -addresses. For example a <tt>load</tt> with <tt>relaxed</tt> ordering is -defined by: -</p> - -<blockquote><pre> -T __atomic_load_relaxed(const volatile T* obj); -</pre></blockquote> - -<p> -And announced with: -</p> - -<blockquote><pre> -__has_feature(__atomic_load_relaxed_b) == 1 // bool -__has_feature(__atomic_load_relaxed_c) == 1 // char -__has_feature(__atomic_load_relaxed_a) == 1 // signed char -... -</pre></blockquote> - -<p> -The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized -on two memory orderings. The first ordering applies when the operation returns -<tt>true</tt> and the second ordering applies when the operation returns -<tt>false</tt>. -</p> - -<p> -Not every memory ordering is appropriate for every operation. <tt>exchange</tt> -and the <tt>fetch_<i>op</i></tt> operations support all 6. But <tt>load</tt> -only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and <tt>seq_cst</tt>. -<tt>store</tt> -only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>. The -<tt>compare_exchange</tt> operations support the following 16 combinations out -of the possible 36: -</p> - -<blockquote><pre> -relaxed_relaxed -consume_relaxed -consume_consume -acquire_relaxed -acquire_consume -acquire_acquire -release_relaxed -release_consume -release_acquire -acq_rel_relaxed -acq_rel_consume -acq_rel_acquire -seq_cst_relaxed -seq_cst_consume -seq_cst_acquire -seq_cst_seq_cst -</pre></blockquote> - -<p> -Again, the compiler supplies intrinsics only for the strongest orderings where -it can make a difference. 
The library takes care of calling the weakest -supplied intrinsic that is as strong or stronger than the customer asked for. -</p> +</ol> </div> </body>