Diffstat (limited to 'libcxx/www/atomic_design.html')
-rw-r--r--  libcxx/www/atomic_design.html | 416
1 file changed, 8 insertions(+), 408 deletions(-)
diff --git a/libcxx/www/atomic_design.html b/libcxx/www/atomic_design.html
index 0750b733c1f..36e73244d91 100644
--- a/libcxx/www/atomic_design.html
+++ b/libcxx/www/atomic_design.html
@@ -36,422 +36,22 @@
<!--*********************************************************************-->
<p>
-The <tt>&lt;atomic&gt;</tt> header is one of the headers most closely coupled
-to the compiler. Ideally, when you invoke any function from
-<tt>&lt;atomic&gt;</tt>, it should result in highly optimized assembly being
-inserted directly into your application ... assembly that is not otherwise
-representable by higher level C or C++ expressions. The design of the libc++
-<tt>&lt;atomic&gt;</tt> header started with this goal in mind. A secondary, but
-still very important goal is that the compiler should have to do minimal work to
-facilitate the implementation of <tt>&lt;atomic&gt;</tt>. Without this second
-goal, practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would
-be doomed to be a barely supported, second class citizen on almost every
-platform.
+There are currently three designs under consideration. They differ in where
+most of the implementation work is done. The functionality exposed to the
+customer should be identical (and conforming) for all three designs.
</p>
-<p>Goals:</p>
-
-<blockquote><ul>
-<li>Optimal code generation for atomic operations</li>
-<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
-<li>Conformance to the C++0x draft standard</li>
-</ul></blockquote>
-
-<p>
-The purpose of this document is to inform compiler writers what they need to do
-to enable a high performance libc++ <tt>&lt;atomic&gt;</tt> with minimal effort.
-</p>
-
-<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>
-
-<p>
-The only "atomic" operations that must actually be lock free in
-<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
-</p>
-
-<blockquote><pre>
-__atomic_flag__
-__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- __atomic_flag__ result = *obj;
- *obj = desr;
- return result;
-}
-
-void
-__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- *obj = desr;
-}
-</pre></blockquote>
-
-<p>
-Where:
-</p>
-
-<blockquote><ul>
+<ol type="A">
<li>
-If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then
-the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
-<tt>int</tt>).
+<a href="atomic_design_a.html">Minimal work for the library</a>
</li>
<li>
-If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then
-the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>.
+<a href="atomic_design_b.html">Something in between</a>
</li>
<li>
-<p>
-To communicate that the above intrinsics are available, the compiler must
-arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
-appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
-</p>
-<p>
-For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
-</p>
-<blockquote><pre>
-__has_feature(__atomic_flag) == 1
-__has_feature(__atomic_exchange_seq_cst_j) == 1
-__has_feature(__atomic_store_seq_cst_j) == 1
-
-typedef unsigned int __atomic_flag__;
-
-unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
-{
- // ...
-}
-
-void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
-{
- // ...
-}
-</pre></blockquote>
+<a href="atomic_design_c.html">Minimal work for the front end</a>
</li>
-</ul></blockquote>
-
-<p>
-That's it! Compiler writers do the above and you've got a fully conforming
-(though with sub-par performance) <tt>&lt;atomic&gt;</tt> header!
-</p>
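-
-<p>
-For instance, with just these two intrinsics the library can implement the
-required lock-free <tt>atomic_flag</tt>. A minimal sketch (the member name
-<tt>__flag_</tt> is illustrative, not the library's actual spelling):
-</p>
-
-<blockquote><pre>
-// sketch: atomic_flag members in terms of the two required intrinsics
-bool atomic_flag::test_and_set() volatile
-    {return __atomic_exchange_seq_cst(&amp;__flag_, __atomic_flag__(true));}
-void atomic_flag::clear() volatile
-    {__atomic_store_seq_cst(&amp;__flag_, __atomic_flag__(false));}
-</pre></blockquote>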
-
-<h2>Recommended work for a higher performance <tt>&lt;atomic&gt;</tt></h2>
-
-<p>
-It would be good if the above intrinsics worked with all integral types plus
-<tt>void*</tt>. Because this may not be possible to do in a lock-free manner for
-all integral types on all platforms, a compiler must communicate each type that
-an intrinsic works with. For example if <tt>__atomic_exchange_seq_cst</tt> works
-for all types except for <tt>long long</tt> and <tt>unsigned long long</tt>
-then:
-</p>
-
-<blockquote><pre>
-__has_feature(__atomic_exchange_seq_cst_b) == 1 // bool
-__has_feature(__atomic_exchange_seq_cst_c) == 1 // char
-__has_feature(__atomic_exchange_seq_cst_a) == 1 // signed char
-__has_feature(__atomic_exchange_seq_cst_h) == 1 // unsigned char
-__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
-__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
-__has_feature(__atomic_exchange_seq_cst_w) == 1 // wchar_t
-__has_feature(__atomic_exchange_seq_cst_s) == 1 // short
-__has_feature(__atomic_exchange_seq_cst_t) == 1 // unsigned short
-__has_feature(__atomic_exchange_seq_cst_i) == 1 // int
-__has_feature(__atomic_exchange_seq_cst_j) == 1 // unsigned int
-__has_feature(__atomic_exchange_seq_cst_l) == 1 // long
-__has_feature(__atomic_exchange_seq_cst_m) == 1 // unsigned long
-__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*
-</pre></blockquote>
-
-<p>
-Note that only the <tt>__has_feature</tt> flag is decorated with the argument
-type. The name of the compiler intrinsic is not decorated, but instead works
-like a C++ overloaded function.
-</p>
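-
-<p>
-For example, a hypothetical compiler supporting the exchange intrinsic for
-<tt>int</tt> and <tt>long</tt> only would declare two overloads sharing one
-undecorated name:
-</p>
-
-<blockquote><pre>
-// only the feature-test names carry a type suffix
-__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
-__has_feature(__atomic_exchange_seq_cst_l) == 1  // long
-
-int  __atomic_exchange_seq_cst(int volatile* obj, int desr);
-long __atomic_exchange_seq_cst(long volatile* obj, long desr);
-</pre></blockquote>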
-
-<p>
-Additionally there are other intrinsics besides
-<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They
-are optional. But if the compiler can generate faster code than provided by the
-library, then clients will benefit from the compiler writer's expertise and
-knowledge of the targeted platform.
-</p>
-
-<p>
-Below is the complete list of <i>sequentially consistent</i> intrinsics, and
-their library implementations. Template syntax is used to indicate the desired
-overloading for integral and void* types. The template does not represent a
-requirement that the intrinsic operate on <em>any</em> type!
-</p>
-
-<blockquote><pre>
-T is one of: bool, char, signed char, unsigned char, short, unsigned short,
- int, unsigned int, long, unsigned long,
- long long, unsigned long long, char16_t, char32_t, wchar_t, void*
-
-template &lt;class T&gt;
-T
-__atomic_load_seq_cst(T const volatile* obj)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- return *obj;
-}
-
-template &lt;class T&gt;
-void
-__atomic_store_seq_cst(T volatile* obj, T desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- *obj = desr;
-}
-
-template &lt;class T&gt;
-T
-__atomic_exchange_seq_cst(T volatile* obj, T desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj = desr;
- return r;
-}
-
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
- {
- std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
- return true;
- }
- std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
- return false;
-}
-
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
- {
- std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
- return true;
- }
- std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
- return false;
-}
-
-T is one of: char, signed char, unsigned char, short, unsigned short,
- int, unsigned int, long, unsigned long,
- long long, unsigned long long, char16_t, char32_t, wchar_t
-
-template &lt;class T&gt;
-T
-__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj += operand;
- return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj -= operand;
- return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj &amp;= operand;
- return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj |= operand;
- return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- T r = *obj;
- *obj ^= operand;
- return r;
-}
-
-void*
-__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- void* r = *obj;
- (char*&amp;)(*obj) += operand;
- return r;
-}
-
-void*
-__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
- void* r = *obj;
- (char*&amp;)(*obj) -= operand;
- return r;
-}
-
-void __atomic_thread_fence_seq_cst()
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
-}
-
-void __atomic_signal_fence_seq_cst()
-{
- unique_lock&lt;mutex&gt; _(some_mutex);
-}
-</pre></blockquote>
-
-<p>
-One should consult the (currently draft)
-<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
-for the detailed definitions of these operations. For example
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
-spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
-not.
-</p>
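-
-<p>
-The distinction matters to clients: because the weak form may fail
-spuriously, it is typically called in a retry loop. A minimal sketch,
-assuming <tt>counter</tt> is an <tt>int volatile*</tt>:
-</p>
-
-<blockquote><pre>
-// atomically double *counter; a spurious failure of the weak
-// compare-exchange just causes another trip around the loop
-int expected = __atomic_load_seq_cst(counter);
-int desired;
-do
-    desired = expected * 2;
-while (!__atomic_compare_exchange_weak_seq_cst_seq_cst(counter, &amp;expected, desired));
-</pre></blockquote>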
-
-<p>
-If on your platform the lock-free definition of
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a
-performance cost. The library will prefer your implementation of
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
-definition for implementing
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library
-will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an
-intrinsic for the strong version but not the weak.
-</p>
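-
-<p>
-That fallback can be as simple as the following sketch, since a
-compare-exchange that never fails spuriously is a valid implementation of one
-that may:
-</p>
-
-<blockquote><pre>
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
-    // forward to the compiler's strong version
-    return __atomic_compare_exchange_strong_seq_cst_seq_cst(obj, exp, desr);
-}
-</pre></blockquote>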
-
-<h2>Taking advantage of weaker memory synchronization</h2>
-
-<p>
-So far all of the intrinsics presented require a <em>sequentially
-consistent</em> memory ordering. That is, no loads or stores can move across
-the operation (just as if the library had locked that internal mutex). But
-<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations. In all,
-there are six memory orderings (listed here from strongest to weakest):
-</p>
-
-<blockquote><pre>
-memory_order_seq_cst
-memory_order_acq_rel
-memory_order_release
-memory_order_acquire
-memory_order_consume
-memory_order_relaxed
-</pre></blockquote>
-
-<p>
-(See the
-<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
-for the detailed definitions of each of these orderings).
-</p>
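-
-<p>
-For reference, the draft standard declares these as an enumeration (listed
-there from weakest to strongest):
-</p>
-
-<blockquote><pre>
-enum memory_order
-{
-    memory_order_relaxed, memory_order_consume, memory_order_acquire,
-    memory_order_release, memory_order_acq_rel, memory_order_seq_cst
-};
-</pre></blockquote>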
-
-<p>
-On some platforms, the compiler vendor can offer some or even all of the above
-intrinsics at one or more weaker levels of memory synchronization. This might,
-for example, allow the compiler to omit an <tt>mfence</tt> instruction on x86.
-</p>
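-
-<p>
-A hypothetical x86 lowering illustrates the payoff: an x86 store already has
-release semantics, so only the sequentially consistent form needs a fence:
-</p>
-
-<blockquote><pre>
-__atomic_store_seq_cst(obj, x);  // mov [obj], x ; mfence
-__atomic_store_release(obj, x);  // mov [obj], x
-</pre></blockquote>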
-
-<p>
-If the compiler does not offer any given operation, at any given memory ordering
-level, the library will automatically attempt to call the next highest memory
-ordering operation. This continues up to <tt>seq_cst</tt>, and if that doesn't
-exist, then the library takes over and does the job with a <tt>mutex</tt>. This
-is a compile-time search &amp; selection operation. At run time, the
-application will only see the few inlined assembly instructions for the selected
-intrinsic.
-</p>
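-
-<p>
-A sketch of how the library might select an <tt>int</tt> load with
-<tt>acquire</tt> ordering (for a load, the next ordering up from
-<tt>acquire</tt> is <tt>seq_cst</tt>):
-</p>
-
-<blockquote><pre>
-#if __has_feature(__atomic_load_acquire_i)
-    return __atomic_load_acquire(obj);        // exact match
-#elif __has_feature(__atomic_load_seq_cst_i)
-    return __atomic_load_seq_cst(obj);        // next strongest supplied
-#else
-    unique_lock&lt;mutex&gt; _(some_mutex);   // library mutex fallback
-    return *obj;
-#endif
-</pre></blockquote>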
-
-<p>
-Each intrinsic is appended with the 7-letter name of the memory ordering it
-addresses. For example a <tt>load</tt> with <tt>relaxed</tt> ordering is
-defined by:
-</p>
-
-<blockquote><pre>
-T __atomic_load_relaxed(const volatile T* obj);
-</pre></blockquote>
-
-<p>
-And announced with:
-</p>
-
-<blockquote><pre>
-__has_feature(__atomic_load_relaxed_b) == 1 // bool
-__has_feature(__atomic_load_relaxed_c) == 1 // char
-__has_feature(__atomic_load_relaxed_a) == 1 // signed char
-...
-</pre></blockquote>
-
-<p>
-The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized
-on two memory orderings. The first ordering applies when the operation returns
-<tt>true</tt> and the second ordering applies when the operation returns
-<tt>false</tt>.
-</p>
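-
-<p>
-For example, the intrinsic below applies <tt>acquire</tt> ordering when the
-exchange succeeds and <tt>relaxed</tt> ordering when it fails:
-</p>
-
-<blockquote><pre>
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_strong_acquire_relaxed(T volatile* obj, T* exp, T desr);
-</pre></blockquote>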
-
-<p>
-Not every memory ordering is appropriate for every operation. <tt>exchange</tt>
-and the <tt>fetch_<i>op</i></tt> operations support all 6. But <tt>load</tt>
-only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and <tt>seq_cst</tt>.
-<tt>store</tt>
-only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>. The
-<tt>compare_exchange</tt> operations support the following 16 combinations out
-of the possible 36:
-</p>
-
-<blockquote><pre>
-relaxed_relaxed
-consume_relaxed
-consume_consume
-acquire_relaxed
-acquire_consume
-acquire_acquire
-release_relaxed
-release_consume
-release_acquire
-acq_rel_relaxed
-acq_rel_consume
-acq_rel_acquire
-seq_cst_relaxed
-seq_cst_consume
-seq_cst_acquire
-seq_cst_seq_cst
-</pre></blockquote>
-
-<p>
-Again, the compiler supplies intrinsics only for the strongest orderings where
-it can make a difference. The library takes care of calling the weakest
-supplied intrinsic that is as strong or stronger than the customer asked for.
-</p>
+</ol>
</div>
</body>