Comparing Rust's and C++'s Concurrency Library
The concurrency features that are included in the Rust standard library are quite similar to what was available in C++11: threads, atomics, mutexes, condition variables, and so on. In the past few years, however, C++ has gained quite a few new concurrency-related features as part of C++17 and C++20, with more proposals still coming in for future versions.
Let’s take some time to review C++ concurrency features, discuss what their Rust equivalent could look like, and what it’d take to get there.
atomic_ref
P0019R8 introduced
std::atomic_ref
to C++.
It’s a type that allows you to use a non-atomic object as an atomic one.
For example, you can create an atomic_ref<int>
that references a regular int
,
allowing you the same functionality as if it were an atomic<int>
.
While in C++ this needed a whole new type that duplicates most of the atomic
interface,
the equivalent Rust feature is a one-line function: Atomic*::from_mut
.
This function allows you to convert, for example, a &mut u32
to a &AtomicU32
,
which is a form of aliasing that’s perfectly sound in Rust.
The C++ atomic_ref
type comes with safety requirements that you need to uphold manually.
As long as you’re using an atomic_ref
to access an object, all access to that object must
be through an atomic_ref
.
Accessing it directly when there’s still an atomic_ref
results in undefined behavior.
In Rust, however, this is already fully taken care of by the borrow checker.
The compiler understands that by borrowing the u32
mutably,
nothing is allowed to access that u32
directly until that borrow ends.
The lifetime of the &mut u32
that goes into the from_mut
function
is preserved as part of the &AtomicU32
you get out of it.
You can make as many copies of that &AtomicU32
as you want,
but the original borrow only ends once all copies of that reference are gone.
The from_mut
function is currently unstable, but perhaps it’s time we stabilize it.
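Because from_mut is still unstable, the sketch below approximates it on stable Rust with a hypothetical helper, as_atomic, built on AtomicU32::from_ptr (stable since Rust 1.75). The helper name and the counting example are assumptions made for illustration; the point is only that the returned &AtomicU32 keeps the u32 mutably borrowed for as long as any copy of the reference is alive.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical stand-in for the unstable `AtomicU32::from_mut`.
/// The returned reference inherits the lifetime of the `&mut u32`.
fn as_atomic(v: &mut u32) -> &AtomicU32 {
    // SAFETY: `v` is a valid `u32` that is exclusively borrowed for the
    // whole lifetime of the returned reference, so no non-atomic access
    // can happen in the meantime. `AtomicU32` has the same size and
    // alignment as `u32`.
    unsafe { AtomicU32::from_ptr(v) }
}

/// Lets four scoped threads increment a plain local `u32` atomically.
fn count_in_parallel() -> u32 {
    let mut counter: u32 = 0;
    let atomic = as_atomic(&mut counter);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    atomic.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    // The last use of `atomic` is above, so the mutable borrow has ended
    // and `counter` can be read directly again.
    counter
}
```

Reading counter while atomic is still in use would be rejected by the borrow checker, which is the guarantee that C++ leaves to the programmer.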
Generic atomic type
In C++, the std::atomic
is generic: you can have an atomic<int>
, but also an atomic<MyOwnStruct>
.
In Rust, on the other hand, we only have specific atomic types: AtomicU32
, AtomicBool
, AtomicUsize
, etc.
C++’s atomic type supports objects of any size, regardless of what the platform supports.
It automatically falls back to a lock-based implementation for objects whose size is not supported by the platform’s native atomic operations.
On the other hand, Rust only provides the types that are natively supported by the platform.
If you’re compiling for a platform that does not have 64 bit atomics, AtomicU64
does not exist.
This has advantages and disadvantages.
It means Rust code using AtomicU64
might fail to compile for certain platforms,
but it also means no performance related surprises when some types silently fall back to a very different implementation.
It also means we can assume an AtomicU64
is represented exactly the same as a u64
in memory, allowing for functions like AtomicU64::from_mut
.
Having a generic Atomic<T>
in Rust that works for types of any size can be tricky.
Without specialization, we can’t make Atomic<LargeThing>
include a Mutex
, while not including it in Atomic<SmallThing>
.
What we could do, however, is to store the mutexes in a global HashMap
, indexed by memory address.
Then the Atomic<T>
can be identical in size to a T
, and use a Mutex
from this global hash map when necessary.
This is exactly what the popular atomic
crate does.
A proposal for adding such a universal Atomic<T>
type to the Rust standard library would need to discuss
whether it should be usable in no_std
programs.
A regular HashMap
requires allocation, which isn’t possible in no_std
programs.
A fixed size table could work for no_std
programs, but might be undesirable for various reasons.
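To make the fixed size table idea concrete, here is a minimal sketch of a lock-based fallback that keeps the atomic the same size as T. The names LockedAtomic, LOCKS and TABLE_SIZE are hypothetical, and this is not how any particular crate or the standard library implements it; a real design would also use native atomics whenever the size allows.

```rust
use std::cell::UnsafeCell;
use std::sync::{Mutex, MutexGuard};

const TABLE_SIZE: usize = 64;
// A `const` item may be repeated in an array expression even though
// `Mutex` is not `Copy`.
const UNLOCKED: Mutex<()> = Mutex::new(());
/// Global, fixed size table of locks shared by all `LockedAtomic`s.
static LOCKS: [Mutex<()>; TABLE_SIZE] = [UNLOCKED; TABLE_SIZE];

/// Hypothetical always-locking atomic; it stores nothing but the `T`.
pub struct LockedAtomic<T: Copy> {
    value: UnsafeCell<T>,
}

// SAFETY: every access to `value` happens while holding the table lock
// selected by this object's address.
unsafe impl<T: Copy + Send> Sync for LockedAtomic<T> {}

impl<T: Copy> LockedAtomic<T> {
    pub const fn new(value: T) -> Self {
        Self { value: UnsafeCell::new(value) }
    }

    /// Picks the lock for this object based on its memory address.
    fn lock(&self) -> MutexGuard<'static, ()> {
        let addr = self.value.get() as usize;
        // Drop the low bits, which are often zero because of alignment.
        // The guarded data is `()`, so a poisoned lock is still usable.
        LOCKS[(addr >> 4) % TABLE_SIZE]
            .lock()
            .unwrap_or_else(|e| e.into_inner())
    }

    pub fn load(&self) -> T {
        let _guard = self.lock();
        unsafe { *self.value.get() }
    }

    pub fn store(&self, value: T) {
        let _guard = self.lock();
        unsafe { *self.value.get() = value }
    }

    pub fn swap(&self, value: T) -> T {
        let _guard = self.lock();
        unsafe { std::mem::replace(&mut *self.value.get(), value) }
    }
}
```

Two unrelated objects can hash to the same lock, so they may contend with each other. That is one of the reasons a fixed size table might be undesirable.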
Compare-exchange with padding
P0528R3 changes how compare_exchange
deals with padding.
A compare exchange operation on an atomic<TypeWithPadding>
used to compare the padding bits as well,
but that turned out to be a bad idea.
Nowadays, padding bits are no longer included in the comparison.
Since Rust currently only provides atomic types for integers, without any padding, this change is irrelevant for Rust.
However, a proposal for an Atomic<T>
with a compare_exchange
method would need to discuss how padding is handled,
and should probably take input from this proposal.
Compare-exchange memory ordering
In C++11,
the compare_exchange
functions required the success memory ordering to be at least as strong as the failure ordering.
A compare_exchange(…, …, memory_order_release, memory_order_acquire)
was not accepted.
This requirement was copied verbatim to Rust’s
compare_exchange
functions.
P0418R2 argued that this restriction should be lifted, which happened as part of C++17.
The same restriction was lifted in Rust 1.64, as part of rust-lang/rust#98383.
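As a small illustration, the following compiles and runs on Rust 1.64 or later, using a release ordering on success together with an acquire ordering on failure. The function name try_claim and the 0/1 encoding are made up for this example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Tries to change `flag` from 0 to 1.
/// Success uses `Release`, failure uses `Acquire`: a combination that
/// used to be rejected because the failure ordering is not "weaker".
fn try_claim(flag: &AtomicU32) -> Result<u32, u32> {
    flag.compare_exchange(0, 1, Ordering::Release, Ordering::Acquire)
}
```

The first call on a zeroed flag returns Ok(0); any later call returns Err(1) and has acquired whatever the successful thread released.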
constexpr Mutex constructor
C++’s std::mutex
has a constexpr
constructor,
which means it can be constructed as part of constant evaluation at compile time.
However, not all implementations actually provide this.
For example, Microsoft’s implementation of std::mutex
doesn’t include a constexpr
constructor.
So, relying on this is a bad idea for portable code.
Also, interestingly, C++’s std::condition_variable
and std::shared_mutex
don’t
provide a constexpr
constructor at all.
Rust’s original Mutex
in Rust 1.0 did not include a const fn new
.
Combined with Rust’s strict requirements for static initialization,
this made the Mutex
quite annoying to use in a static
variable.
This has been resolved in Rust 1.63.0
as part of rust-lang/rust#93740:
all of Mutex::new
, RwLock::new
and Condvar::new
are now const
functions.
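For example, assuming Rust 1.63 or later, a Mutex can now be placed directly in a static without any lazy initialization helper. The static EVENTS and the function record are hypothetical names for this sketch.

```rust
use std::sync::Mutex;

/// A global log, initialized at compile time by the `const fn`s
/// `Mutex::new` and `Vec::new`.
static EVENTS: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Appends an event and returns how many events have been recorded.
fn record(event: &str) -> usize {
    let mut events = EVENTS.lock().unwrap();
    events.push(event.to_string());
    events.len()
}
```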
Latches and barriers
P1135R6 introduced, among other things, std::latch
and std::barrier
to C++20.
Both are types that allow waiting for several threads to reach a certain point.
A latch is basically just a counter that gets decremented by each thread and allows you to wait for it to reach zero.
It can only be used once.
A barrier is a more advanced version of this idea that can be reused, and accepts a “completion function”
to be automatically executed when the counter reaches zero.
Rust has had a similar Barrier
type since 1.0.
It was inspired by pthread (pthread_barrier_t
) rather than C++.
Rust’s (and pthread’s) barrier is less flexible than what’s now included in C++.
It only has a “decrement and wait” operation (called wait
),
and lacks the “only wait”, “only decrement”, and “decrement and drop” functions that C++’s
std::barrier
comes with.
On the other hand, unlike C++, Rust’s (and pthread’s) “decrement and wait” operation assigns one thread to be the group leader. This is a (perhaps more flexible) alternative to a completion function.
The missing operations on the Rust version could easily be added at any point. All we need is a good proposal for the names of these new methods. :)
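The group leader behavior of today’s Barrier can be seen in a short sketch like this one, where count_leaders is a hypothetical helper written for this example.

```rust
use std::sync::Barrier;
use std::thread;

/// Spawns `threads` threads that all meet at one barrier, and counts
/// how many of them were told they are the leader.
fn count_leaders(threads: usize) -> usize {
    let barrier = Barrier::new(threads);
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            // `wait` is the "decrement and wait" operation. Its result
            // reports whether this thread was picked as the leader.
            handles.push(s.spawn(|| barrier.wait().is_leader()));
        }
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|&is_leader| is_leader)
            .count()
    })
}
```

Exactly one thread per round sees is_leader() return true, so that thread can run whatever a C++ completion function would have done.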
Semaphore
That same P1135R6 also added semaphores to C++20:
std::counting_semaphore
and std::binary_semaphore
.
Rust does not have a general semaphore type,
although it does equip every single thread with what’s effectively a binary semaphore,
through thread::park
and unpark
.
A semaphore can be easily constructed manually using a Mutex<u32>
and a Condvar
,
but most operating systems allow for a more efficient and smaller implementation using a single AtomicU32
.
For example, through futex()
on Linux
and WaitOnAddress()
on Windows.
It depends on the operating system and its version which sizes of atomics can be used for these operations.
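The manual construction from a Mutex<u32> and a Condvar could look roughly like the sketch below. It is the portable, lock-based variant, not the futex-based one, and the type and method names are assumptions rather than a proposed standard library API.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical counting semaphore built from a mutex and a condvar.
pub struct Semaphore {
    permits: Mutex<u32>,
    available: Condvar,
}

impl Semaphore {
    pub const fn new(permits: u32) -> Self {
        Self {
            permits: Mutex::new(permits),
            available: Condvar::new(),
        }
    }

    /// Blocks until a permit is available, then takes it.
    pub fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        // Loop to handle spurious wakeups.
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Takes a permit if one is available, without blocking.
    pub fn try_acquire(&self) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// Returns a permit and wakes up one waiting thread, if any.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}
```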
C++’s counting_semaphore
is a template that takes an integer as argument to indicate how far we want to be able to count.
For example, a counting_semaphore<1000>
can count up to at least 1000, and will therefore be 16 bit or larger.
The binary_semaphore
type is just an alias for counting_semaphore<1>
, and can be a single byte on some platforms.
In Rust, we’re probably not quite ready for this kind of generic type any time soon. Rust’s generics force a certain kind of consistency that puts some limitations on what we can do with constants as generic arguments.
We could have separate Semaphore32
, Semaphore64
, and so on, but that seems a bit overkill.
Having Semaphore<u32>
, Semaphore<u64>
and perhaps even Semaphore<bool>
could be possible,
but is something we haven’t done before in the standard library.
Our atomic types are simply AtomicU32
, AtomicU64
, and so on.
As mentioned above, for our atomic types, we only provide the ones that are natively supported by the platform you’re compiling for.
If we were to apply the same philosophy to Semaphore
, it wouldn’t exist on platforms that don’t have a futex
or WaitOnAddress
function, such as macOS.
And if we had separate semaphore types for different sizes, some sizes wouldn’t exist on (some versions of) Linux and various BSDs.
If we want a standard semaphore type in Rust, we’d first need some input on
whether we actually need semaphores of different sizes, and what form of flexibility and portability would be necessary to make them useful.
Perhaps we should go with just a single 32-bit Semaphore
type that’s always available (using a lock-based fallback),
but any such proposal would have to include a detailed explanation of use cases and limitations.
Atomic wait and notify
The remaining new features that P1135R6 adds to C++20 are
the atomic wait
and notify
functions.
These functions effectively directly expose
Linux’s futex()
and Windows’s
WaitOnAddress()
through a standard interface.
However, they are available on atomics of all sizes, on all platforms, regardless of what the operating system supports.
Linux futexes are always 32 bit, but C++ allows for atomic<uint64_t>::wait
just fine.
One way of doing this is to use something resembling a “parking lot”:
effectively a global HashMap
that maps memory addresses to locks and queues.
That means that a 32 bit wait operation on Linux could use the very fast futex based implementation,
while the other sizes would use a very different implementation.
If we were to follow the philosophy of only providing the types and functions that are natively supported
(like we do for the atomic types), we wouldn’t provide such a fallback implementation.
That’d mean we only have AtomicU32::wait
(and AtomicI32::wait
) on Linux,
while all atomic types would include this wait
method on Windows.
A proposal for Atomic*::wait
and Atomic*::notify
in Rust would need to include
a discussion on whether a fallback to a global table is desirable in Rust or not.
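To give an idea of what such a fallback involves, here is a rough sketch of wait and notify for an AtomicU64 on top of a global table of mutex and condition variable pairs. It uses a fixed size array instead of a HashMap to stay short, and Bucket, BUCKETS, wait and notify_all are hypothetical names, not an existing or proposed API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};

/// One slot of the global table: a lock plus a queue of waiters.
struct Bucket {
    lock: Mutex<()>,
    queue: Condvar,
}

const EMPTY: Bucket = Bucket {
    lock: Mutex::new(()),
    queue: Condvar::new(),
};
static BUCKETS: [Bucket; 32] = [EMPTY; 32];

/// Maps the memory address of an atomic to its bucket.
fn bucket_for(atomic: &AtomicU64) -> &'static Bucket {
    let addr = atomic as *const AtomicU64 as usize;
    &BUCKETS[(addr >> 3) % BUCKETS.len()]
}

/// Blocks for as long as `atomic` still holds `expected`.
pub fn wait(atomic: &AtomicU64, expected: u64) {
    let bucket = bucket_for(atomic);
    let mut guard = bucket.lock.lock().unwrap();
    // The value is checked while holding the bucket lock, so a notifier
    // cannot slip in between the check and going to sleep.
    while atomic.load(Ordering::Acquire) == expected {
        guard = bucket.queue.wait(guard).unwrap();
    }
}

/// Wakes all threads waiting on `atomic`. Call after changing its value.
pub fn notify_all(atomic: &AtomicU64) {
    let bucket = bucket_for(atomic);
    let _guard = bucket.lock.lock().unwrap();
    bucket.queue.notify_all();
}
```

Unrelated atomics that share a bucket wake each other up spuriously, which is harmless here because every waiter rechecks its own value.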
jthread and stop_token
P0660R10 adds std::jthread
and std::stop_token
to C++20.
If we ignore the stop_token
for a second, jthread
is basically just a regular std::thread
that automatically gets join()
’ed on destruction.
This avoids accidentally detaching a thread and letting it run for longer than expected, which might happen with a regular thread
.
However, it also introduces a potential new pitfall: immediately destructing a jthread
object will immediately join the thread, effectively removing any potential parallelism.
As of Rust 1.63.0, we have scoped threads
(rust-lang/rust#93203).
Just like a jthread
, a scoped thread is automatically joined.
However, the point at which they are joined is made explicit, and is a guarantee that can be relied upon for safety.
The borrow checker even understands this guarantee, allowing you to safely borrow local variables in the scoped thread(s), as long as those variables
outlive the scope.
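A short example of that guarantee in action, assuming Rust 1.63 or later. The function sum_halves is made up for this illustration; it borrows a slice from the caller in two scoped threads without any Arc or 'static requirement.

```rust
use std::thread;

/// Sums the two halves of `data` on two scoped threads.
fn sum_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both closures borrow from `data`, which outlives the scope.
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
    // Any thread not joined manually is joined here, before
    // `thread::scope` returns.
}
```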
In addition to automatically joining, a main feature of jthreads
is their stop_token
and corresponding stop_source
.
One can call request_stop()
on a stop_source
to make the corresponding stop_requested()
method on stop_token
return true.
This can be used to nicely ask the thread to please stop, and is automatically done in the destructor of jthread
before joining.
It’s up to the code of the thread to actually check the token and stop if it was set.
So far, it almost looks like a plain AtomicBool
.
Where things get very different is the stop_callback
type.
This type allows a callback, a “stop function”, to be registered with a stop token.
Requesting a stop using the corresponding stop source will execute this function.
Effectively, a thread can use this to let others know how to stop or cancel its work.
In Rust, we could easily add the AtomicBool
-like functionality to the Scope
object of thread::scope
.
A simple is_finished(&self) -> bool
or stop_requested(&self) -> bool
that indicates whether the main scope
function
is finished might suffice. Maybe combined with a request_stop(&self)
method to request it from anywhere.
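Until something like that exists, the same effect can be approximated by hand with a plain AtomicBool that lives next to the scope. The sketch below only uses today’s stable API; the names stop and run_until_stopped are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Runs a worker until the main scope function asks it to stop, and
/// returns how many iterations the worker completed.
fn run_until_stopped() -> u64 {
    let stop = AtomicBool::new(false);
    let iterations = AtomicU64::new(0);
    thread::scope(|s| {
        s.spawn(|| {
            // The worker has to check the flag itself, just like with
            // `stop_requested()` on a C++ `stop_token`.
            while !stop.load(Ordering::Relaxed) {
                iterations.fetch_add(1, Ordering::Relaxed);
                thread::yield_now();
            }
        });
        // Wait until the worker has made some progress.
        while iterations.load(Ordering::Relaxed) == 0 {
            thread::yield_now();
        }
        // The manual equivalent of `request_stop()`.
        stop.store(true, Ordering::Relaxed);
    });
    iterations.load(Ordering::Relaxed)
}
```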
The stop_callback
feature is more complicated, and any Rust equivalent would probably need a detailed proposal discussing its interface, use cases and limitations.
Atomic floats
P0020R6 adds support for atomic floating point addition and subtraction to C++20.
It’d be easy to add an AtomicF32
or AtomicF64
to Rust as well,
but it seems that the only platforms that natively support atomic floating point operations
are some GPUs that are not supported by Rust (yet?).
A proposal to add these types to Rust would have to present some compelling use cases.
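Without native support, such a type would boil down to a compare-exchange loop over the bit pattern. A minimal sketch, with the hypothetical name AtomicF32Sketch, could look like this:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical atomic `f32`, stored as its bit pattern in an `AtomicU32`.
pub struct AtomicF32Sketch(AtomicU32);

impl AtomicF32Sketch {
    pub fn new(value: f32) -> Self {
        Self(AtomicU32::new(value.to_bits()))
    }

    pub fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }

    /// Adds `delta` and returns the previous value.
    /// `fetch_update` retries the closure in a compare-exchange loop
    /// until no other thread changed the value in between.
    pub fn fetch_add(&self, delta: f32) -> f32 {
        let previous = self
            .0
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |bits| {
                Some((f32::from_bits(bits) + delta).to_bits())
            })
            .unwrap(); // The closure never returns `None`.
        f32::from_bits(previous)
    }
}
```

Since this is already possible in a few lines on top of AtomicU32, a dedicated standard library type mostly makes sense if it can map to native instructions somewhere.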
Atomic per byte memcpy
Currently, it’s not possible to efficiently implement sequence locks in Rust or C++ in a way that abides by all the rules of the memory model.
P1478R7 proposes to add atomic_load_per_byte_memcpy
and atomic_store_per_byte_memcpy
to a future version of C++ to solve this issue.
For Rust, I wrote a proposal to expose the functionality through an AtomicPerByte<T>
type: RFC 3301.
Atomic shared_ptr
P0718R2 added specializations for atomic<shared_ptr>
and atomic<weak_ptr>
to C++20.
Reference counted pointers (shared_ptr
in C++, Arc
in Rust) are quite commonly used for concurrent lock-free data structures.
The atomic<shared_ptr>
specialization makes it easier to do this correctly, by handling the reference count properly.
In Rust, we could add equivalent AtomicArc<T>
and AtomicWeak<T>
types.
(Although AtomicArc
sounds a bit weird maybe, considering the A
of Arc
already stands for “atomic”. :) )
However, C++’s shared_ptr<T>
is nullable, while in Rust that requires an Option<Arc<T>>
.
It’s not immediately clear whether AtomicArc<T>
should be nullable, or whether we should also have an AtomicOptionArc<T>
.
The popular arc-swap
crate already provides all these variants in Rust,
but, as far as I know, there hasn’t been any proposal yet to add anything similar to the standard library.
synchronized_value
P0290R2 was not accepted, but proposed a type called synchronized_value<T>
which combines a mutex
with a T
.
Even though it wasn’t accepted at that time into C++, it’s an interesting proposal, because synchronized_value<T>
is pretty much exactly what a Mutex<T>
is in Rust.
In C++, a std::mutex
does not contain the data it protects, nor does it even know what it is protecting at all.
This means that it is the responsibility of the user to remember which data is protected and by which mutex,
and ensure the right mutex is locked every time “protected” data is accessed.
Rust’s Mutex<T>
design with a MutexGuard
behaving like a (mutable) reference to T
allows for much more safety, while still allowing for a Mutex<()>
in cases where
you need only a mutex, without any data directly attached to it.
The proposal for synchronized_value<T>
was an attempt at adding this pattern to C++, but used closures instead of a mutex guard, since C++ doesn’t track lifetimes.
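For comparison, both styles are easy to express on top of Rust’s Mutex<T>. The helper with_lock below is a hypothetical closure-based wrapper in the spirit of that proposal; it is not part of the standard library, and the guard-based style remains the usual one in Rust.

```rust
use std::sync::Mutex;

/// Runs `f` with exclusive access to the data protected by `mutex`.
/// The lock is released when the guard is dropped at the end of the
/// function, after `f` has returned.
fn with_lock<T, R>(mutex: &Mutex<T>, f: impl FnOnce(&mut T) -> R) -> R {
    let mut guard = mutex.lock().unwrap();
    f(&mut *guard)
}

/// Uses both styles on the same mutex and returns the final length.
fn push_twice(mutex: &Mutex<Vec<u32>>) -> usize {
    // Closure style: the data is only reachable inside the closure.
    with_lock(mutex, |v| v.push(1));
    // Guard style: the guard behaves like a `&mut Vec<u32>`.
    let mut guard = mutex.lock().unwrap();
    guard.push(2);
    guard.len()
}
```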
Conclusion
It seems to me that C++ can continue to be a source of inspiration for Rust, although we should take care not to copy-paste ideas directly.
As we’ve seen with Mutex<T>
, scoped threads, Atomic*::from_mut
and others, things can often take a very different (often more ergonomic) shape in Rust
while providing the same functionality.
Providing the exact same functionality as C++ shouldn’t be a primary goal. The goal should be to provide exactly what the Rust ecosystem needs from the language and standard library, which might be different than what C++ users need from their language.
If you have concurrency needs from the Rust standard library that we currently don’t fulfill, I’d love to hear from you, regardless of whether it’s something that’s already solved in another language or not.