Do we need a "Rust Standard"?
Languages like C and C++ are standardized. They are fully specified in an internationally recognized standards document. Languages like Python, Swift and Rust do not have such a standards document.
Should Rust be standardized? Why, or why not? In this blog post, I try to explain why I do think we need an accurate specification, why I do not think we need “standardization” (depending on your definition), and give an overview of the current state of Rust’s stability and specification efforts.
C and C++
Before we talk about Rust, let’s first take a look at C and C++; why and how they are standardized.
There exist many independent compilers for C (and C++). The most well-known ones are Clang (the C language frontend for LLVM), GCC (the GNU compiler collection), and MSVC (Microsoft’s C++ compiler), but there are (and have been) many more. Many vendors of proprietary platforms ship, or used to ship, their own C compiler.
Each of these compilers supports the same language, C, but what is “C”? “K&R C”, “Turbo C”, “GNU C”, “Clang C”, “Microsoft C”, “Intel C”, “ACME C”, and so on are all supposed to be the exact same language, but each one slightly differs from the others, often in subtle ways. Some of these differences are bugs, some are deliberate vendor-specific extensions, and some result from a difference in opinion on what “C” is or isn’t.
To make it possible to write portable C code that works as expected with any C compiler, a standardization effort was started at the American National Standards Institute (ANSI). In 1989, this resulted in an official definition of C: “ANSI C”, now often referred to as C89. A bit later, this effort was taken over by the International Organization for Standardization (ISO), which published the document again as “ISO C” (C90) and has since published several new versions of the standard (C95, C99, C11, and C17).
This means there is now an internationally recognized definition of C. A company contracting out some implementation work can now clearly specify ISO C in the requirements in their legal documentation, to make sure they won’t end up with code that only works in MSVC, for example. Regulations on safety critical software in C can now refer to ISO C, to make sure there can be no confusion about the meaning of any of the rules.
The C standard is more than five hundred pages, and specifies every single aspect of the language (including its standard library). There are quite a few things it leaves unspecified or undefined, which leaves space for differences between compilers and platforms, and for things a correct C program should never do. For example, it does not specify how many bits an int is. That depends on the compiler (and target platform), which should specify it in their documentation instead. It also doesn’t specify what happens if you divide by zero. No conforming C program should ever do that.
Similarly, the C++ Standard is an almost two thousand page document that specifies the entirety of the C++ language, clearly specifying what a conforming C++ compiler and a conforming C++ program must adhere to.
Evolution of C and C++
The C and C++ standards are each developed by their own ISO committee: WG14 for C, and WG21 for C++. ISO’s rules determine who can join and participate. Many countries and companies have representatives on the committees.
While the goal of a standardization committee supposedly is to “document existing practice”, it is nowadays often a place of innovation as well. Many new C and C++ features are proposed (in a proposal document) to the committees before compilers implement them. Once accepted, a proposal eventually results in changes and additions to the draft of the language standard, which is released every few years.
This means that every few years, there’s a new version of the C++ language with many new features, and sometimes some subtle breaking changes. The compiler vendors implement the new features and changes and release new versions of their compilers on their own schedule, and inform their users which parts of the new standard they already support.
A reason for companies to invest in participation in a standardization committee, which can be quite expensive, is to be able to influence the direction of the language’s evolution. For example, a hypothetical company whose computers after 60 years still lack modern features such as square brackets would be able to vote against the deprecation and removal of trigraphs, to make sure their source files would still compile on all ISO C++ compliant compilers.
Rust
Rust’s history is very different. The Rust Project started much later, in an era where online open-source collaboration and cross-platform software are much more commonplace than forty years ago.
Most new programming languages, like Rust, have a single official compiler (or interpreter) that’s collaboratively developed by many contributors. The same organisation that governs the language, also governs the compiler and related tools as part of the same project. While there might be other compilers that attempt to be compatible, those are often developed for a specific use case and don’t hold the same status as the official one.
This means there is no confusion about whether “Rust” could mean “Microsoft Rust” or “Amazon Rust” or whatever. There is only one Rust: the language and compiler developed by the Rust Project. In fact, the project (or technically the Rust Foundation) owns the Rust trademark.
We, the Rust Project, get to define what “Rust™” means.
So, how exactly do we define “Rust”?
The easy cop-out answer is “whatever our compiler does”. Or perhaps, “whatever the teams agreed upon in their many meetings and discussions”. But that isn’t very helpful. While simple, those definitions aren’t very accurate or precise.
Evolution of Rust
To better understand how to give a more useful definition, we first need to understand how Rust evolves. What “Rust” means changes regularly, as we are constantly working on improvements. Every six weeks, we release a new stable version of the compiler, each time changing what “Rust” means.
A proposal for a large change or addition to Rust goes through the Request For Comments (RFC) process. The first step in this process is writing a proposal document and submitting it as a pull request on the Rust RFCs repository. This makes the document public, and opens it up to feedback from anyone. The feedback often leads to changes to the document. Once a member of the authoritative team (e.g. the language team or the library API team) considers the proposal to have converged to an acceptable state, they propose a Final Comment Period (FCP). Once all but at most two of the members of the authoritative team have confirmed, the 10-day FCP starts and is announced publicly in the next This Week in Rust update. If, during that period, anyone brings any new feedback, any team member can file a blocking concern that will stop and reset the FCP. Depending on the concern, it might be resolved, after which a new 10-day FCP starts, or it might result in the proposal going back to the drawing board.
Once the FCP finishes and the RFC is merged into the RFCs repository, the document is available in the RFC book and a tracking issue is opened on GitHub to track the development of the new feature or change.
At this point, the new feature is likely, but not guaranteed, to land in a future stable version of Rust. The feature will be implemented by volunteers (often including the author(s) of the RFC) and will be made available as an unstable feature. Unstable features can only be used on nightly versions of Rust, not on beta or stable versions. Even on nightly Rust, the compiler will only allow you to use an unstable feature when you explicitly opt in to using it, using #![feature(…)] in your crate.
While a feature is still unstable and in development, new insights from actual usage of the experimental feature often result in changes to the feature that diverge from the original RFC. For this reason, it’s acceptable if an RFC is merged with a few remaining unresolved questions, leaving those for later while the feature is being implemented and tested.
Once the feature is fully implemented, all the unresolved questions are resolved, and there is no more ongoing discussion about the feature, a member of the authoritative team can propose a stabilization FCP. Once the rest of the team confirms, the FCP starts and the expected stabilization is announced in This Week in Rust. Once the FCP finishes after 10 days without blocking concerns, a pull request can be merged that marks the feature as stable, usually together with or right after a PR that updates the Rust Reference.
This means that from that moment, the compiler will allow you to use the new feature without the #![feature(…)] attribute.
Stability
Rust uses semantic versioning (SemVer) for its versions. While popular crates on crates.io sometimes publish a new major version, the Rust compiler (including standard libraries and Cargo, etc.) is likely to remain on a compatible 1.0 major release for the foreseeable future, only increasing its minor version every six weeks.
This means that everything that is released as part of stable Rust, is there to stay. Once a feature is stable, we cannot remove it. This is why we’re so careful with adding a new feature; why we require team consensus before stabilizing something.
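The SemVer compatibility rule described above can be sketched in a few lines. This is only an illustration: the names parse_version and is_compatible are made up for this post, and the parser deliberately ignores pre-release suffixes such as "-nightly".

```rust
/// A (major, minor, patch) triple such as (1, 63, 0).
type Version = (u32, u32, u32);

/// Parses a plain "major.minor.patch" string.
/// Hypothetical helper: pre-release suffixes are not handled in this sketch.
fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    // Reject trailing components like "1.63.0.1".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Returns true if code that needs at least `minimum` should keep working
/// on `compiler` under SemVer: same major version, equal or newer minor/patch.
fn is_compatible(compiler: Version, minimum: Version) -> bool {
    compiler.0 == minimum.0 && (compiler.1, compiler.2) >= (minimum.1, minimum.2)
}
```

With these helpers, a compiler at 1.70.0 is considered compatible with code that requires 1.63.0, while the reverse is not.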
Since we release a new version of Rust so often (every six weeks), we are very committed to backwards compatibility. We don’t support older versions of Rust, but we do try very hard never to break anything in a new version of Rust. If your code no longer works in a newer version of Rust (and you didn’t use unstable nightly features), we probably did something wrong. (There are a few unfortunate situations where code can break. For example, adding a new item can change how a name is resolved in some cases. But we’re working on ways to minimize those situations.)
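One way such a name resolution change can happen is when a library adds an inherent method with the same name as a method that downstream code already provided through an extension trait. The sketch below simulates that inside a single file; Widget and WidgetExt are hypothetical names, with Widget standing in for a library type.

```rust
/// Stand-in for a type from a library (hypothetical).
struct Widget;

/// An extension trait a downstream crate might define to add a method to `Widget`.
trait WidgetExt {
    fn label(&self) -> &'static str;
}

impl WidgetExt for Widget {
    fn label(&self) -> &'static str {
        "from the downstream extension trait"
    }
}

/// Imagine the library later adds an inherent method with the same name.
impl Widget {
    fn label(&self) -> &'static str {
        "from the library's new inherent method"
    }
}

/// Method-call syntax now picks the inherent method, not the trait method.
fn resolved_label() -> &'static str {
    Widget.label()
}

/// The trait method is still reachable, but only when named explicitly.
fn explicit_trait_label() -> &'static str {
    WidgetExt::label(&Widget)
}
```

The downstream code did not change, yet the same call now resolves to a different function. If the two methods had different signatures, the result would be a compilation error instead.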
We are so committed to stability, that we compile and test every single crate on crates.io and every single Rust repository on GitHub with a Cargo.lock file using the new version of Rust before releasing it. This is also often done for individual changes before merging them, to catch any potential problems before we even accept a change.
That’s right, we’ve been compiling your open-source Rust code and running your tests, sometimes weekly or more often, to make sure we don’t accidentally break your code. This is done using a tool called Crater, which distributes the work over a few dedicated servers. A run usually takes a few days, at the end of which it produces a list of every single crate that used to compile and/or pass its tests, but no longer does with the newer compiler. The results are processed by hand, and sometimes result in changes to the Rust compiler being reverted.
In some cases, it turns out that the failed crate contained code that was already unsound, but compiled by accident. In those cases, we often inform the crate maintainer and sometimes even help them fix it.
Plans are being drafted for expanding Crater to include private code bases as well. While this hasn’t been implemented yet, we could make it possible for companies to subscribe to Crater runs such that they can run the same tests internally on their own private code base, and contribute their results back after removing any private details.
Rust Reference Documentation
While we do have a reference that explains what features are stable and many of the guarantees of the language, it is not a complete specification.
For the standard library, we have the reference documentation that contains every single public item, including the Rust version in which it became available. (It also documents unstable items, but clearly marks them as experimental.)
However, it is often not necessary to look up the documentation to know if a feature is stable, because the compiler knows and requires you to opt in to unstable features (using #![feature(…)] on nightly Rust). If you can use something on stable Rust without getting any compilation errors, it is stable. This is different from what you might experience when programming in a language like C, where you might need to pay extra attention not to accidentally rely on any compiler-specific extensions that are not part of the language specification.
Things get more subtle when talking about the stability of behavior rather than the existence of a feature. For example, it’s easy to determine that [T]::binary_search is guaranteed to exist in future versions of Rust: calling it will compile just fine on stable Rust. However, this doesn’t answer questions like “will it always return the index of the first matching element, if there are multiple matching elements?” To answer that question, we’ll have to check its documentation. (The answer is no, that’s not guaranteed. However, if we ever change its behavior and find out during a Crater run that many crates relied on the specific behavior, we will be extra careful, perhaps even reverting the change, even though the crates shouldn’t have relied on it.)
In other words, the library documentation already serves the role of a (somewhat informal) specification.
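To make the binary_search example concrete, here is a small sketch of code that only relies on what is documented. The helper names first_match and any_match are made up for illustration. When the first matching index is actually needed, partition_point gives it deterministically, instead of depending on whichever match binary_search happens to return today.

```rust
/// Returns the index of the first element equal to `target` in a sorted slice.
fn first_match(sorted: &[i32], target: i32) -> Option<usize> {
    // `partition_point` returns the index of the first element for which the
    // predicate is false, i.e. the first element that is >= target.
    let index = sorted.partition_point(|&x| x < target);
    if sorted.get(index) == Some(&target) {
        Some(index)
    } else {
        None
    }
}

/// Relies only on what `binary_search` documents: on success, the returned
/// index holds a matching element. Which match is returned is unspecified.
fn any_match(sorted: &[i32], target: i32) -> Option<usize> {
    sorted.binary_search(&target).ok()
}
```

For the slice [1, 2, 2, 2, 3] and target 2, first_match always yields index 1, while any_match may yield any of the indices 1, 2 or 3.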
Rust Editions
Sometimes, we do want to change something in a backwards-incompatible way. For example, a few years ago we wanted to add the async and await keywords, but that would’ve broken existing code that uses those words as variable names (e.g. let async = true;). C++20 had to deal with a similar problem, and there the solution was to use less commonly used words (like co_await, etc.) to minimize breakage. For Rust, such a solution didn’t fit with our commitment to “stability without stagnation”, and a different solution was found: Rust Editions.
A “Rust Edition” is basically a variant (or “dialect”) of the language. Unlike versions, all editions are supported and the latest version of the compiler supports all editions: currently Rust 2015, Rust 2018, and Rust 2021. In Rust 2015, async is a regular identifier, but in Rust 2018 and Rust 2021, async is a keyword.
This might sound similar to versions of the C++ language (like C++11 and C++17), but Rust Editions can be mixed and are selected per crate (i.e. “translation unit” in C++ terms). Code written in Rust 2018 can use dependencies written in Rust 2015 and Rust 2021 just fine. (It even works across macros! If you define a macro in Rust 2015 code that expands to let async = 1; and expand it in Rust 2018 code, it still works! The compiler tracks that the async word there originated in Rust 2015 code.)
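Code that moves to a newer edition can still refer to names that have since become keywords. The sketch below uses raw identifiers (the r# prefix), a mechanism not discussed above but available in all editions; the function names are made up for illustration.

```rust
/// A function literally named `await`, as an older Rust 2015 crate might export.
/// In Rust 2018 and later, the name has to be written as a raw identifier.
fn r#await(value: u32) -> u32 {
    value + 1
}

/// Uses `async` as a variable name via the raw identifier syntax.
fn async_flag() -> bool {
    let r#async = true;
    r#async
}

/// Calls the function above by its raw name.
fn call_await() -> u32 {
    r#await(41)
}
```

The raw identifier r#async refers to exactly the same name as a plain async does in Rust 2015 code, which is what lets crates on different editions keep calling each other.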
Editions are documented in the edition guide. The differences between editions are kept as minimal as possible. Editions are only used when absolutely necessary.
The vast majority of additions to Rust do not require an edition, and their availability is not tied to an edition. For example, scoped threads were added in Rust 1.63 in 2022, but can be used in edition 2015 just fine, as long as you use Rust 1.63 or later.
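As a minimal sketch of the scoped threads example: the function below uses std::thread::scope and contains nothing edition-specific, so it should compile in any edition on Rust 1.63 or later. The name parallel_sum is made up for illustration.

```rust
use std::thread;

/// Sums a slice using two scoped threads that borrow `data` directly,
/// without needing `Arc` or a `'static` lifetime.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|scope| {
        let left_handle = scope.spawn(|| left.iter().sum::<u64>());
        let right_handle = scope.spawn(|| right.iter().sum::<u64>());
        // `join` returns Err only if the thread panicked.
        left_handle.join().unwrap() + right_handle.join().unwrap()
    })
}
```

All threads spawned inside the scope are joined before thread::scope returns, which is why borrowing the slice from the calling function is allowed.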
Unsafe Code Guidelines
Code in unsafe blocks relies on the user to make sure the code is correct and adheres to all the current and future rules. It is often hard or next to impossible for the compiler to understand the correctness of unsafe code. (Otherwise, it wouldn’t have needed to be unsafe.)
Mistakes in unsafe code are often not easily caught by unit tests or compiler errors. So, for unsafe code specifically, it is extra important to know exactly what behavior can be relied on.
The unsafe code guidelines project is an ongoing effort to find undocumented or unclear unsafe code practices that require clarification. Eventually, this should result in more details being added to the Rust Reference to cover more subtle edge cases.
Another effort to help gain confidence in the correctness of unsafe code is Miri. Miri is an experimental but very useful and powerful interpreter for rustc’s mid-level intermediate representation. Instead of running code by compiling it to native processor instructions, it interprets code at a point when information like types and lifetimes is still available. It runs tests significantly slower than when compiled and run normally, but it is able to perform many checks and will complain when your code breaks any of the very strict rules it knows about.
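As a small illustration of the kind of code this is about, here is a sketch of an unsafe block whose correctness depends entirely on a check the compiler does not verify. The function first_two is a made-up example. With the length check in place the code is sound; without it, a short slice would cause an out-of-bounds read that ordinary tests might not notice, but that a tool like Miri (typically run as cargo miri test on a nightly toolchain) is designed to report.

```rust
/// Returns the first two bytes of `bytes`, or `None` if it is too short.
fn first_two(bytes: &[u8]) -> Option<(u8, u8)> {
    if bytes.len() < 2 {
        return None;
    }
    // SAFETY: the length check above guarantees that indices 0 and 1 are in
    // bounds. Without that check, this would be undefined behavior for
    // slices shorter than two elements.
    let pair = unsafe { (*bytes.get_unchecked(0), *bytes.get_unchecked(1)) };
    Some(pair)
}
```

The SAFETY comment records which rule the code relies on, so a reviewer can check the assumption even though the compiler cannot.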
A Rust Specification?
Unfortunately, we currently do not have a complete specification of the language and standard library.
However, as mentioned above, we do care deeply about stability, and will often go out of our way to avoid breaking code, even if the breakage was technically allowed by our (lack of) documentation. This doesn’t mean we don’t need a complete specification, but it does mean that it might be less of a problem for most users than one might expect.
While for many users, a specification would just be “nice to have”, there are also Rust users for whom such a specification is absolutely necessary to be able to use Rust for the field they work in. For example, safety critical Rust software for automotive or aerospace applications usually needs to pass certification, a process designed to gain confidence in the exact behavior of software through, among other things, specification and testing.
It’s hard to convince some inspection agency that a piece of Rust software will do the right thing, if there’s no specification to point at to explain what exactly a line of Rust code even means. Bugs can always exist, but all potential bugs need to be traceable back to a specification that explains the expected behavior.
The Ferrocene project is an effort led by Ferrous Systems and AdaCore to make Rust usable for the development of safety critical systems. Because Rust doesn’t yet have a language specification, they decided to start writing their own: the Ferrocene Language Specification. They purposely didn’t name it after “Rust”, to avoid implying that this is an authoritative document about Rust. For now, this is the specification of the language supported by the version of the Rust compiler that they ship as part of Ferrocene.
The draft of the Ferrocene specification is released under an open source license (MIT + Apache 2.0), which makes it possible for the Rust project to take this document as a starting point for a future official Rust specification. In fact, they’d very much like us, the Rust project, to take ownership of the specification and turn it into an official document that many parties can contribute to.
While no official decision has been made yet, there does seem to be a general agreement that we should indeed work towards having and maintaining an official complete Rust specification from within the Rust project. It’s just a lot of work, so I’m afraid we won’t get there with just some enthusiastic volunteers, even if we can use the Ferrocene specification as a start. We’ll need support and funding from the Rust Foundation and interested companies.
A Rust Standard?
So, we need a “Rust Specification”, but do we need a “Rust Standard”? What does that even mean?
While this comes down to a definition question, standardization is usually associated with the work done by a “standardization body” such as ISO or ECMA: an organisation that takes responsibility for the coordination of the evolution of a standard. These organisations have processes through which stakeholders from all over the world can participate in the evolution of the technologies they standardize.
However, in the Rust Project, we already have an open process for evolving the language, based on RFCs, team consensus, and unstable experimentation. Handing off the responsibility to a standards organisation means giving up our control, with little to no benefit. We’d lose our ability to shape the processes the way we think works best, and we might no longer be able to guarantee an open and inclusive environment that’s up to our standards.
Many companies and individuals participate in C++ standardization to influence the language; to add their own feature to the language. However, an effort towards a Rust specification is not about changing Rust. We already have a process for changing Rust, and the companies I’ve spoken to that would benefit from a Rust specification are actually not interested in changing the ways in which they can influence the evolution of the language.
It’s good that we, the Rust project itself, own the language and the process for making changes to it. We just need to get better at documenting it, and could use some help.
Would you make use of a Rust specification? Or do you have any requirements or ideas for it? I’d love to hear from you.
Leave a comment below, or join the conversation on Twitter, Reddit, Hacker News, Lobsters, or LWN.