Behind the Scenes of Rust String Formatting: format_args!()

The fmt::Arguments type is one of my favorite types in the Rust standard library. It’s not particularly amazing, but it is a great building block that is indirectly used in nearly every Rust program. This type, together with the format_args!() macro, is the power behind print!(), format!(), log::info!() and many more text formatting macros, both from the standard library and community crates.

In this blog post, we learn how it works, how it is implemented today, and how that might change in the future.

format_args!()

When you write something like:

print!("Hello, {}!\n", name);

Then that print macro will expand to something like:

std::io::_print(format_args!("Hello, {}!\n", name));

_print is an internal function that takes a fmt::Arguments as its only argument. The fmt::Arguments object is produced by the builtin format_args!() macro, which understands Rust’s string formatting syntax (with the {} placeholders, etc.). The resulting fmt::Arguments object represents both the (parsed) format string with its placeholders (in this case: "Hello, <argument 1 here>!\n") and references to the arguments (in this case just one: &name).

Because it is a macro, the parsing of the format string is done at compile time. (Unlike e.g. printf in C, which can parse and process % placeholders at runtime.)

This means that the format_args!() macro gives a compiler error if the placeholders and arguments don’t match up, for example.

It also means that it can turn the string template into a representation that’s easy to process at runtime. For example: [Str("Hello, "), Arg(0), Str("!\n")]. Expanding format_args in our print example would then result in something like this:

std::io::_print(
    // Simplified expansion of format_args!():
    std::fmt::Arguments {
        template: &[Str("Hello, "), Arg(0), Str("!\n")],
        arguments: &[&name as &dyn Display],
    }
);

This gets a bit more complicated when a mix of different formatting traits (e.g. Display, Debug, LowerHex, etc.) or flags (e.g. {:02x}, {:.9}, {:#?}, etc.) are involved, but the general idea is the same.

The _print function doesn’t know much about this fmt::Arguments type. It only contains an implementation of fn write_str(&mut self, &str) to write a &str to standard output, and the fmt::Write trait will conveniently add a fn write_fmt(&mut self, fmt::Arguments) method. Calling this write_fmt method with an fmt::Arguments object will result in a series of calls to write_str to produce the formatted output.

The provided write_fmt method simply calls std::fmt::write(), which is the only function that knows how to “execute” the formatting instructions contained in a fmt::Arguments type. It calls write_str for the static parts of the template, and it will call the right Display::fmt (or LowerHex::fmt, etc.) functions for the arguments, which will also result in calls to write_str down the line.
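As a rough sketch of that pipeline, the snippet below mimics how print!() forwards to _print. The render function, the render_fmt!() macro, and the two helper functions are hypothetical names made up for this example; only format_args!(), fmt::Arguments, and fmt::write come from the standard library.

```rust
use std::fmt;

/// Hypothetical stand-in for `std::io::_print`: it accepts a `fmt::Arguments`
/// and renders it into a `String` instead of writing to standard output.
pub fn render(args: fmt::Arguments<'_>) -> String {
    let mut out = String::new();
    // `String` implements `fmt::Write`, so `fmt::write` can drive it directly.
    fmt::write(&mut out, args).expect("writing to a String should not fail");
    out
}

/// Hypothetical macro that forwards to `render`, the same way `print!`
/// forwards to `_print`.
macro_rules! render_fmt {
    ($($arg:tt)*) => {
        render(format_args!($($arg)*))
    };
}

/// The format string is parsed and checked against `name` at compile time.
pub fn greeting(name: &str) -> String {
    render_fmt!("Hello, {}!\n", name)
}

/// Flags such as width and `#` go through the same path.
pub fn table_row(n: u32, mask: u32) -> String {
    render_fmt!("{:>4}|{:#x}", n, mask)
}
```

Removing the name argument from greeting, or adding a second {} without a matching argument, would be rejected by format_args!() before the program ever runs.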

Usage example

What we just learned, is that all you need to do to make use of the powers of Rust’s string formatting, is to provide a Write::write_str implementation for your type.

For example:

pub struct Terminal;

impl std::fmt::Write for Terminal {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        write_to_terminal(s.as_bytes());
        Ok(())
    }
}

That’s all that’s necessary to make this work, right away:

Terminal.write_fmt(format_args!("Hello, {name}!\n"));

This will result in a series of calls to your write_str function: write_str("Hello, "), write_str(name), and write_str("!\n").

And, thanks to the write macro, that line can also be conveniently written as:

write!(Terminal, "Hello, {name}!\n");

In other words, you don’t even need to know about the existence of fmt::Arguments or format_args!() at all, even for adding formatting functionality to your own types: just implement Write::write_str, and the write!() macro just works!
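To see those write_str calls for yourself, here is a small sketch that swaps Terminal for a hypothetical Recorder type that stores every piece it receives. The names Recorder and record_greeting are assumptions for this example.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer that records every `write_str` call it receives.
#[derive(Default)]
pub struct Recorder {
    pub calls: Vec<String>,
}

impl Write for Recorder {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.calls.push(s.to_owned());
        Ok(())
    }
}

/// Formats a greeting through `write!` and returns the recorded pieces.
pub fn record_greeting(name: &str) -> Vec<String> {
    let mut recorder = Recorder::default();
    // `write!` expands to `recorder.write_fmt(format_args!(...))`.
    write!(recorder, "Hello, {name}!\n").expect("Recorder never returns an error");
    recorder.calls
}
```

Concatenating the recorded pieces gives back the full formatted string. How the output is split over individual write_str calls is an implementation detail of the standard library, so code should rely only on the concatenation.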

Implementation details

What makes this all so exciting to me, is that the implementation details of fmt::Arguments are entirely private. As developers of the Rust standard library, we can change pretty much anything about how fmt::Arguments represents its data internally, as long as we update format_args!() and std::fmt::write accordingly. Everything that uses formatting, from dbg!(x) to log::info!("n = {n}"), and even our example above, wouldn’t even notice that their underlying building block has changed, while still benefiting from the potential improvements.

So the question that’s been bothering me for years now is: what is the most efficient, smallest, most performant, fastest to compile, and overall best implementation of fmt::Arguments?

I don’t have the answer yet.

I don’t think there is a single answer, as there are many trade-offs involved. What I am pretty certain about, is that our current implementation is not it. It is quite okay, but there are many ways in which it can be improved.

Current implementation

Today’s implementation looks like this:

pub struct Arguments<'a> {
    pieces: &'a [&'static str],
    placeholders: Option<&'a [Placeholder]>,
    args: &'a [Argument<'a>],
}

The pieces field contains the literal string pieces from the template. For example, for format_args!("a{}b{}c"), this is &["a", "b", "c"].

The placeholders ({}) between those pieces are listed in the placeholders field, which contain the formatting options and the index of the argument to be formatted:

pub struct Placeholder {
    argument: usize, // index into `args`
    fill: char,
    align: Alignment,
    flags: u32,
    precision: Count,
    width: Count,
}

In a complicated example such as format_args!("a{1:>012}b{1:-5}c{0:#.1}"), all these fields are very relevant.

However, in many situations, like in format_args!("a{}b{}c{}"), the placeholders are just the arguments in order, with only default flags and other settings. That’s why the placeholders field is an Option: this common situation is represented by None, saving some storage space.

Finally, the args field contains the arguments to be formatted. The arguments could have been stored as &dyn Display, were it not that we also have to support Debug, LowerHex, and the other display traits.

So, instead, we use a custom Argument type that behaves almost exactly like a &dyn Display. It is implemented as two pointers: a reference to the argument itself, and a function pointer to the corresponding Display::fmt (or Debug::fmt, etc.) implementation.

This means that when an argument is used through two different traits, it is stored twice in args. For example, format_args!("{0} {0:?} {1:x}", a, b) results in:

fmt::Arguments {
    pieces: &["", " ", " "],
    placeholders: None,
    args: &[
        fmt::Argument::new(&a, Display::fmt),
        fmt::Argument::new(&a, Debug::fmt),
        fmt::Argument::new(&b, LowerHex::fmt),
    ],
}

However, when an argument is used twice through the same trait, but with different flags, it only appears once in args. For example, format_args!("{0:?} {0:#?}", a), which formats a as Debug both without and with the “alternate” flag enabled, expands to:

fmt::Arguments {
    pieces: &["", " "],
    placeholders: Some(&[
        fmt::Placeholder { argument: 0, ..default() },
        fmt::Placeholder { argument: 0, flags: 4 /* alternate */, ..default() },
    ]),
    args: &[
        fmt::Argument::new(&a, Debug::fmt),
    ],
}
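The two-pointer Argument described above can be sketched as follows. This is not the standard library’s actual code: the field names, the display and debug constructors, and the shim functions are all assumptions chosen for illustration. It only shows the general technique of pairing a type-erased pointer with a function pointer.

```rust
use std::fmt::{self, Debug, Display, Formatter};
use std::marker::PhantomData;

/// Simplified, hypothetical version of the private `fmt::Argument` type:
/// a type-erased pointer to the value plus a pointer to a formatting function.
pub struct Argument<'a> {
    value: *const (),
    fmt_fn: fn(*const (), &mut Formatter<'_>) -> fmt::Result,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> Argument<'a> {
    /// Pairs `value` with its `Display::fmt` implementation.
    pub fn display<T: Display>(value: &'a T) -> Self {
        fn shim<T: Display>(ptr: *const (), f: &mut Formatter<'_>) -> fmt::Result {
            // SAFETY: `ptr` was created from a `&'a T` in `display` below.
            Display::fmt(unsafe { &*(ptr as *const T) }, f)
        }
        Argument { value: value as *const T as *const (), fmt_fn: shim::<T>, _borrow: PhantomData }
    }

    /// Pairs `value` with its `Debug::fmt` implementation.
    pub fn debug<T: Debug>(value: &'a T) -> Self {
        fn shim<T: Debug>(ptr: *const (), f: &mut Formatter<'_>) -> fmt::Result {
            // SAFETY: `ptr` was created from a `&'a T` in `debug` below.
            Debug::fmt(unsafe { &*(ptr as *const T) }, f)
        }
        Argument { value: value as *const T as *const (), fmt_fn: shim::<T>, _borrow: PhantomData }
    }
}

impl Display for Argument<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Whatever trait was chosen at construction time is used here.
        (self.fmt_fn)(self.value, f)
    }
}

/// The same value stored twice, once per trait, like `{0} {0:?}`.
pub fn display_and_debug(text: &str) -> String {
    let args = [Argument::display(&text), Argument::debug(&text)];
    format!("{} {}", args[0], args[1])
}
```

Because the trait is baked into the function pointer, using one value through two traits needs two entries, while flags live elsewhere and do not.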

The fmt::Arguments type is designed to make as much of the data as possible const promotable. The pieces and placeholders fields only refer to constant data that can be put in a static place, such that those first two &[] are just &'static [] in practice. Only the array for the args field, which is purposely kept as small as possible, needs to be constructed at runtime. That’s the only array that contains the non-static data: references to the arguments themselves.

While the current design has a few great properties, there are several problems with today’s implementation of fmt::Arguments. Or at least opportunities for improvement. Let’s talk about a few of the most interesting ones.

Structure size

First of all, format_args!("a{}b{}c{}d") expands to something containing &["a", "b", "c", "d"]. That means that these four bytes now take up the size of 10 additional pointers: 80 bytes overhead on a 64-bit system! (Each &str is a pointer and a length, and the outermost &[] is also stored as a pointer and a length.) As you can imagine, many real world uses of formatting macros result in small string pieces like " " and "\n", each adding another 16 bytes of overhead, just for one byte!

On top of that, if we add any flag or other formatting option to any of the placeholders, even just a single one, the placeholders field switches from None to Some(&[…]) containing information for all placeholders. For example, format_args!("{a}{b}{c}{d:#}{e}{f}{g}") will expand to:

fmt::Arguments {
    pieces: &["", "", "", "", "", "", ""],
    placeholders: Some(&[
        fmt::Placeholder { argument: 0, ..default() },
        fmt::Placeholder { argument: 1, ..default() },
        fmt::Placeholder { argument: 2, ..default() },
        fmt::Placeholder { argument: 3, flags: 4 /* alternate */, ..default() },
        fmt::Placeholder { argument: 4, ..default() },
        fmt::Placeholder { argument: 5, ..default() },
        fmt::Placeholder { argument: 6, ..default() },
    ]),
    args: &[
        fmt::Argument::new(&a, Display::fmt),
        fmt::Argument::new(&b, Display::fmt),
        fmt::Argument::new(&c, Display::fmt),
        fmt::Argument::new(&d, Display::fmt),
        fmt::Argument::new(&e, Display::fmt),
        fmt::Argument::new(&f, Display::fmt),
        fmt::Argument::new(&g, Display::fmt),
    ],
}

If the fourth placeholder didn’t have a # flag, placeholders would have been None. This means that that one flag has a storage overhead of seven times a Placeholder struct, which on 64-bit platforms adds up to a total of almost 400 bytes!

Even if we don’t care about the size of static storage, the fmt::Arguments object itself is also a bit larger than necessary. It contains three slice references (three times a pointer plus a size), which is 48 bytes on a 64-bit platform. Passing a fmt::Arguments around would be much more efficient if it was just one or two pointers in size.
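The numbers in this section can be reproduced with std::mem::size_of. The sketch below assumes a hand-written mirror of the Placeholder struct shown earlier; the Count and Alignment enums are guesses at what those field types might look like, not the real definitions.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the `Count` type used for width and precision.
#[allow(dead_code)]
pub enum Count {
    Is(usize),
    Param(usize),
    Implied,
}

/// Hypothetical mirror of the alignment option.
#[allow(dead_code)]
pub enum Alignment {
    Left,
    Right,
    Center,
    Unknown,
}

/// Mirror of the `Placeholder` struct from the text above.
#[allow(dead_code)]
pub struct Placeholder {
    argument: usize,
    fill: char,
    align: Alignment,
    flags: u32,
    precision: Count,
    width: Count,
}

/// Pointer-sized words needed to describe `n` string pieces as `&[&str]`:
/// one (pointer, length) pair per piece, plus one for the outer slice.
pub fn pieces_overhead_words(n: usize) -> usize {
    let words_per_str = size_of::<&str>() / size_of::<usize>();
    let words_for_slice = size_of::<&[&str]>() / size_of::<usize>();
    n * words_per_str + words_for_slice
}

/// Static bytes needed for a table of `n` placeholders.
pub fn placeholder_table_bytes(n: usize) -> usize {
    n * size_of::<Placeholder>()
}
```

pieces_overhead_words(4) gives the 10 words mentioned above. On a typical 64-bit target, placeholder_table_bytes(7) would be expected to come out just under 400, although the exact struct layout is up to the compiler.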

Code size

If you care about (static) storage size, you most definitely also care about code size.

A problem with the design of the display traits, is that a single trait implementation is used for many different flags and options. That is, while Display for i32 doesn’t have to support hexadecimal formatting (which is left to LowerHex for i32), it does have to support options like alignment, a fill character, plus and minus signs, zero padding, and so on.

So, a simple println!("{}", some_integer) will create a fmt::Arguments containing a pointer to <i32 as Display>::fmt, which includes support for all those extra options we’re not using. Ideally, the compiler would be smart enough to see if a Rust program never uses any of those options, and optimize those parts away entirely.

However, thanks to fmt::Arguments, that is a really difficult job: there are several layers of indirection through &dyn Write, Argument, and function pointers, which is not something the compiler is able to efficiently optimize all the way through.

This means that write!(x, "{}", some_str), which could have been optimized to just a x.write_str(some_str) call, will instead result in the full <str as Display>::fmt implementation being pulled in, which pulls in support for padding and alignment, which in turn pulls in support for counting UTF-8 codepoints and encoding UTF-8. A lot of unnecessary code!

This is a big problem for embedded projects, resulting in many embedded Rust developers avoiding formatting entirely.

Runtime overhead

When you have the luxury of not caring about code size and static storage size, you probably still care about runtime performance.

As mentioned before, the fmt::Arguments structure is designed to put as much of the data as possible into static storage, to make it as cheap as possible to construct a fmt::Arguments object at runtime. To construct such an object today, you have to create the args array containing both the pointers to the arguments and the function pointers, followed by the fmt::Arguments itself with references to the static data (the string pieces and placeholder descriptions) and a reference to the args array.

While the pointers to the arguments and the address of the args array itself might change at runtime, everything else never changes. For example, even though the lengths of the arrays are constant, they still have to be written into the fmt::Arguments structure at runtime as part of the three &[] fields. On top of that, half of the data inside the args array is constant: the pointers to the arguments might change, but the function pointers are constant.

So, as an example, constructing a format_args!("{a}{b}{c}") today means initializing an args array with pointers to a, b and c and function pointers to their formatting functions, and initializing fmt::Arguments containing three wide pointers (pointer + size), a grand total of 12 words (pointers or usizes) worth of data to be written, every single time the expression is executed.

In an ideal world, fmt::Arguments could just be two pointers in size: one pointer to all the static data (string pieces, placeholders, and function pointers), and one pointer to the args array only containing the pointers to the arguments. For our example, this adds up to a total of only 5 pointers to be written (two for fmt::Arguments itself, plus one per argument). More than 50% savings!
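The word counts in this section follow from simple arithmetic, written out here as two small functions. The function names are made up for this sketch, and the “ideal” layout is the hypothetical two-pointer design described above, not something that exists today.

```rust
/// Words written at runtime with the current layout: each argument is a
/// (value pointer, function pointer) pair, and `fmt::Arguments` itself holds
/// three wide pointers of two words each.
pub const fn words_written_today(n_args: usize) -> usize {
    2 * n_args + 3 * 2
}

/// Words written with the hypothetical ideal layout: one value pointer per
/// argument, plus two thin pointers in `fmt::Arguments`.
pub const fn words_written_ideal(n_args: usize) -> usize {
    n_args + 2
}

/// Savings of the ideal layout in whole percent, rounded down.
pub const fn savings_percent(n_args: usize) -> usize {
    let today = words_written_today(n_args);
    (today - words_written_ideal(n_args)) * 100 / today
}
```

For three arguments this gives 12 words today versus 5 in the ideal case, a saving of 58%.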

Ideas

So, how can we improve things?

Let’s start with a few ideas.

Closures

One way to look at a fmt::Arguments object, is to see it simply as a “list of instructions”. For example, format_args!("Hello {}\n{:#}!") comes down to: write "Hello ", display the first argument with default flags, write a newline, display the second argument with the alternate flag, write "!", done.

And what is the most obvious way to represent a list of instructions in Rust? A series of commands, or statements? That’s right, a function, or closure.

So, what if we were to expand format_args!("Hello {}\n{:#}!") to something like this?

fmt::Arguments::new(|w| {
    w.write_str("Hello ")?;
    Display::fmt(&arg1, Formatter::new(w))?;
    w.write_str("\n")?;
    Display::fmt(&arg2, Formatter::new(w).alternate(true))?;
    w.write_str("!")?;
    Ok(())
})

If we do that, then std::fmt::write would be trivial: just call the closure!

fmt::Arguments would just contain a &dyn Fn, which is just two pointers in size: one to the function itself, and one to its captured arguments. Perfect!

And most importantly: the compiler can now easily inline and optimize the Display::fmt implementations, stripping out all the code for unused flags!

This sounds almost too good to be true.

I implemented this, but unfortunately, it turns out that while this can greatly improve the binary size of tiny embedded programs, this approach is disastrous for both compilation time and binary size of larger programs.

And this makes sense: a program with lots of print/write/format statements will suddenly have a ton of extra function bodies that all need to be optimized. And while inlining the Display::fmt functions can reduce overhead for a single print statement, it results in code size blowing up when you have a lot of print statements.
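For reference, the closure idea can be approximated in today’s Rust without touching the standard library. In this sketch, ClosureArguments, write_closure, and hello are hypothetical names, and the closure body is written by hand where the real proposal would have format_args!() generate it. The inner write!() call is only a stand-in for a placeholder with flags.

```rust
use std::fmt::{self, Write};

/// Hypothetical closure-backed replacement for `fmt::Arguments`.
pub struct ClosureArguments<'a> {
    run: &'a dyn Fn(&mut dyn Write) -> fmt::Result,
}

/// With this design, the equivalent of `fmt::write` just calls the closure.
pub fn write_closure(out: &mut dyn Write, args: ClosureArguments<'_>) -> fmt::Result {
    (args.run)(out)
}

/// Hand-written stand-in for what an expansion of
/// `format_args!("Hello {}\n{:#x}!", name, value)` might look like.
pub fn hello(name: &str, value: u32) -> String {
    let args = ClosureArguments {
        run: &|w: &mut dyn Write| {
            w.write_str("Hello ")?;
            w.write_str(name)?; // what `{}` on a `&str` boils down to
            w.write_str("\n")?;
            write!(w, "{:#x}", value)?; // stand-in for a placeholder with a flag
            w.write_str("!")
        },
    };
    let mut out = String::new();
    write_closure(&mut out, args).expect("writing to a String should not fail");
    out
}
```

The struct is a single &dyn Fn, so it is two pointers wide. The cost is that every formatting call site now carries its own function body, which is the source of the compile time and code size problems described above.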

Display::simple_fmt

If we want to avoid pulling in unnecessary formatting (aligning, padding, etc.) code without trying to #[inline] everything, we need to take a more precise approach.

Next to the fmt method, the display traits could have an additional method—let’s call it simple_fmt—that does the same thing, but may assume default formatting options. For example, while <&str as Display>::fmt needs to support padding and alignment (and therefore UTF-8 decoding and counting), <&str as Display>::simple_fmt would be implemented as just a single line: f.write_str(s).

Then we can update format_args!() to use simple_fmt instead of fmt whenever no flags were used, to avoid pulling in unnecessary code.

I implemented this idea as well. And it works great: it reduced a 6KiB benchmark program to less than 3KiB!

Unfortunately, if your program uses &dyn Display anywhere, this change makes things slightly worse: the vtable for the display traits grew one entry to also contain simple_fmt.

There are ways to avoid that, but those come with other complexity and limitations.
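A minimal sketch of the simple_fmt idea, using a separate hypothetical trait (here called SimpleDisplay) because the real display traits cannot be extended from outside the standard library. The trait name, the method signature, and the describe function are assumptions for this example.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical companion to `Display` with a fast path for the case where
/// no formatting options are set.
pub trait SimpleDisplay: Display {
    /// Writes `self` assuming default options (no padding, alignment, etc.).
    /// The default implementation falls back to the full `Display` machinery.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        write!(out, "{}", self)
    }
}

impl SimpleDisplay for str {
    /// For strings, the fast path is a single `write_str` call.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_str(self)
    }
}

/// Integers just use the fallback in this sketch.
impl SimpleDisplay for i32 {}

/// Roughly what `write!(out, "{} x{}", name, count)` could turn into when
/// no placeholder carries any flags.
pub fn describe(name: &str, count: i32) -> String {
    let mut out = String::new();
    name.simple_fmt(&mut out).expect("writing to a String should not fail");
    out.push_str(" x");
    count.simple_fmt(&mut out).expect("writing to a String should not fail");
    out
}
```

The output is identical to the regular formatting path. The difference is only in which code gets pulled into the binary.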

Merging the pieces and placeholders

Today’s structure contains three fields: the string pieces, the placeholder descriptions (with flags, etc.), and the arguments. The first two are always constant, static. Can we combine those?

What if fmt::Arguments looked something like this?

pub struct Arguments<'a> {
    template: &'a [Piece],
    args: &'a [Argument<'a>],
}

enum Piece {
    String(&'static str),
    Placeholder {
        argument: usize,
        options: FormattingOptions,
    },
}

Then, format_args!("> {a}{b} {c}!") would expand to something like:

Arguments {
    template: &[
        Piece::String("> "),
        Piece::Placeholder { argument: 0, options: FormattingOptions::default() },
        Piece::Placeholder { argument: 1, options: FormattingOptions::default() },
        Piece::String(" "),
        Piece::Placeholder { argument: 2, options: FormattingOptions::default() },
        Piece::String("!"),
    ],
    args: &[
        Argument::new(&a, Display::fmt),
        Argument::new(&b, Display::fmt),
        Argument::new(&c, Display::fmt),
    ],
}

This reduces the size of fmt::Arguments from 3 to 2 wide pointers (from 6 to 4 words), and avoids needing empty string pieces between adjacent placeholders.

As an alternative for the placeholders: None optimization, for the case where all arguments are formatted in order with default options (like in the example above), we could add a rule that two consecutive Piece::String elements result in an implicit placeholder, since there is no reason for two consecutive Piece::Strings otherwise.

With that rule, format_args!("> {a}{b} {c}!") could expand to something like:

Arguments {
    template: &[
        Piece::String("> "),
        Piece::String(""), // Implicit placeholder for argument 0 above.
        Piece::String(" "), // Implicit placeholder for argument 1 above.
        Piece::String("!"), // Implicit placeholder for argument 2 above.
    ],
    args: &[
        Argument::new(&a, Display::fmt),
        Argument::new(&b, Display::fmt),
        Argument::new(&c, Display::fmt),
    ],
}

Which at first glance looks exactly as efficient as the old expansion (with pieces: &["> ", "", " ", "!"]), but actually takes up far more space. Each Piece element is far bigger than just a &str, since that enum needs the space to contain a Piece::Placeholder with all the formatting options as well.

So, while this might reduce the runtime overhead somewhat (by making fmt::Arguments itself smaller), it might result in more static data, resulting in larger binary sizes.
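The size difference is easy to check. The sketch below assumes a made-up FormattingOptions struct with a plausible set of fields. The real set of options and their encoding may differ, but any enum that has to fit a full placeholder will be larger than a plain &str.

```rust
use std::mem::size_of;

/// Hypothetical set of formatting options carried by a placeholder.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct FormattingOptions {
    fill: char,
    flags: u32,
    width: Option<usize>,
    precision: Option<usize>,
}

/// The merged template element from the text above.
#[allow(dead_code)]
pub enum Piece {
    String(&'static str),
    Placeholder {
        argument: usize,
        options: FormattingOptions,
    },
}

/// Static bytes for a template of `n` elements in the merged design.
pub fn template_bytes_merged(n: usize) -> usize {
    n * size_of::<Piece>()
}

/// Static bytes for `n` string pieces in the current `&[&str]` design.
pub fn template_bytes_str(n: usize) -> usize {
    n * size_of::<&'static str>()
}
```

For the four-element template above, template_bytes_merged(4) is larger than template_bytes_str(4) on any target, since every element reserves room for the largest variant.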

List of instructions

Iterating on the previous idea: we don’t have to make a string piece take as much space as a placeholder with all the possible flags.

We can reduce the size of the enum by spreading a placeholder out over multiple entries.

For example, instead of:

[
    Piece::String("> "),
    Piece::Placeholder {
        argument: 0,
        options: FormattingOptions { alternate: true, … }
    },
]

We could have:

[
    Piece::String("> "),
    Piece::SetAlternateFlag,
    Piece::Placeholder { argument: 0 },
]

Now, a placeholder with flags set will cost multiple entries, but placeholders with default settings will not pay the price of storing all their flags.

What we have effectively created here, is a small ‘assembly language’ for formatting, with only a few instructions: writing a string, setting flags, and calling formatting functions of arguments.

Arguments {
    instructions: &[
        // A list of instructions in our imaginary 'formatting assembly language':
        Instruction::WriteString("> "),
        Instruction::DisplayArg(0),
        Instruction::WriteString(" "),
        Instruction::SetAlternateFlag,
        Instruction::SetSign(Sign::Plus),
        Instruction::DisplayArg(1),
    ],
    args: &[
        Argument::new(&a, Display::fmt),
        Argument::new(&b, Display::fmt),
    ],
}

If we go down this road a bit further, we could even come up with a more efficient “instruction encoding” for our formatting commands, leading to many interesting design decisions and trade-offs.
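To make the “formatting assembly language” a bit more concrete, here is a toy interpreter for a three-instruction subset. Everything in it is an assumption made for illustration: the Instruction enum, the run function, and the Celsius type, which exists only because it prints differently when the alternate flag is set. Arguments are plain &dyn Display here rather than the Argument type from the text.

```rust
use std::fmt::{self, Display, Formatter, Write};

/// Toy instruction set for the hypothetical formatting "assembly language".
#[derive(Clone, Copy)]
pub enum Instruction {
    WriteString(&'static str),
    SetAlternateFlag,
    DisplayArg(usize),
}

/// Executes the instructions in order, writing the result to `out`.
pub fn run(out: &mut dyn Write, program: &[Instruction], args: &[&dyn Display]) -> fmt::Result {
    let mut alternate = false;
    for instruction in program {
        match *instruction {
            Instruction::WriteString(s) => out.write_str(s)?,
            Instruction::SetAlternateFlag => alternate = true,
            Instruction::DisplayArg(i) => {
                if alternate {
                    write!(out, "{:#}", args[i])?
                } else {
                    write!(out, "{}", args[i])?
                }
                alternate = false; // flags only apply to the next placeholder
            }
        }
    }
    Ok(())
}

/// Example type whose `Display` output depends on the alternate flag.
pub struct Celsius(pub i32);

impl Display for Celsius {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            write!(f, "{} degrees Celsius", self.0)
        } else {
            write!(f, "{} C", self.0)
        }
    }
}

/// Runs a program equivalent to the format string "> {} {:#}".
pub fn demo() -> String {
    let program = [
        Instruction::WriteString("> "),
        Instruction::DisplayArg(0),
        Instruction::WriteString(" "),
        Instruction::SetAlternateFlag,
        Instruction::DisplayArg(1),
    ];
    let (a, b) = (Celsius(20), Celsius(25));
    let mut out = String::new();
    run(&mut out, &program, &[&a, &b]).expect("writing to a String should not fail");
    out
}
```

A placeholder with default options costs one small instruction, and only placeholders that set flags pay for the extra entries.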

Null terminated slices

The “list of instructions” idea reduces fmt::Arguments to just two wide pointers (pointer + size), and provides some ideas for reducing the static storage size. But can we reduce the fmt::Arguments struct itself further, to reduce the runtime overhead of creating (initializing) one?

Can we get rid of the sizes of the slices? Instead of wide pointers (&[]), can we only store the start address but not the size, and still make it work?

For the list of instructions, we could add Instruction::End to the end of the list so we know where to stop. If we can (unsafely) assume that the lists always end in an “end” instruction, we no longer need to store the number of instructions in fmt::Arguments.

Furthermore, as long as we can (unsafely) assume that the Instruction::DisplayArg instructions never refer to an argument out of bounds, we can also get rid of the size of args.

This can reduce the size of fmt::Arguments to just two pointers!
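A sketch of what that could look like, with the unsafe assumptions spelled out. ThinArguments and its methods are hypothetical, and the arguments are stored as &dyn Display to keep the example short.

```rust
use std::fmt::{self, Display, Write};

#[derive(Clone, Copy)]
pub enum Instruction {
    WriteString(&'static str),
    DisplayArg(usize),
    End,
}

/// Hypothetical two-pointer `fmt::Arguments`: no lengths are stored.
pub struct ThinArguments<'a> {
    instructions: *const Instruction,
    args: *const &'a dyn Display,
}

impl<'a> ThinArguments<'a> {
    /// # Safety
    /// `instructions` must end with `Instruction::End`, and every
    /// `DisplayArg` index in it must be in bounds for `args`.
    pub unsafe fn new(instructions: &'static [Instruction], args: &'a [&'a dyn Display]) -> Self {
        ThinArguments { instructions: instructions.as_ptr(), args: args.as_ptr() }
    }

    /// Walks the instruction list until it reaches `End`.
    pub fn write_to(&self, out: &mut dyn Write) -> fmt::Result {
        let mut ip = self.instructions;
        loop {
            // SAFETY: `new` requires the list to be terminated by `End`,
            // so `ip` stays inside the original slice.
            match unsafe { *ip } {
                Instruction::WriteString(s) => out.write_str(s)?,
                Instruction::DisplayArg(i) => {
                    // SAFETY: `new` requires `i` to be in bounds for `args`.
                    let arg: &dyn Display = unsafe { *self.args.add(i) };
                    write!(out, "{}", arg)?
                }
                Instruction::End => return Ok(()),
            }
            // SAFETY: the current instruction was not `End`, so there is a next one.
            ip = unsafe { ip.add(1) };
        }
    }
}

/// Runs a program equivalent to the format string "> {} {}".
pub fn demo(a: &str, b: i32) -> String {
    static PROGRAM: [Instruction; 5] = [
        Instruction::WriteString("> "),
        Instruction::DisplayArg(0),
        Instruction::WriteString(" "),
        Instruction::DisplayArg(1),
        Instruction::End,
    ];
    let args: [&dyn Display; 2] = [&a, &b];
    // SAFETY: PROGRAM ends with `End` and only refers to arguments 0 and 1.
    let thin = unsafe { ThinArguments::new(&PROGRAM, &args) };
    let mut out = String::new();
    thin.write_to(&mut out).expect("writing to a String should not fail");
    out
}
```

In a real implementation, the safety requirements on new would be upheld by format_args!() itself, since it generates both the instruction list and the args array.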

Static function pointers

And for our last trick of today, let’s see if we can reduce the size of the args arrays. As mentioned above, it stores both the pointers to the arguments (which are not static) and the Display::fmt function pointers (which are static).

Ideally, we’d reduce runtime overhead by moving the function pointers out of args and into instructions. (Perhaps into Instruction::DisplayArg.)

So, instead of:

Arguments {
    instructions: &[
        Instruction::DisplayArg(0),
        Instruction::DisplayArg(1),
        Instruction::DisplayArg(2),
    ],
    args: &[
        Argument::new(&a, Display::fmt),
        Argument::new(&b, Display::fmt),
        Argument::new(&c, Display::fmt),
    ],
}

format_args!("{a}{b}{c}") would expand to:

Arguments {
    instructions: &[
        Instruction::DisplayArg(0, <… as Display>::fmt),
        Instruction::DisplayArg(1, <… as Display>::fmt),
        Instruction::DisplayArg(2, <… as Display>::fmt),
    ],
    args: &[
        Argument::new(&a),
        Argument::new(&b),
        Argument::new(&c),
    ],
}

Which would cut the storage size of args in half!

Looks easy enough, but there is one problem we run into: the expansion can no longer rely on the generic signature of Argument::new to magically pick the right Display::fmt for each of the arguments, and must instead spell out <T as Display>::fmt explicitly for each of the argument types.

But format_args!() is just a macro, it doesn’t and can’t know the types of the arguments at expansion time, and Rust currently has no syntax like <typeof(a) as Display>::fmt.

Working around this problem is possible but surprisingly tricky! :(
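To illustrate where the difficulty lies, here is a hand-written sketch of the static-function-pointer layout. It only works because the argument types (i32 and u8) are spelled out by hand in the static table, which is exactly the information a macro does not have. The names FmtFn, display_shim, run, and demo are all assumptions for this example.

```rust
use std::fmt::{self, Display, Write};

/// Type-erased formatting function stored in the static instruction list.
pub type FmtFn = unsafe fn(*const (), &mut dyn Write) -> fmt::Result;

/// # Safety
/// `ptr` must point to a live value of type `T`.
unsafe fn display_shim<T: Display>(ptr: *const (), out: &mut dyn Write) -> fmt::Result {
    // SAFETY: guaranteed by the caller, see above.
    let value: &T = unsafe { &*(ptr as *const T) };
    write!(out, "{}", value)
}

#[derive(Clone, Copy)]
pub enum Instruction {
    WriteString(&'static str),
    DisplayArg(usize, FmtFn),
}

// The function pointers now live in static data. Note the explicit types:
// a macro expansion would have no way to name them.
static PROGRAM: [Instruction; 3] = [
    Instruction::DisplayArg(0, display_shim::<i32>),
    Instruction::WriteString(" "),
    Instruction::DisplayArg(1, display_shim::<u8>),
];

/// # Safety
/// Every `DisplayArg(i, f)` in `program` must have `args[i]` pointing to a
/// live value of the type that `f` expects.
pub unsafe fn run(out: &mut dyn Write, program: &[Instruction], args: &[*const ()]) -> fmt::Result {
    for instruction in program {
        match *instruction {
            Instruction::WriteString(s) => out.write_str(s)?,
            // SAFETY: guaranteed by the caller, see above.
            Instruction::DisplayArg(i, fmt_fn) => unsafe { fmt_fn(args[i], out) }?,
        }
    }
    Ok(())
}

/// The runtime part is now just one thin pointer per argument.
pub fn demo(a: i32, b: u8) -> String {
    let args = [&a as *const i32 as *const (), &b as *const u8 as *const ()];
    let mut out = String::new();
    // SAFETY: PROGRAM expects an `i32` at index 0 and a `u8` at index 1.
    unsafe { run(&mut out, &PROGRAM, &args) }.expect("writing to a String should not fail");
    out
}
```

Each entry in args is a single pointer instead of a pair, which is where the halving comes from.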

Stuck?

As is clear by now, there are many possible ideas for improvement. Some of those combine well, but many ideas are mutually exclusive. We’ve also skipped some of the trickier details, such as support for Arguments::as_str().

As with any design problem, every possible change has both pros and cons. Like we’ve seen, an implementation that is great for binary size or runtime performance can be disastrous for compilation time, for example.

But the design problem isn’t the only problem.

What makes it extremely hard to improve fmt::Arguments, is the amount of energy it takes just to make a change to its implementation. One has to not only change the fmt::Arguments type, but also rewrite the builtin format_args!() macro (which involves changing rustc itself), and update the fmt::write implementation. And because of how Rust is bootstrapped, the standard library is compiled with both the previous and current version of the compiler, so you need to make sure your changes to the standard library remain compatible with the unmodified format_args!() macro of the previous compiler, which results in #[cfg(bootstrap)] hell. Changing the format_args builtin macro is (or used to be) quite challenging, as it is not only responsible for generating the fmt::Arguments expression, but also for generating diagnostics resulting from invalid format strings. Once you have worked your way through all that, you will find out that Clippy breaks with your changes; it depends on the implementation details of fmt::Arguments, because it looks at the code after macro expansion.

So, even a tiny change to fmt::Arguments not only involves modifying the standard library, but also requires you to be an expert on rustc builtin macros, rustc diagnostics, bootstrapping, and Clippy’s internals. Your change would touch many different moving parts all at once, spread over two repositories, and would require approval from a combination of several different reviewers.

This is exactly the recipe for something to get stuck forever.

Small steps

The way to get things unstuck, is to make small steps, removing blockers one by one. This can be somewhat exhausting, as the rewards (like actual performance improvements) won’t come quickly. But if you’re a fan of very long to-do lists (that’s me), you’ll have a field day. :)

Just like I did when I was working on std’s locks, I’ve made a tracking issue to keep track of everything related to improving fmt::Arguments: https://github.com/rust-lang/rust/issues/99012

As you can see in the to-do list there, there have already been many changes, even though almost nothing has changed about how fmt::Arguments works (yet!). The changes so far play a big role in making it much easier to make improvements later, in part by paying off technical debt.

For example, in one of the changes, I refactored the format_args builtin macro to separate the parsing, resolving, diagnostic generating, and expansion steps (like a mini compiler), which allows one to change the expansion without knowing anything about the other steps.

Then I proposed making the expansion of the format_args macro a bit more magic, effectively delaying the expansion step until a bit later. This unlocked some really cool optimizations (which already shipped as part of Rust 1.71), and also allows Clippy to access the unexpanded information so it can finally stop depending on implementation details, such that it won’t break if the expansion changes.

That change was again mostly self-contained and reviewable by one person, as the details that the other moving parts rely on (e.g. Clippy) remained (mostly) unchanged. After that was merged, the next step was to migrate Clippy away from using the implementation details, using the newly available information instead, which is tracked in yet another tracking issue.

And even after all that work is done too, fmt::Arguments still hasn’t changed! But we’re now getting to the point where most of the yak is shaved, finally allowing us to make improvements without having to go down yet another rabbit hole.

What’s next?

Most of the exciting results are still to come, as the changes for the big improvements are still slowly getting unblocked (and I’ve been busy with other things), but there are already a few interesting things to be excited about:

As of Rust 1.71, nested format_args calls are flattened. For example, format_args!("[{}] {}", 123, format_args!("error: {}", msg)) is now exactly equivalent to format_args!("[123] error: {}", msg). This means that macros like dbg!() now have far less overhead.

This can result in big improvements, even for large programs. For example, hyper improved about 2-3% in both compilation time and binary size.
Optimizing for the placeholder-without-flags case will allow removal of unused formatting code, which can result in more than 50% binary size reduction for tiny programs.
An improved fmt::Arguments representation not only reduces runtime overhead, but will also allow for fmt::Arguments::from_str() to exist.
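The flattening mentioned in the first item above does not change what gets printed, only how much work is needed to print it. A quick way to convince yourself that the nested and flattened forms agree (the function names here are made up for this sketch):

```rust
/// Nested form: a literal argument and an inner `format_args!()`.
pub fn nested(msg: &str) -> String {
    format!("[{}] {}", 123, format_args!("error: {}", msg))
}

/// The flattened form that the nested one is equivalent to.
pub fn flat(msg: &str) -> String {
    format!("[123] error: {}", msg)
}
```

Both functions return the same string for any msg. The difference is that the flattened form has a single placeholder and a single level of fmt::Arguments.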

But that’s nothing compared to the potential improvements we can get in the future by finally implementing some of the larger ideas. Implementing those ideas used to be exceptionally difficult, but that has changed and is still getting better:

Most importantly, changing format_args!() and fmt::Arguments no longer requires changes in rustc diagnostics or Clippy, thanks to all the refactoring. It still requires modifying a rustc builtin macro, but that’s now a lot easier to do.
Currently, the standard library is compiled with both the old and new rustc which makes changes to fmt::Arguments verbose (with a lot of #[cfg(bootstrap)]), but the plan to change stage 0 in Rustc bootstrapping will, once someone steps up to implement it, fix that problem entirely, making it far easier to make changes to builtin macros.
The Rust compiler performance tracker now also tracks binary size, such that binary size improvements from fmt::Arguments changes are now much easier to track.
There is now an official Binary Size Working Group with a large interest in making formatting code smaller.

In other words: we can expect many exciting improvements in the future, and we are getting there, one step at a time.

