Writing Python inside your Rust code — Part 1

About a year ago, I published a Rust crate called inline-python, which allows you to easily mix some Python into your Rust code using a python!{ .. } macro. In this series, I’ll go through the process of developing this crate from scratch.

Sneak preview

If you’re not familiar with the inline-python crate, this is what it allows you to do:

fn main() {
    let who = "world";
    let n = 5;
    python! {
        for i in range('n):
            print(i, "Hello", 'who)
        print("Goodbye")
    }
}

It allows you to embed Python code right between your lines of Rust code. It even allows you to use your Rust variables inside the Python code.

We’ll start with a much simpler case, and slowly work our way up to this result (and more!).

Running Python code

First, let’s take a look at how we can run Python code from Rust. Let’s try to make this first simple example work:

fn main() {
    println!("Hello ...");
    run_python("print(\"... World!\")");
}

We could implement run_python by using std::process::Command to run the python executable and pass it the Python code, but if we ever expect to be able to define and read back Python variables, we’re probably better off if we start by using the PyO3 library instead.
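For comparison, the std::process::Command route might look roughly like the sketch below. It assumes an interpreter named python3 is on the PATH, and the helper names python_command and run_python_subprocess are made up for this illustration.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Builds (but does not run) a command that would execute `code` in an
/// external interpreter. "python3" is an assumed executable name.
fn python_command(code: &str) -> Command {
    let mut cmd = Command::new("python3");
    cmd.arg("-c").arg(code); // `-c` tells the interpreter to run the next argument as code.
    cmd
}

/// Runs the code in a child process and waits for it to finish.
fn run_python_subprocess(code: &str) -> io::Result<ExitStatus> {
    python_command(code).status()
}

fn main() {
    println!("Hello ...");
    match run_python_subprocess("print(\"... World!\")") {
        Ok(status) => println!("python exited with {}", status),
        Err(e) => eprintln!("could not start python3: {}", e),
    }
}
```

All that comes back from the child process is an exit status (plus whatever it wrote to its output), which is why this route makes it hard to share variables between the two languages.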

PyO3 gives us Rust bindings for Python. It nicely wraps the Python C API, letting us interact with all kinds of Python objects directly from Rust. (And even make Python libraries in Rust, but that’s a whole other topic.)

Its Python::run function looks exactly like what we need. It takes the Python code as a &str, and allows us to define any variables in scope using two optional PyDicts. Let’s give it a try:

fn run_python(code: &str) {
    let py = pyo3::Python::acquire_gil(); // Acquire the 'global interpreter lock', as Python is not thread-safe.
    py.python().run(code, None, None).unwrap(); // No locals, no globals.
}

$ cargo run
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/scratchpad`
Hello ...
... World!

Success!

Rule based macros

Writing inside a string literal is not the most convenient way to write Python, so let’s see if we can improve that. Macros allow us to define custom syntax within Rust, so let’s try to use one:

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
    }
}

Macros are normally defined using macro_rules!, which lets you define a macro using advanced ‘find and replace’ rules based on things like tokens and expressions. (See the chapter on macros in the Rust Book for an introduction to macro_rules!. See The Little Book of Rust Macros for all the scary details.)

Macros defined by macro_rules! cannot execute any code at compile time; they only apply replacement rules based on patterns. That’s great for things like vec![], and even lazy_static!{ .. }, but not powerful enough for things such as parsing and compiling regular expressions (e.g. regex!("a.*b")).

In the matching rules of a macro, we can match on things like expressions, identifiers, types, and many other things. Since ‘valid Python code’ is not an option, we’ll just make our macro accept anything: raw tokens, as many as needed:

macro_rules! python {
    ($($code:tt)*) => {
        ...
    }
}

(See the resources linked above for details on how macro_rules! works.)
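To get a feel for what a tt fragment matches, here is a small sketch that is not part of the crate: a hypothetical count_tts! macro that counts how many token trees it was given. It shows that the pattern happily accepts input that is not valid Rust, and that a whole parenthesized group counts as a single tt.

```rust
// Counts the token trees it receives, peeling one off per recursion step.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // `print` plus one parenthesized group.
    println!("{}", count_tts!(print("... World!")));

    // Python-looking input is accepted as well; the macro never parses it.
    println!("{}", count_tts!(
        for i in range(3):
            print(i)
    ));
}
```

The recursion is only there to make the count visible. The python! macro itself never needs to look at individual token trees; it just forwards all of them.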

An invocation of our macro should result in run_python(".."), with all Python code wrapped in that string literal. We’re lucky: there’s a builtin macro that puts things in a string for us, called stringify!.

macro_rules! python {
    ($($code:tt)*) => {
        run_python(stringify!($($code)*));
    }
}

Let’s try!

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.32s
     Running `target/debug/scratchpad`
Hello ...
... World!

Success!

But wait, what happens if we have more than one line of Python code?

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/scratchpad`
Hello ...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: Py(0x7f1c0a5649a0, PhantomData) }', src/main.rs:9:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Oof, that’s unfortunate.

To debug this, let’s properly print the PyErr, and also show the exact Python code we’re feeding to Python::run:

fn run_python(code: &str) {
    println!("-----");
    println!("{}", code);
    println!("-----");
    let py = pyo3::Python::acquire_gil();
    if let Err(e) = py.python().run(code, None, None) {
        e.print(py.python());
    }
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/scratchpad`
Hello ...
-----
print("... World!") print("Bye.")
-----
  File "<string>", line 1
    print("... World!") print("Bye.")
                        ^
SyntaxError: invalid syntax

Apparently both lines of Python code ended up on the same line, and Python rightfully complains about this being invalid syntax.

And now we’ve stumbled across the biggest problem we’ll have to overcome: stringify! messes up white-space.

White-space and tokens

Let’s take a closer look at what stringify! does:

fn main() {
    println!("{}", stringify!(
        a 123    b   c
        x ( y + z )
        // comment
        ...
    ));
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/scratchpad`
a 123 b c x(y + z) ...

Not only does it remove all unnecessary white-space, it even removes comments. The reason is that we’re working with tokens here, not the original source code: a, 123, b, etc.

One of the first things rustc does is tokenize the source code. This makes the rest of the parsing easier: it doesn’t have to deal with individual characters like 1, 2, 3, but only with tokens such as ‘integer literal 123’. Also, white-space and comments are gone after tokenizing, as they are meaningless for the compiler.

stringify!() is a way to convert a bunch of tokens back to a string, but on a ‘best effort’ basis: It will convert the tokens back to text, and only insert spaces around tokens when needed (to avoid turning b, c into bc).

So this is a bit of a dead end. Rustc has carelessly thrown our precious white-space away, which is very significant in Python.

We could try to have some code guess which spaces have to be replaced back by newlines, but indentation is definitely going to be a problem:

fn main() {
    let a = stringify!(
        if False:
            x()
        y()
    );
    let b = stringify!(
        if False:
            x()
            y()
    );
    dbg!(a);
    dbg!(b);
    dbg!(a == b);
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/scratchpad`
[src/main.rs:12] a = "if False : x() y()"
[src/main.rs:13] b = "if False : x() y()"
[src/main.rs:14] a == b = true

The two snippets of Python code have a different meaning, but stringify! gives us the same result for both.
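The same loss of information can be reproduced outside of rustc with a toy model. The sketch below is a gross simplification and not how rustc tokenizes: the hypothetical toy_tokens just drops // comments and splits on white-space, and toy_stringify joins the pieces back together with single spaces.

```rust
/// Toy stand-in for a tokenizer: drops `//` comments and splits on white-space.
/// (A real tokenizer also splits things like `x()` into several tokens.)
fn toy_tokens(src: &str) -> Vec<&str> {
    src.lines()
        .map(|line| line.split("//").next().unwrap_or(""))
        .flat_map(|line| line.split_whitespace())
        .collect()
}

/// Best-effort text from tokens: a single space between each pair.
fn toy_stringify(src: &str) -> String {
    toy_tokens(src).join(" ")
}

fn main() {
    let a = "if False:\n    x()\ny()";
    let b = "if False:\n    x()\n    y()";
    println!("{:?}", toy_stringify(a));
    println!("{:?}", toy_stringify(b));
    // Both snippets collapse to the same text, so this prints `true`.
    println!("{}", toy_stringify(a) == toy_stringify(b));
}
```

Once the text has been reduced to a flat list of tokens, nothing in that list records which line or column a token came from.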

Before giving up, let’s try the other type of macros.

Procedural macros

Rust’s procedural macros are another way to define macros. Whereas macro_rules! can only define ‘function-style macros’ (those with an !), procedural macros can also define custom derive macros (e.g. #[derive(Stuff)]) and attribute macros (e.g. #[stuff]).

Procedural macros are implemented as a compiler plugin. You get to write a function that gets access to the token stream the compiler sees, can do whatever it wants, and then needs to return a new token stream which the compiler will use instead (or in addition, in the case of a custom derive):

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    todo!()
}

That TokenStream there doesn’t bode well. We need the original source code, not just the tokens. But let’s just continue anyway. Maybe a procedural macro gives us more flexibility to hack our way around any problems.

Because procedural macros run Rust code as part of the compilation process, they need to go in a separate proc-macro crate, which is compiled before you can compile anything that uses it.

$ cargo new --lib python-macro
     Created library `python-macro` package

In python-macro/Cargo.toml:

[lib]
proc-macro = true

In Cargo.toml:

[dependencies]
python-macro = { path = "./python-macro" }

Let’s start with an implementation that just panics (todo!()), after printing the TokenStream:

// python-macro/src/lib.rs
extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    dbg!(input.to_string());
    todo!()
}

// src/main.rs
use python_macro::python;

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
error[E0658]: procedural macros cannot be expanded to statements
 --> src/main.rs:5:5
  |
5 | /     python! {
6 | |         print("... World!")
7 | |         print("Bye.")
8 | |     }
  | |_____^
  |
  = note: see issue #54727 <https://github.com/rust-lang/rust/issues/54727> for more information
  = help: add `#![feature(proc_macro_hygiene)]` to the crate attributes to enable

Whelp, what happened here?

Rust complains that ‘procedural macros cannot be expanded to statements’, and something about enabling ‘hygienic macros’. Macro hygiene is the wonderful feature of Rust macros to not accidentally ‘leak’ any names to the outside world (or the reverse). If a macro expands to code that uses some temporary variable named x, it will be separate from any x that appears in any code outside of the macro.

However, this feature isn’t stable yet for procedural macros. The result is that procedural macros are not (yet) allowed to appear in any place other than as an item by itself (e.g. at file scope, but not inside a function).

There exists a very ~~horrible~~ fascinating workaround for this, but let’s just enable the experimental #![feature(proc_macro_hygiene)] and continue our adventure.

(If you are reading this in the future, when proc_macro_hygiene has been stabilized: You could’ve skipped the last few paragraphs. ^^)

$ sed -i '1i#![feature(proc_macro_hygiene)]' src/main.rs
$ cargo r
   Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:6] input.to_string() = "print(\"... World!\") print(\"Bye.\")"
error: proc macro panicked
 --> src/main.rs:6:5
  |
6 | /     python! {
7 | |         print("... World!")
8 | |         print("Bye.")
9 | |     }
  | |_____^
  |
  = help: message: not yet implemented

error: aborting due to previous error

error: could not compile `scratchpad`.

Our procedural macro panics as expected, after showing us the input it got as string:

print("... World!") print("Bye.")

Again, as expected, with the white-space thrown away. :(

Time to give up.

Or.. Maybe there’s a way to work around this.

Reconstructing white-space

Although rustc only works with tokens while parsing and compiling, it somehow still knows exactly where to point when it has errors to report. There are no newlines left in the tokens, but it still knows our error happened on lines 6 through 9. How?

It turns out that tokens contain quite a bit of information. They contain a Span, which is basically the start and end location of the token in the original source file. The Span can tell which file, line, and column number a token starts and ends at.

If we can get to this information, we can reconstruct the white-space by putting spaces and newlines between tokens to match their line and column information.

Functions that give us this information are not yet stable and are gated behind #![feature(proc_macro_span)]. Let’s enable it, and see what we get:

#![feature(proc_macro_span)]

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    for t in input {
        dbg!(t.span().start());
    }
    todo!()
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 7,
    column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 7,
    column: 13,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 8,
    column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
    line: 8,
    column: 13,
}

Nice! We got some numbers.

But there are only four tokens? It turns out ("... World!") appears as one token here, and not three ((, "... World!", and )). If we look at the documentation of TokenStream, we can see it doesn’t give us a stream of tokens, but of token trees. Apparently Rust’s tokenizer already matches parentheses (and braces and brackets) and doesn’t just give a linear list of tokens, but a tree of tokens. Tokens inside parentheses will be children of a single Group token.
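As a mental model, the tree shape can be sketched with a small enum. The Tree type and the flatten and example functions below are hypothetical miniatures written for this explanation; they are not the real proc_macro types, which carry spans and distinguish identifiers, literals and punctuation.

```rust
/// Hypothetical miniature of a token tree; not proc_macro's real types.
enum Tree {
    Leaf(String),
    Group { open: char, close: char, children: Vec<Tree> },
}

/// Appends every token to `out`, descending into groups and emitting
/// their delimiters as separate entries.
fn flatten(trees: &[Tree], out: &mut Vec<String>) {
    for t in trees {
        match t {
            Tree::Leaf(text) => out.push(text.clone()),
            Tree::Group { open, close, children } => {
                out.push(open.to_string());
                flatten(children, out);
                out.push(close.to_string());
            }
        }
    }
}

/// Builds the tree for `print("... World!")` by hand.
fn example() -> Vec<Tree> {
    vec![
        Tree::Leaf("print".to_string()),
        Tree::Group {
            open: '(',
            close: ')',
            children: vec![Tree::Leaf("\"... World!\"".to_string())],
        },
    ]
}

fn main() {
    let trees = example();
    let mut flat = Vec::new();
    flatten(&trees, &mut flat);
    println!("{} trees at the top level, {} tokens in total", trees.len(), flat.len());
}
```

Iterating only over the top level sees two items per print call, which matches the four locations printed above for two lines of Python.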

Let’s modify our procedural macro to recursively go over all the tokens inside groups as well (and improve the output a bit):

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    print(input);
    todo!()
}

fn print(input: TokenStream) {
    for t in input {
        if let TokenTree::Group(g) = t {
            println!("{:?}: open {:?}", g.span_open().start(), g.delimiter());
            print(g.stream());
            println!("{:?}: close {:?}", g.span_close().start(), g.delimiter());
        } else {
            println!("{:?}: {}", t.span().start(), t.to_string());
        }
    }
}

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0
LineColumn { line: 7, column: 8 }: print
LineColumn { line: 7, column: 13 }: open Parenthesis
LineColumn { line: 7, column: 14 }: "... World!"
LineColumn { line: 7, column: 26 }: close Parenthesis
LineColumn { line: 8, column: 8 }: print
LineColumn { line: 8, column: 13 }: open Parenthesis
LineColumn { line: 8, column: 14 }: "Bye."
LineColumn { line: 8, column: 20 }: close Parenthesis

Wonderful!

Now to reconstruct the white-space, we need to insert newlines if we’re not on the right line yet, and spaces if we’re not in the right column yet. Let’s see:

#![feature(proc_macro_span)]

extern crate proc_macro;
use proc_macro::{TokenTree, TokenStream, LineColumn};

#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
    let mut s = Source {
        source: String::new(),
        line: 1,
        col: 0,
    };
    s.reconstruct_from(input);
    println!("{}", s.source);
    todo!()
}

struct Source {
    source: String,
    line: usize,
    col: usize,
}

impl Source {
    fn reconstruct_from(&mut self, input: TokenStream) {
        for t in input {
            if let TokenTree::Group(g) = t {
                let s = g.to_string();
                self.add_whitespace(g.span_open().start());
                self.add_str(&s[..1]); // the '[', '{' or '('.
                self.reconstruct_from(g.stream());
                self.add_whitespace(g.span_close().start());
                self.add_str(&s[s.len() - 1..]); // the ']', '}' or ')'.
            } else {
                self.add_whitespace(t.span().start());
                self.add_str(&t.to_string());
            }
        }
    }

    fn add_str(&mut self, s: &str) {
        // Let's assume for now s contains no newlines.
        self.source += s;
        self.col += s.len();
    }

    fn add_whitespace(&mut self, loc: LineColumn) {
        while self.line < loc.line {
            self.source.push('\n');
            self.line += 1;
            self.col = 0;
        }
        while self.col < loc.column {
            self.source.push(' ');
            self.col += 1;
        }
    }
}

Fingers crossed..

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0






        print("... World!")
        print("Bye.")
error: proc macro panicked

Okay, that works, but what’s with all the extra newlines and spaces? Oh right, the first token starts at line 7 column 8, so it correctly puts print on line 7 in column 8. The location we’re looking at is the exact location in the .rs file.

The extra newlines at the start are not a problem (empty lines have no effect in Python). It even has a nice side effect: When Python reports an error, the line number it reports will match the line number in the .rs file.

However, the 8 spaces are a problem. Although the Python code inside our python!{..} is properly indented with respect to our Rust code, the Python code we extract should start at a ‘zero’ indentation level. Otherwise Python will complain about invalid indentation.

Let’s subtract the column number of the first token from all column numbers:

    start_col: None,
    // <snip>
    start_col: Option<usize>,
    // <snip>
    let start_col = *self.start_col.get_or_insert(loc.column);
    let col = loc.column.checked_sub(start_col).expect("Invalid indentation.");
    while self.col < col {
        self.source.push(' ');
        self.col += 1;
    }
    // <snip>

$ cargo r
   Compiling python-macro v0.1.0
   Compiling scratchpad v0.1.0






print("... World!")
print("Bye.")
error: proc macro panicked

Awesome!
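Since the snippets above only show the changed pieces, here is a minimal standalone sketch of the whole idea that runs without a procedural macro. Tok, reconstruct and sample are hypothetical names: Tok stands in for a token together with the start of its span, and the positions in sample are typed in by hand rather than taken from rustc. Unlike the macro, it returns an error instead of panicking on invalid indentation.

```rust
/// Hypothetical stand-in for a token plus the start of its span.
struct Tok {
    line: usize,   // 1-based line in the .rs file
    column: usize, // 0-based column in the .rs file
    text: &'static str,
}

/// Rebuilds source text from token positions, shifting everything left by
/// the column of the first token.
fn reconstruct(tokens: &[Tok]) -> Result<String, String> {
    let mut out = String::new();
    let (mut line, mut col) = (1, 0);
    let mut start_col = None;
    for t in tokens {
        // Newlines until we reach the token's line.
        while line < t.line {
            out.push('\n');
            line += 1;
            col = 0;
        }
        // The first token seen defines the 'zero' indentation level.
        let base = *start_col.get_or_insert(t.column);
        let target = t
            .column
            .checked_sub(base)
            .ok_or_else(|| format!("invalid indentation on line {}", t.line))?;
        // Spaces until we reach the token's (shifted) column.
        while col < target {
            out.push(' ');
            col += 1;
        }
        out.push_str(t.text);
        col += t.text.len(); // Byte length; good enough for ASCII tokens.
    }
    Ok(out)
}

/// Tokens for a two-line Python snippet that starts on line 2, column 8.
fn sample() -> Vec<Tok> {
    vec![
        Tok { line: 2, column: 8, text: "if" },
        Tok { line: 2, column: 11, text: "x" },
        Tok { line: 2, column: 12, text: ":" },
        Tok { line: 3, column: 12, text: "y" },
        Tok { line: 3, column: 13, text: "(" },
        Tok { line: 3, column: 14, text: ")" },
    ]
}

fn main() {
    match reconstruct(&sample()) {
        Ok(source) => println!("{}", source),
        Err(e) => eprintln!("{}", e),
    }
}
```

The leading newline is kept on purpose, so that line numbers in the reconstructed text still line up with the original file.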

Now we only have to turn this string into a string literal token and put run_python(); around it:

    TokenStream::from_iter(vec![
        TokenTree::from(Ident::new("run_python", Span::call_site())),
        TokenTree::Group(Group::new(
            Delimiter::Parenthesis,
            TokenStream::from(TokenTree::from(Literal::string(&s.source))),
        )),
        TokenTree::from(Punct::new(';', Spacing::Alone)),
    ])

Ugh, working with token trees is horrible. Especially making trees and streams from scratch.

If only there was a way to just write the Rust code we want to produce and— Ah yes, the quote! macro from the quote crate:

    let source = s.source;
    quote!( run_python(#source); ).into()

Okay, that’s better.

Now to test it using our original run_python function:

#![feature(proc_macro_hygiene)]
use python_macro::python;

fn run_python(code: &str) {
    let py = pyo3::Python::acquire_gil();
    if let Err(e) = py.python().run(code, None, None) {
        e.print(py.python());
    }
}

fn main() {
    println!("Hello ...");
    python! {
        print("... World!")
        print("Bye.")
    }
}

$ cargo r
   Compiling scratchpad v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/scratchpad`
Hello ...
... World!
Bye.

Success!

🎉

Turning this into a library

Now to turn this into a reusable library, we:

Remove fn main,
Rename main.rs to lib.rs,
Give the crate a good name, like inline-python,
Make run_python public,
Change the run_python call in the quote!() to inline_python::run_python, and
Add pub use python_macro::python; to re-export the python! macro from this crate.

What’s next

There’s probably tons of things to improve and plenty of bugs to discover, but at least we can now run snippets of Python in between our lines of Rust code.

The biggest problem for now is that this isn’t very useful yet, since no data can (easily) cross the Rust-Python border.

In part 2, we’ll take a look at how we can make Rust variables available to the Python code.

Update: Before part 2, there’s a part 1A that doesn’t improve our python!{} macro yet, but goes into some details people have asked me about. Specifically, it goes into:

Why you’d want to use Python inside Rust like this,

Syntax issues like using Python’s single-quoted strings, and

The option of using Span::source_text, which didn’t exist when I first wrote this code.

Next: Part 1A
