Writing Python inside your Rust code — Part 1
About a year ago, I published a Rust crate called inline-python,
which allows you to easily mix some Python into your Rust code using a python!{ .. }
macro.
In this series, I’ll go through the process of developing this crate from scratch.
Sneak preview
If you’re not familiar with the inline-python crate, this is what it allows you to do:
fn main() {
    let who = "world";
    let n = 5;
    python! {
        for i in range('n):
            print(i, "Hello", 'who)
        print("Goodbye")
    }
}
It allows you to embed Python code right between your lines of Rust code. It even allows you to use your Rust variables inside the Python code.
We’ll start with a much simpler case, and slowly work our way up to this result (and more!).
Running Python code
First, let’s take a look at how we can run Python code from Rust. Let’s try to make this first simple example work:
fn main() {
println!("Hello ...");
run_python("print(\"... World!\")");
}
We could implement run_python by using std::process::Command to run the python executable and pass it the Python code, but if we ever expect to be able to define and read back Python variables, we’re probably better off if we start by using the PyO3 library instead.
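For comparison, here’s a minimal sketch of what the std::process::Command route could look like. The helper names python_command and run_python_subprocess are made up for this illustration, and it assumes an executable called python3 is on the PATH. It runs the code in a separate process, which is exactly why reading Python variables back afterwards would be awkward:

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Build the command that would run `code` in a separate interpreter.
/// Assumption: an executable named `python3` can be found on the PATH.
fn python_command(code: &str) -> Command {
    let mut cmd = Command::new("python3");
    // `-c` tells the interpreter to execute the next argument as a program.
    cmd.arg("-c").arg(code);
    cmd
}

/// Spawn the interpreter, wait for it to exit, and return its exit status.
/// All Python state is gone once the child process has exited.
fn run_python_subprocess(code: &str) -> io::Result<ExitStatus> {
    python_command(code).status()
}
```

Calling run_python_subprocess("print(\"... World!\")") would print from the child process, but nothing defined on the Python side survives the call.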
PyO3 gives us Rust bindings for Python. It nicely wraps the Python C API, letting us interact with all kinds of Python objects directly from Rust. (And even make Python libraries in Rust, but that’s a whole other topic.)
Its Python::run function looks exactly like what we need. It takes the Python code as a &str, and allows us to define any variables in scope using two optional PyDicts.
Let’s give it a try:
fn run_python(code: &str) {
let py = pyo3::Python::acquire_gil(); // Acquire the 'global interpreter lock', as Python is not thread-safe.
py.python().run(code, None, None).unwrap(); // No locals, no globals.
}
$ cargo run
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.29s
Running `target/debug/scratchpad`
Hello ...
... World!
Success!
Rule based macros
Writing inside a string literal is not the most convenient way to write Python, so let’s see if we can improve that. Macros allow us to define custom syntax within Rust, so let’s try to use one:
fn main() {
println!("Hello ...");
python! {
print("... World!")
}
}
Macros are normally defined using macro_rules!, which lets you define a macro using advanced ‘find and replace’ rules based on things like tokens and expressions. (See the chapter on macros in the Rust Book for an introduction to macro_rules!. See The Little Book of Rust Macros for all the scary details.)
Macros defined by macro_rules! cannot execute any code at compile time; they only apply replacement rules based on patterns. That’s great for things like vec![], and even lazy_static!{ .. }, but not powerful enough for things such as parsing and compiling regular expressions (e.g. regex!("a.*b")).
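To make the ‘find and replace’ idea a bit more concrete, here’s a small sketch of a rule based macro. The max_of! macro is a hypothetical example for illustration only and has nothing to do with inline-python. It matches one or more comma-separated expressions and rewrites them into nested comparisons, without ever running any code at compile time:

```rust
/// Hypothetical example macro: expands to the largest of its arguments.
macro_rules! max_of {
    // Base rule: a single expression is replaced by itself.
    ($x:expr) => { $x };
    // Recursive rule: compare the first expression with the max of the rest.
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

/// Uses the macro with three arguments; the expansion is plain Rust code.
fn largest_of_three() -> i32 {
    max_of!(1, 7, 4)
}
```

Everything here happens by pattern substitution: the compiler only pastes the matched expressions into the template on the right-hand side of each rule.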
In the matching rules of a macro, we can match on things like expressions, identifiers, types, and many other things. Since ‘valid Python code’ is not an option, we’ll just make our macro accept anything: raw tokens, as many as needed:
macro_rules! python {
($($code:tt)*) => {
...
}
}
(See the resources linked above for details on how macro_rules! works.)
An invocation of our macro should result in run_python(".."), with all Python code wrapped in that string literal. We’re in luck: there’s a builtin macro that puts things in a string for us, called stringify!.
macro_rules! python {
($($code:tt)*) => {
run_python(stringify!($($code)*));
}
}
Let’s try!
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.32s
Running `target/debug/scratchpad`
Hello ...
... World!
Success!
But wait, what happens if we have more than one line of Python code?
fn main() {
println!("Hello ...");
python! {
print("... World!")
print("Bye.")
}
}
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running `target/debug/scratchpad`
Hello ...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: Py(0x7f1c0a5649a0, PhantomData) }', src/main.rs:9:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Oof, that’s unfortunate.
To debug this, let’s properly print the PyErr, and also show the exact Python code we’re feeding to Python::run:
fn run_python(code: &str) {
println!("-----");
println!("{}", code);
println!("-----");
let py = pyo3::Python::acquire_gil();
if let Err(e) = py.python().run(code, None, None) {
e.print(py.python());
}
}
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/scratchpad`
Hello ...
-----
print("... World!") print("Bye.")
-----
File "<string>", line 1
print("... World!") print("Bye.")
^
SyntaxError: invalid syntax
Apparently both lines of Python code ended up on the same line, and Python rightfully complains about this being invalid syntax.
And now we’ve stumbled across the biggest problem we’ll have to overcome: stringify! messes up white-space.
White-space and tokens
Let’s take a closer look at what stringify! does:
fn main() {
println!("{}", stringify!(
a 123 b c
x ( y + z )
// comment
...
));
}
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.21s
Running `target/debug/scratchpad`
a 123 b c x(y + z) ...
Not only does it remove all unnecessary white-space, it even removes comments.
The reason is that we’re working with tokens here, not the original source code: a, 123, b, etc.
One of the first things rustc does is tokenize the source code. This makes the rest of the parsing easier: it no longer has to deal with individual characters like 1, 2, 3, but only with tokens such as ‘integer literal 123’. Also, white-space and comments are gone after tokenizing, as they are meaningless to the compiler.
stringify!() is a way to convert a bunch of tokens back to a string, but on a ‘best effort’ basis: it will convert the tokens back to text, and only insert spaces around tokens when needed (to avoid turning b, c into bc).
So this is a bit of a dead end. Rustc has carelessly thrown our precious white-space away, which is very significant in Python.
We could try to have some code guess which spaces have to be replaced back by newlines, but indentation is definitely going to be a problem:
fn main() {
    let a = stringify!(
        if False:
            x()
        y()
    );
    let b = stringify!(
        if False:
            x()
            y()
    );
    dbg!(a);
    dbg!(b);
    dbg!(a == b);
}
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/scratchpad`
[src/main.rs:12] a = "if False : x() y()"
[src/main.rs:13] b = "if False : x() y()"
[src/main.rs:14] a == b = true
The two snippets of Python code have a different meaning, but stringify! gives us the same result for both.
Before giving up, let’s try the other type of macros.
Procedural macros
Rust’s procedural macros are another way to define macros. Whereas macro_rules! can only define ‘function-style macros’ (those with an !), procedural macros can also define custom derive macros (e.g. #[derive(Stuff)]) and attribute macros (e.g. #[stuff]).
Procedural macros are implemented as a compiler plugin. You get to write a function that gets access to the token stream the compiler sees, can do whatever it wants, and then needs to return a new token stream which the compiler will use instead (or in addition, in the case of a custom derive):
#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
todo!()
}
That TokenStream there doesn’t bode well. We need the original source code, not just the tokens.
But let’s just continue anyway. Maybe a procedural macro gives us more flexibility to hack our way around any problems.
Because procedural macros run Rust code as part of the compilation process, they need to go in a separate proc-macro crate, which is compiled before you can compile anything that uses it.
$ cargo new --lib python-macro
Created library `python-macro` package
In python-macro/Cargo.toml:
[lib]
proc-macro = true
In Cargo.toml:
[dependencies]
python-macro = { path = "./python-macro" }
Let’s start with an implementation that just panics (todo!()), after printing the TokenStream:
// python-macro/src/lib.rs
extern crate proc_macro;
use proc_macro::TokenStream;
#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
dbg!(input.to_string());
todo!()
}
// src/main.rs
use python_macro::python;
fn main() {
println!("Hello ...");
python! {
print("... World!")
print("Bye.")
}
}
$ cargo r
Compiling python-macro v0.1.0
Compiling scratchpad v0.1.0
error[E0658]: procedural macros cannot be expanded to statements
--> src/main.rs:5:5
|
5 | / python! {
6 | | print("... World!")
7 | | print("Bye.")
8 | | }
| |_____^
|
= note: see issue #54727 <https://github.com/rust-lang/rust/issues/54727> for more information
= help: add `#![feature(proc_macro_hygiene)]` to the crate attributes to enable
Whelp, what happened here?
Rust complains that ‘procedural macros cannot be expanded to statements’, and something about enabling ‘hygienic macros’.
Macro hygiene is the wonderful feature of Rust macros to not accidentally ’leak’ any names to the outside world (or the reverse).
If a macro expands to code that uses some temporary variable named x, it will be separate from any x that appears in any code outside of the macro.
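For macro_rules! macros, hygiene is easy to observe. The sketch below uses a hypothetical add_one! macro (made up for this illustration) that declares its own x internally. The expression passed in by the caller also mentions an x, yet the two never get mixed up:

```rust
/// Hypothetical example macro: adds one to the given expression,
/// using an internal variable that happens to be called `x`.
macro_rules! add_one {
    ($e:expr) => {{
        let x = 1; // The macro's own `x`, invisible to the caller's code.
        $e + x
    }};
}

/// The caller has its own `x`; `x * 2` keeps referring to that one.
fn hygiene_demo() -> i32 {
    let x = 10;
    add_one!(x * 2) // (10 * 2) + 1
}
```

Without hygiene, the x inside x * 2 would be captured by the macro’s let x = 1 and hygiene_demo would return 3. With hygiene it returns 21.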
However, this feature isn’t stable yet for procedural macros. The result is that procedural macros are not (yet) allowed to appear in any place other than as an item by itself (e.g. at file scope, but not inside a function).
There exists a fascinating (if rather horrible) workaround for this, but let’s just enable the experimental #![feature(proc_macro_hygiene)] and continue our adventure.
(If you are reading this in the future, when proc_macro_hygiene has been stabilized: You could’ve skipped the last few paragraphs. ^^)
$ sed -i '1i#![feature(proc_macro_hygiene)]' src/main.rs
$ cargo r
Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:6] input.to_string() = "print(\"... World!\") print(\"Bye.\")"
error: proc macro panicked
--> src/main.rs:6:5
|
6 | / python! {
7 | | print("... World!")
8 | | print("Bye.")
9 | | }
| |_____^
|
= help: message: not yet implemented
error: aborting due to previous error
error: could not compile `scratchpad`.
Our procedural macro panics as expected, after showing us the input it got as string:
print("... World!") print("Bye.")
Again, as expected, with the white-space thrown away. :(
Time to give up.
Or.. Maybe there’s a way to work around this.
Reconstructing white-space
Although rustc only works with tokens while parsing and compiling, it somehow still knows exactly where to point when it has errors to report. There are no newlines left in the tokens, but it still knows our error happened on lines 6 through 9. How?
It turns out that tokens contain quite a bit of information. They contain a Span, which is basically the start and end location of the token in the original source file. The Span can tell which file, line, and column number a token starts and ends at.
If we can get to this information, we can reconstruct the white-space by putting spaces and newlines between tokens to match their line and column information.
Functions that give us this information are not yet stable and are gated behind #![feature(proc_macro_span)].
Let’s enable it, and see what we get:
#![feature(proc_macro_span)]
extern crate proc_macro;
use proc_macro::TokenStream;
#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
for t in input {
dbg!(t.span().start());
}
todo!()
}
$ cargo r
Compiling python-macro v0.1.0
Compiling scratchpad v0.1.0
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
line: 7,
column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
line: 7,
column: 13,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
line: 8,
column: 8,
}
[python-macro/src/lib.rs:9] t.span().start() = LineColumn {
line: 8,
column: 13,
}
Nice! We got some numbers.
But there’s only four tokens?
It turns out ("... World!") appears as one token here, and not three ((, "... World!", and )).
If we look at the documentation of TokenStream, we can see it doesn’t give us a stream of tokens, but of token trees. Apparently Rust’s tokenizer already matches parentheses (and braces and brackets) and doesn’t just give a linear list of tokens, but a tree of tokens. Tokens inside parentheses will be children of a single Group token.
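To picture that shape without involving the compiler, here’s a simplified model in plain Rust. The Tree enum and the flatten and example_tree functions are hypothetical stand-ins made up for this sketch; the real proc_macro::TokenTree has more variants and carries spans:

```rust
/// Simplified, hypothetical stand-in for a token tree.
enum Tree {
    /// A single token, such as an identifier or a literal.
    Leaf(String),
    /// A delimited group that owns the tokens between its delimiters.
    Group { open: char, close: char, children: Vec<Tree> },
}

/// Walk the tree recursively and collect a flat list of token texts,
/// emitting the delimiters around the children of each group.
fn flatten(trees: &[Tree], out: &mut Vec<String>) {
    for tree in trees {
        match tree {
            Tree::Leaf(text) => out.push(text.clone()),
            Tree::Group { open, close, children } => {
                out.push(open.to_string());
                flatten(children, out);
                out.push(close.to_string());
            }
        }
    }
}

/// Models `print("... World!")`: two top-level trees, one of them a group.
fn example_tree() -> Vec<Tree> {
    vec![
        Tree::Leaf("print".to_string()),
        Tree::Group {
            open: '(',
            close: ')',
            children: vec![Tree::Leaf("\"... World!\"".to_string())],
        },
    ]
}
```

At the top level this example has only two entries, which matches the two positions per line we saw in the debug output. The four individual tokens only show up once we recurse into the group.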
Let’s modify our procedural macro to recursively go over all the tokens inside groups as well (and improve the output a bit):
#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
print(input);
todo!()
}
fn print(input: TokenStream) {
for t in input {
if let TokenTree::Group(g) = t {
println!("{:?}: open {:?}", g.span_open().start(), g.delimiter());
print(g.stream());
println!("{:?}: close {:?}", g.span_close().start(), g.delimiter());
} else {
println!("{:?}: {}", t.span().start(), t.to_string());
}
}
}
$ cargo r
Compiling python-macro v0.1.0
Compiling scratchpad v0.1.0
LineColumn { line: 7, column: 8 }: print
LineColumn { line: 7, column: 13 }: open Parenthesis
LineColumn { line: 7, column: 14 }: "... World!"
LineColumn { line: 7, column: 26 }: close Parenthesis
LineColumn { line: 8, column: 8 }: print
LineColumn { line: 8, column: 13 }: open Parenthesis
LineColumn { line: 8, column: 14 }: "Bye."
LineColumn { line: 8, column: 20 }: close Parenthesis
Wonderful!
Now to reconstruct the white-space, we need to insert newlines if we’re not on the right line yet, and spaces if we’re not in the right column yet. Let’s see:
#![feature(proc_macro_span)]
extern crate proc_macro;
use proc_macro::{TokenTree, TokenStream, LineColumn};
#[proc_macro]
pub fn python(input: TokenStream) -> TokenStream {
let mut s = Source {
source: String::new(),
line: 1,
col: 0,
};
s.reconstruct_from(input);
println!("{}", s.source);
todo!()
}
struct Source {
source: String,
line: usize,
col: usize,
}
impl Source {
fn reconstruct_from(&mut self, input: TokenStream) {
for t in input {
if let TokenTree::Group(g) = t {
let s = g.to_string();
self.add_whitespace(g.span_open().start());
self.add_str(&s[..1]); // the '[', '{' or '('.
self.reconstruct_from(g.stream());
self.add_whitespace(g.span_close().start());
self.add_str(&s[s.len() - 1..]); // the ']', '}' or ')'.
} else {
self.add_whitespace(t.span().start());
self.add_str(&t.to_string());
}
}
}
fn add_str(&mut self, s: &str) {
// Let's assume for now s contains no newlines.
self.source += s;
self.col += s.len();
}
fn add_whitespace(&mut self, loc: LineColumn) {
while self.line < loc.line {
self.source.push('\n');
self.line += 1;
self.col = 0;
}
while self.col < loc.column {
self.source.push(' ');
self.col += 1;
}
}
}
Fingers crossed..
$ cargo r
Compiling python-macro v0.1.0
Compiling scratchpad v0.1.0






        print("... World!")
        print("Bye.")
error: proc macro panicked
Okay, that works, but what’s with all the extra newlines and spaces?
Oh right, the first token starts at line 7 column 8, so it correctly puts print on line 7 in column 8. The location we’re looking at is the exact location in the .rs file.
The extra newlines at the start are not a problem (empty lines have no effect in Python).
It even has a nice side effect: When Python reports an error, the line number it reports will match the line number in the .rs file.
However, the 8 spaces are a problem.
Although the Python code inside our python!{..} is properly indented with respect to our Rust code, the Python code we extract should start at a ‘zero’ indentation level. Otherwise Python will complain about invalid indentation.
Let’s subtract the column number of the first token from all column numbers:
start_col: None,
// <snip>
start_col: Option<usize>,
// <snip>
let start_col = *self.start_col.get_or_insert(loc.column);
let col = loc.column.checked_sub(start_col).expect("Invalid indentation.");
while self.col < col {
self.source.push(' ');
self.col += 1;
}
// <snip>
$ cargo r
Compiling python-macro v0.1.0
Compiling scratchpad v0.1.0
print("... World!")
print("Bye.")
error: proc macro panicked
Awesome!
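Since the real thing needs a nightly compiler, here’s a self-contained sketch of the same reconstruction logic that runs on plain data. The Tok struct and the reconstruct and example_tokens functions are hypothetical stand-ins invented for this sketch, with positions copied from the output we saw earlier. One deliberate difference: it counts characters rather than bytes when advancing the column, on the assumption that columns are measured in characters.

```rust
/// Hypothetical stand-in for a token together with its start position.
struct Tok<'a> {
    line: usize,   // 1-based line in the .rs file
    column: usize, // 0-based column in the .rs file
    text: &'a str,
}

/// Rebuild source text from positioned tokens: newlines until we reach the
/// token's line, then spaces until we reach its (shifted) column.
fn reconstruct(tokens: &[Tok<'_>]) -> String {
    let mut source = String::new();
    let mut line = 1;
    let mut col = 0;
    let mut start_col: Option<usize> = None;
    for t in tokens {
        while line < t.line {
            source.push('\n');
            line += 1;
            col = 0;
        }
        // The first token's column becomes the 'zero' indentation level.
        let start = *start_col.get_or_insert(t.column);
        let target = t.column.checked_sub(start).expect("Invalid indentation.");
        while col < target {
            source.push(' ');
            col += 1;
        }
        source.push_str(t.text);
        // Assumption: tokens contain no newlines; count chars, not bytes.
        col += t.text.chars().count();
    }
    source
}

/// The two `print(...)` lines, plus an indented token on line 9.
fn example_tokens() -> Vec<Tok<'static>> {
    vec![
        Tok { line: 7, column: 8, text: "print" },
        Tok { line: 7, column: 13, text: "(" },
        Tok { line: 7, column: 14, text: "\"... World!\"" },
        Tok { line: 7, column: 26, text: ")" },
        Tok { line: 8, column: 8, text: "print" },
        Tok { line: 8, column: 13, text: "(" },
        Tok { line: 8, column: 14, text: "\"Bye.\"" },
        Tok { line: 8, column: 20, text: ")" },
        Tok { line: 9, column: 12, text: "x" },
    ]
}
```

Feeding example_tokens() to reconstruct gives six leading newlines (so line numbers keep matching the .rs file), both print lines at column zero, and the last token indented by four spaces relative to them.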
Now we only have to turn this string into a string literal token and put run_python(); around it:
TokenStream::from_iter(vec![
TokenTree::from(Ident::new("run_python", Span::call_site())),
TokenTree::Group(Group::new(
Delimiter::Parenthesis,
TokenStream::from(TokenTree::from(Literal::string(&s.source))),
)),
TokenTree::from(Punct::new(';', Spacing::Alone)),
])
Ugh, working with token trees is horrible. Especially making trees and streams from scratch.
If only there was a way to just write the Rust code we want to produce and—
Ah yes, the quote! macro from the quote crate:
let source = s.source;
quote!( run_python(#source); ).into()
Okay, that’s better.
Now to test it using our original run_python function:
#![feature(proc_macro_hygiene)]
use python_macro::python;
fn run_python(code: &str) {
let py = pyo3::Python::acquire_gil();
if let Err(e) = py.python().run(code, None, None) {
e.print(py.python());
}
}
fn main() {
println!("Hello ...");
python! {
print("... World!")
print("Bye.")
}
}
$ cargo r
Compiling scratchpad v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running `target/debug/scratchpad`
Hello ...
... World!
Bye.
Success!
🎉
Turning this into a library
Now to turn this into a reusable library, we:
- Remove fn main,
- Rename main.rs to lib.rs,
- Give the crate a good name, like inline-python,
- Make run_python public,
- Change the run_python call in the quote!() to inline_python::run_python, and
- Add pub use python_macro::python; to re-export the python! macro from this crate.
What’s next
There’s probably tons of things to improve and plenty of bugs to discover, but at least we can now run snippets of Python in between our lines of Rust code.
The biggest problem for now is that this isn’t very useful yet, since no data can (easily) cross the Rust-Python border.
In part 2, we’ll take a look at how we can make Rust variables available to the Python code.
Update: Before part 2, there’s a part 1A that doesn’t improve our python!{} macro yet, but goes into some details people have asked me about. Specifically, it goes into:
- Why you’d want to use Python inside Rust like this,
- Syntax issues like using Python’s single-quoted strings, and
- The option of using Span::source_text, which didn’t exist when I first wrote this code.
Next: Part 1A