uv with Charlie Marsh
About improving the Python ecosystem with Rust
2025-05-15 75 min
Description & Show Notes
Up until a few years ago, Python tooling was a nightmare: basic tasks like installing packages or managing Python versions were a pain. The tools were brittle and did not work well together, mired in a swamp of underspecified, implementation-defined behaviour.
Then, apparently suddenly, but in reality backed by years of ongoing work on formal interoperability specifications, we saw a renaissance of new ideas in the Python ecosystem. It started with Poetry and pipx and continued with tooling written in Rust like rye, which later got incorporated into Astral.
Astral in particular contributed a very important piece to the puzzle: uv – an extremely fast Python package and project manager that supersedes all previous attempts. For example, it is 10x-100x faster than pip.
In this episode I talk to Charlie Marsh, the Founder and CEO of Astral. We talk about Astral’s mission and how Rust plays an important role in it.
About Astral
Astral is a company that builds tools for Python developers. What sounds simple is actually a very complex problem: Python's ecosystem is huge, but fragmented and often incompatible. Astral’s mission is to make the Python ecosystem more productive by building high-performance developer tools, starting with Ruff. In their words: "Fast, unified, futuristic."
About Charlie Marsh
Charlie is a long-time open source developer and entrepreneur. He has an impressive CV, graduating with highest honors from Princeton University. After that, he worked at Khan Academy and others before eventually founding Astral in '22. Charlie is an engaging speaker and a great communicator.
Proudly Supported by CodeCrafters
CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch.
Start for free today and enjoy 40% off any paid plan by using this link.
Links From The Episode
- ruff - Python static linter and formatter written in Rust
- uv - Python package and project manager written in Rust
- rustfmt - Rust code formatter
- clippy - Linter for Rust code
- The Rust Programming Language: Cargo Workspaces - The Rust Book's chapter on workspaces
- pip - Package Installer for Python
- pip documentation: Requirements File Format - A description of the format of requirements.txt, including a list of embedded CLI options
- uv's CI - Build scripts for many different platforms
- jemalloc - Alternative memory allocator
- zlib-ng - Next Generation zlib implementation in C
- reqwest - An easy and powerful Rust HTTP Client
- zlib-rs - Pure Rust implementation of zlib
- Xcode Instruments - Native macOS performance profiler
- CodSpeed - Continuous benchmarking in CI
- hyperfine - "macro benchmarking" tool, coincidentally written in Rust
- samply - Sampling based profiler written in Rust
- cargo flamegraph - Cargo profiling plugin
- tokio - Asynchronous runtime for Rust
- curl-rust - Rust bindings to libcurl, used by Cargo for networking
- tar-rs - Sync tar crate
- async-tar - Async tar crate based on the async_std runtime
- tokio-tar - Async tar crate based on tokio
- astral-tokio-tar - Async tar crate based on tokio, maintained by Astral
- RustPython - Python interpreter written in Rust
- lalrpop - The parser generator used by RustPython
- Charlie's EuroRust 2024 Talk - Mentions the version number parser at 18:45
- ripgrep - Andrew Gallant's idiomatic Rust project, which also happens to be a very fast CLI file search tool
Transcript
It's time for another episode of Rust in Production, a podcast about companies
who use Rust to shape the future of infrastructure.
I'm Matthias Endler from corrode and today's guest is Charlie Marsh from Astral.
We talk about improving the Python ecosystem with Rust.
Charlie, thanks for being a guest. Can you say a few words about yourself and
about Astral, the company you work for?
Yeah, of course. Thanks so much for having me, first of all.
My name is Charlie Marsh.
I run a company called Astral. We build high-performance developer tools for the Python ecosystem.
So we're best known for two tools: Ruff, which is a combined linter, formatter, and code transformation tool. You could think of it as rustfmt and Clippy together, but for Python code, and written in Rust. And then uv, which is our project manager, Python package manager, Python toolchain manager. You could think of it a little bit like Cargo and rustup. It bootstraps, or will bootstrap, Python for you, and then helps you manage your dependencies, install things, lock them into lock files and reproducible versions, and all that kind of stuff.
So yeah, everything we've built so far is open source and written in Rust.
We're a team of about 15 people. We have people in Pacific Time, Central Time, and Eastern Time in the US; we have one person in the UK; we have four people in CET (Germany, Switzerland, the Netherlands); and then one person in India. So not only are we remote, we're very distributed, kind of like a lot of open source. And we spend all our time basically writing Rust to try to make Python better.
I heard about Astral the first time when you published Ruff, but it took off with uv, I would say. There was definitely a huge sympathy for what you do in the packaging space as well. Did you see that from a company perspective, that there was huge growth?
I think so, yeah. I mean, we started with Ruff. I started working on Ruff before the company existed, and it was just something I was building. Well, for a lot of reasons, but largely because I saw these were problems that I had experienced in my own projects. And I was like, what would it be like if I wrote Python tooling in Rust instead? That was sort of the genesis of Ruff.
And Ruff grew extremely quickly. We had decades-old projects like SciPy adopting it while it was still what I would consider to be very unstable. So it was clear that people wanted something in this arena of faster Python tooling. Ruff grew very fast, and we were seeing what was happening with Ruff, and we were like, well, as a company, in terms of the problems we're trying to solve, we're not trying to build just that. We want to build a Python toolchain, effectively. We want to solve the hard problems in Python. And for me, packaging was kind of central: if you want to be the Python tooling company, which aspirationally I guess we'd like to be, then you have to work on packaging. It's the thing that everyone has trouble with, the thing that so many people have tried to solve, and done good work on, but I don't think anyone would consider it solved.
So I kind of knew from early on that we wanted to do something in packaging. And before we released it, I definitely felt nervous, in that Ruff had grown well and people really liked it, and it felt like it was going to be a tough act to follow. We didn't want to be a one-hit wonder with the tools that we built. I wanted to build something that hopefully would be as exciting and grow as well and have as much of an impact as Ruff.
And I think uv has actually really surpassed that in a lot of ways. The impact it's had on Python has probably been more significant. Not that I don't love Ruff. It was my first big project, and half the people on the team are still working on Ruff and the static analysis tooling; it's a big focus area for us. But I've been really amazed by how quickly uv has had the impact that it's had. We released it in February of last year, so it just turned one year old, and in that time it's grown to, I think last I checked, a little over 12.5% of all requests to the Python package index, which is over 200 million requests a day. Which is wild, right?
And I spend a lot of my time talking to companies. There are tons of 1 billion, 10 billion, even 100 billion dollar companies that are using this thing in production and have been for a long time. So the way it grew really surpassed my expectations, and I feel like we now have a platform to hopefully keep making Python better. But I think you're right that with uv we shifted into another gear. We spread beyond just being the Ruff company; now we are Python tooling, and we try to solve a much wider surface area of problems.
You said that Ruff was your first major Rust project, maybe the first actual project that you tried after learning Rust. I'm not sure about this, but it certainly was a big milestone in your Rust journey. Comparing it with uv: did you already learn a few things from Ruff that you wanted to avoid with uv? And did you set up the uv codebase early on for such growth?
Yeah. So I would say I really learned Rust in the process of writing Ruff. I had done some Rust at my previous job. Someone else on the team, a really great engineer, had introduced Rust, and when I was contributing to that codebase, I was mostly trying to get in and out as quickly as possible, because I didn't really know Rust. I was like, I need to fix a bug, how do I get this to compile? I wasn't really taking the time to learn it. I was just using it when I had to. So I had some Rust exposure.
But part of why I worked on Ruff in the first place was that I wanted to really learn Rust, and I thought that in order to do that, I had to build something from scratch. It's just the way that I like to learn. Even if that requires wasting a ton of time trying to understand lifetimes, maybe I'll burn two days trying to get this to compile. That's how I learn: I have to build things and fight through the problems. And I wanted to learn the ecosystem and the tooling and everything. So I did really learn Rust in the process of writing Ruff.
I think that showed, and probably still shows a little bit today. Especially early on, there were a lot of things I was doing that I would now look at and laugh about and say, that's obviously not the right way to do it. Whether it was silly performance things or the way I was structuring the code, or, I don't know, anything. This is part of personal growth, right?
By the time we started working on uv, I felt a lot more comfortable in Rust. I'd also learned more from looking at other projects, and we had evolved Ruff a little bit over time. So, for example, we have a fairly wide crate structure. We just create a lot of crates. If you go and open up Ruff or uv, the crates subdirectory in uv has at least 20 crates, maybe 30. And that was something that I started to do in Ruff because our build and compile times got really bad. If I just put everything in one crate, the build and compile times got worse and worse, and the development loop was painful.
At the time, I was reading a lot about how crates are the atomic unit for rustc, how you can parallelize across crates, and how you need to think about what your crate graph looks like so that there's a lot of fan-out. So I was like, okay, we're going to start creating lots more, smaller crates, and we're going to change the atomic unit of the codebase from the module to the crate. So we started carving out lots of little crates, and we did basically the same thing in uv.
For example, in Ruff, the core linter crate doesn't depend on clap. It doesn't depend on any of the CLI stuff; the CLI stuff is in a separate crate that depends on the linter crate. You just try to get this structure, this organization. Or the parser and the AST: those are all their own crates. So if you just need to test the parser, it's really fast to compile and build. Or if other people need to pull in the parser as a library, which some people do, it's much easier.
So by the time we got to uv, I think we had ironed out a lot of things, and we just put a lot of stuff in place from the start. We had a better separation in terms of what went into which crates. We'd ironed out a lot of our workflows, for example: what's our Clippy configuration, how do we pin the Rust toolchain version, how do we install Rust in CI? There were all these little things that we were able to copy over from Ruff, and I think that made things a lot easier.
At the same time, there were still a lot of design decisions in uv that were architecturally pretty different, because the things a package manager and a linter need to do are very different. For example, uv has a whole networking stack. It needs to make lots of HTTP requests, and the linter doesn't need to do any of that. So suddenly we had to think about reqwest, basically the whole HTTP stack. We had to think about OpenSSL. We had to think about all these system dependencies, like Git; we support Git dependencies, so suddenly you have to think about how you depend on Git. There was just a lot of complexity that came with building a package manager that we didn't have to encounter when we were working on the linter. Which is good, because I kind of encountered those problems over time.
Do I understand you correctly, to summarize: you didn't have a rough start with uv, because you could copy over a lot of the, let's say, templates, or a lot of the best practices from the previous project?
At least what we thought were best practices, yeah. And I guess to some degree we still think they're best practices.
When you started with uv, did you already compartmentalize some of the functionality into smaller crates, somewhat subconsciously? Did you say, oh yeah, this definitely goes into a separate crate now? Or did you still start with a single large crate?
No, I think we were compartmentalizing stuff basically from the start. For example, we had one crate for parsing Python versions; there's a whole spec around versioning with a lot of intricacies to it, so that's its own crate. We had one crate for parsing requirements.txt files; that's another thing that's common in Python, and there's actually no spec for it, it's implementation-defined, so we have one crate for that. We have a crate for creating virtual environments, we had a crate for the CLI, we had a crate for the resolver. So yeah, we broke things down pretty early on, and that's generally served us very well. I'm a big fan of that structure.
The trade-off is really different when you're publishing a library, though. We don't publish anything in Ruff or uv as a library; we don't publish any of the stuff that we build to crates.io right now. The public API for our stuff tends to be the command line, the CLI. If we had to publish things, I think we would have to think a lot more carefully. There are all sorts of considerations that come with publishing that you don't have to think about if you aren't publishing. But one is that if we had a really granular crate structure, there would be a lot more of a maintenance burden, and it would also be harder for people to use and compose your things.
But you don't go and introduce a crate specifically for types? That's one thing that I see some people do, some companies: they have a types crate where they put everything that is related to their basic types into one thing. And then, in the end, they don't really get a lot of benefit from using a workspace, because you need that types crate in all of your other crates anyway. So it kind of goes against what a workspace is about, in my opinion. But I want to hear it from you: what are some anti-patterns for building workspaces?
Yeah, we try not to do things like that. I don't know what we would call that pattern: create a crate that's just kind of used everywhere. There's maybe some of that, but we try to generally avoid it.
I think the other anti-pattern is that you can make your crate structure too granular, like if you find yourself creating a crate for one function. Sometimes you'll find yourself doing that because you have circular dependencies or something in the dependency chain that's requiring it: I have two crates that need this functionality, and one can't depend on the other, and then you end up with what is just a crate wrapping a single function. At that point you've probably gone too granular and you need to rethink the organization, because there is some overhead to having all these crates, and it's not always super fun to maintain.
A couple of things that we do, though, are kind of nice here. We prefix all the crates in the workspace with uv or with ruff, so it's easy to differentiate what's our crate in the workspace versus what's a third-party crate. All of our crates are named like uv-virtualenv or uv-resolver or whatever else. That's pretty nice.
And then the other thing is that we declare them all in the workspace root, so that every crate that depends on other crates in the workspace can just use workspace = true. This is a little bit nuanced and maybe hard to visualize, but it basically means that in the Cargo.toml for all the crates that are in the workspace, it's very obvious what else is in the workspace.
Another thing that we do is we actually try to use workspace = true for basically all dependencies in the workspace. We put everything that we depend on in the root Cargo.toml, the workspace root, and then we use workspace = true everywhere, and that tends to simplify things a lot. It basically means we have one dependency specifier for reqwest, for example, with a common set of default features or no default features (sorry, not extras; extras are the Python analog to features). We have one dependency declaration for everything that we need, and then all the workspace members' Cargo.tomls are just very straightforward: the things they require, with workspace = true. So we don't have to think about whether we have reqwest dependency specifiers across 10 or 15 different crates; we just have one definition for it. That's also, I think, been a really handy thing.
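A minimal sketch of the layout Charlie describes; the member and dependency shown here are illustrative, not copied from uv's actual workspace:

```toml
# Cargo.toml at the workspace root: each dependency is declared once.
[workspace]
members = ["crates/*"]

[workspace.dependencies]
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }

# crates/my-tool-resolver/Cargo.toml: members opt in to the shared
# definition instead of repeating the version and features.
[dependencies]
reqwest = { workspace = true }
```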
Workspaces seem to be one of those things that I keep missing in other ecosystems, and I can't remember, off the top of my head, if there's even another language that has such a feature.
Yeah, I can't think of one either. I mean, we've thought about it a fair amount in Python. uv has workspace functionality that's very much modeled after Cargo. You have a root, it defines the members, and we actually use very similar members and exclude syntax. And there are nice things like that: you can do uv run -p, just like you can do cargo run -p. We've copied a lot of things from Cargo, because it's great and we want to have those things in Python.
And for workspaces, I think that's worked really well. It's been very interesting to introduce them, because for a lot of people it's a fairly new concept when they come to Python. Most people who use uv have never used Rust; they've never heard of a Cargo workspace before, and why should they, if they're just working in Python? So it's been interesting to try to communicate that and help people understand what it's for, why you might use it, and what an example workspace might look like in practice.
There are some things that we miss that are a little bit hard to get without standards. For example, I just spent all this time talking about this incredibly boring workspace = true thing, and all your listeners are probably wondering why I spent so much time on it, but we can't really do that in uv, because there isn't really a way to express it in the standards. So there are some things that we either can't support or have to get creative about how we support. But I think the workspace concept is excellent, and I'm really glad that we made it such a first-class thing in uv.
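For reference, uv's Cargo-inspired analog lives in pyproject.toml; a minimal sketch based on uv's documented workspace settings, with hypothetical member paths:

```toml
# pyproject.toml at the workspace root.
[tool.uv.workspace]
members = ["packages/*"]
exclude = ["packages/experimental"]
```

Commands such as `uv run --package <name>` then target a single member, mirroring `cargo run -p`.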
Now, I'm pretty sure that we could talk about workspaces for hours, because there's a lot of nuance to it. And I think a lot of people who haven't tried it themselves just don't know what the fuss is all about. But I do believe that there's more to it than just the term. Now, one other thing that you mentioned, which is very close to my heart, is parsing stuff, especially requirements.txt, which is unspecified, I heard. Now, isn't it super simple? Someone might hear this and say: oh yeah, I just open my editor, I string-split every line on the equals sign, and that's my requirements.txt parser. Why is that not the case?
Yeah. So requirements.txt is basically a file format that exists for pip. pip is, I guess, how should I describe it, the reference implementation for a lot of things. It's really been the Python package installer for a long time. And requirements.txt is a file format that exists for pip.
The way that you can think about it, which maybe people don't, is that each line is basically a command-line argument, because you can not only put requirements in there, you can also put command-line arguments and settings. For example, with pip you can pass an index URL on the command line, which says what registry should be used for fetching packages. You can actually put that in requirements.txt too: you can just write --index-url. You can also nest them: within a requirements.txt you can write -r and point to a different requirements.txt, and then it roughly gets inlined. So there's a lot of complexity to the stuff that's in requirements.txt that people don't really think about.
And there are also lots of very subtle behaviors where, especially for us, we often have to decide: what do we want to do? Do we want to be bug-for-bug compatible with pip, or do we want to do things slightly differently? A lot of these are edge cases, but there's just a lot of nuance to it. In a requirements.txt file, you can have what I would call a named dependency: you could say flask and then the version that you want, or you could write flask @ and then the URL that it should be fetched from, or a Git repository or something like that. But you can also just write the URL, or just the Git repository. And it turns out that in pip there are slightly different behaviors around how those things are parsed: the URL on its own versus the URL after a named dependency. And how whitespace is handled, error recovery, all this stuff is just a little bit different. So over time, basically the only way to know whether we're doing the right thing is to see what pip does, and then we try to mimic that to some degree. Or if we think we can do something that's unambiguously clearer, then we'll do that.
But yeah, that one's especially hard. It's a little bit easier for other things, like Python version specifiers, where there's a clear standard on how these things should be parsed. And if we see something that's not clear, we can actually ask about it through the standards process: how should this be handled?
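To make that concrete, here is a small, hypothetical requirements.txt mixing plain requirements with the embedded CLI options and URL forms described above (the URLs are placeholders):

```text
# Lines can be pip command-line options, not just requirements:
--index-url https://pypi.example.com/simple
-r dev-requirements.txt            # nested file, roughly inlined

flask==3.0.0                                                      # named dependency
requests @ https://example.com/requests-2.31.0-py3-none-any.whl   # name + URL
https://example.com/click-8.1.0-py3-none-any.whl                  # bare URL
git+https://github.com/pallets/jinja.git                          # bare Git reference
```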
In essence, you make the standards stronger for everyone, which is great. What piqued my interest there: did you have to go and read the pip source code to understand what's going on? Because this is a standard that sort of developed over 30 years; Python is about 30 years old by now, it's from 1994 as far as I remember. Did you really have to read the source code?
Yes, sometimes we definitely have to go read source code. Which is fine; I don't really mind that, as long as we can figure out why a certain behavior exists in a certain way and what it's motivated by. But yes, we often have to go read the source code to understand how a tool handles a given case. And also why, right? A lot of the time you're trying to understand why people made certain decisions, not just how it works.
That means you don't only have to read the source code, you also have to read the history.
Yes, the version history, yes.
But do they have proper tests, at least, where you can go through and maybe implement those tests in your implementation?
Yeah, they do have tests. It's not always super straightforward to take their tests and move them over to uv, though. A lot of the time, the way that you write tests in Python and in Rust tends to be quite different, or at least the way that we write tests tends to be quite different. In Python, mocking is very popular, for example, and we have almost no mocking in uv. Almost all of our tests are what you would probably consider integration tests. The whole way that we test uv is that we run real scenarios against a real package registry, and it's almost all based on snapshot testing. We snapshot the CLI output a lot of the time; that's how we detect whether things are working correctly. Or we snapshot the lock file, or things like that. The vast majority of the testing is integration tests: running real commands on the CLI and snapshotting the output. We have a lot of tests, and they all run through that.
So it's a little different sometimes. If I go and look, we do have more traditional unit tests for the requirements.txt parser; that's something that's very testable, and we do have tests for it. But often, if it's a more complex scenario around how pip behaves versus how we behave, it's a little bit harder to fit it into what we have.
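As a sketch of what CLI snapshot testing can look like in Rust: this uses the insta and assert_cmd crates and an invented binary name, not uv's actual test harness:

```rust
use assert_cmd::Command;

#[test]
fn add_package_snapshot() {
    // Run the real binary end to end, like an integration test.
    let output = Command::cargo_bin("my-tool")
        .unwrap()
        .args(["add", "flask"])
        .output()
        .unwrap();

    // Snapshot the CLI output; `cargo insta review` accepts updated
    // snapshots when the output changes intentionally.
    insta::assert_snapshot!(String::from_utf8_lossy(&output.stdout));
}
```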
Do you always run the full test harness?
In CI?
Yeah, or maybe locally as well, but I wonder whether that's probably too much.
Yeah, we almost always run the full test harness. We skip it for changes that are purely documentation, which we detect with some file filters in GitHub Actions, but otherwise we run the full test suite. Every change, we run it on Linux, macOS, and Windows.
And that's because, while it's not as critical for Ruff, it's very critical for uv: with a package manager, more things tend to differ across platforms and more behaviors differ across platforms. So we always test everything on Linux, macOS, and Windows.
I guess one interesting piece is that we build uv for a lot of different targets. When we publish to PyPI, or publish anywhere, or publish to GitHub releases, we're building somewhere between 10 and 20 different build variants. That's Linux x86 dynamically linked against glibc, Linux x86 musl, Linux ARM glibc, Linux ARM musl. We build for some more obscure platforms that are supported in Python, like PowerPC and s390x and things like that. We build for ARM and x86 Windows, ARM and x86 macOS. So we just have a lot of different builds.
First of all, setting those up to actually build is fairly complex. So if you ever find yourself needing to do that, you should look at our CI, because we've figured out how to build a Rust project across all those different targets; a lot of those are cross-compiled, and we've figured out how to build them across lots of different machines.
What is your CI?
Our CI? Oh yeah, just go look at our .github folder and take stuff from it.
The other piece is that we rerun all of those builds whenever certain files change. If we add a new dependency, we rebuild across all those machines. Otherwise we've run into situations where we add a new dependency, and then we go to release, and then, I don't know, the ARM musl build fails for some reason that we don't fully understand, or the Windows ARM build fails. So I guess the only nuance is that we do rebuild everything on every platform if we, for example, change our dependencies.
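The docs-only skip mentioned above might look roughly like this in GitHub Actions; the paths are hypothetical, not uv's actual workflow configuration:

```yaml
# Skip CI entirely for changes that only touch documentation.
on:
  push:
    paths-ignore:
      - "docs/**"
      - "**/*.md"
```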
I think this is kind of where the Rust build system also has its limits at this point in time. Especially when you talk about feature flags, and maybe optional dependencies and dev dependencies, there are a lot of loose ends. Sometimes you can't really say specifically and exactly which dependencies you want to enable for which platform. And on top of it, you have various system dependencies and the build.rs files to go with them, and sometimes you don't want a system dependency on a certain target, and so on. It's very hard to work around these limitations.
Yeah. I mean, a lot of the complexity has come from things like accelerators... sorry, not accelerators, allocators. On most platforms we use jemalloc (I don't even know how to pronounce it), but it just doesn't compile on some platforms, so we have to have a bunch of build configuration around that.
The other one that caused a lot of trouble was our zlib implementation, for decompression. By default, if you just use flate2 and reqwest to decompress stuff, you get a pure-Rust miniz_oxide implementation, I think. But there's a zlib-ng version that you can use, which is a lot faster, but it adds a CMake dependency and it's very hard to build. We could never get it to build on certain platforms, so we had to have configuration around where we enable it and where we don't.
We actually totally tore that out recently and moved over to zlib-rs, the pure-Rust implementation of a lot of these zlib optimizations, which in our benchmarking, at least, is actually both faster and way easier to build. When you get something that's pure Rust, it simplifies things dramatically. We just tore out this CMake dependency and got rid of all this configuration, because now we can use the faster, easier-to-build, more portable Rust version on all platforms.
So a lot of it comes from trying to do things that are more customized or bleeding-edge, like using really fast system dependencies or allocators; you end up running into these kinds of configuration problems. But we've tried to make that simpler over time.
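In Cargo.toml terms, the swap Charlie describes roughly corresponds to switching flate2's backend feature. This is a sketch; flate2 does expose zlib-ng and zlib-rs backends, but the versions and feature sets here are not taken from uv:

```toml
# Before: zlib-ng, a fast C backend that needs CMake to build.
[dependencies]
flate2 = { version = "1", default-features = false, features = ["zlib-ng"] }

# After: zlib-rs, a pure-Rust backend that is portable and, in their
# benchmarking, faster:
# flate2 = { version = "1", default-features = false, features = ["zlib-rs"] }
```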
Must have been such a relief to pull out that CMake.
Oh, it's amazing, it's amazing. I actually tried to do it a few months ago, and then we realized that they were using compile-time feature detection for a lot of the optimizations, which isn't great for us, because it meant that on x86 things were actually a lot slower: we have to build for a common CPU target. And then they recently re-released with runtime detection, and I was like, okay, cool, we're doing it again. So I redid the change twice, and redid all the benchmarking on my Windows machine and on macOS for ARM and all that kind of stuff. But yeah, when you can get rid of stuff like that, it's immensely helpful.
Whenever I read one of your posts, or whenever I see you talk, I realize that you care a lot about performance, and that you benchmark a lot and meticulously. Do you have any tips for people who want to make their Rust code even faster? From doing it so often, there must be patterns that have evolved over time. What is a good benchmark? What's a bad benchmark? What are some good libraries out there? In general, what are best practices? What don't you measure, for example, that sort of thing?
Yeah, there's so much to share. Actually, learning the tooling around benchmarking is itself a skill. There are certain things that I only know how to do on macOS and don't really know how to do on Linux or Windows, and there are other people on my team who know how to do those. I know how to use Instruments on macOS, for example, and that's its own set of things.
For us, there are a bunch of different forms of benchmarking. One is micro-benchmarking, which is, I guess, generally the kind of thing that you could do with Criterion or a similar crate: you're running typically fairly well-isolated code on test inputs, running thousands and thousands or millions of iterations of it, and trying to detect very small changes. That is an incredibly useful thing if you can isolate your code in that way. So if I'm trying to figure out, for example, that there are two different ways I could write this function that parses a very simple string, and I want to know which one is faster, then I'll do something like that: I'll try to isolate the two methods and then use Criterion or something similar.
We also do micro-benchmarking in CI. We use a tool called CodSpeed, which does continuous benchmarking, continuous profiling, and runs it under Valgrind. That again is really good when you have isolated things. If you just try to do this kind of benchmarking on a GitHub Actions machine, it tends to be extremely unreliable; you have to have a very high error tolerance. Very early on in Ruff, when we started adding continuous benchmarking, we did it on GitHub Actions, and we would regularly see 5 to 10% fluctuations even for no-op changes, because those machines are very noisy. CodSpeed solves this problem because they run all your stuff under Valgrind, so they get a much cleaner snapshot of what's actually changing in terms of performance. There's some nuance to it, but basically, IO can make things really messy with benchmarking in general. If you can write benchmarks that don't have any IO, that are pure CPU, you can fit them into that kind of micro-benchmarking framework, and it's incredibly useful, especially in Ruff, where we have a lot of that. We just run it in CI and it catches things all the time; real positive changes get flagged as positive changes. It's a fairly high signal. So if you can do that, that's great.
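A minimal Criterion micro-benchmark of the "which of two parsing functions is faster" kind; the functions are invented for illustration:

```rust
// benches/parse.rs -- run with `cargo bench`.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Two hypothetical ways to split a "key=value" string.
fn parse_split(s: &str) -> Option<(&str, &str)> {
    s.split_once('=')
}

fn parse_find(s: &str) -> Option<(&str, &str)> {
    let i = s.find('=')?;
    Some((&s[..i], &s[i + 1..]))
}

fn bench_parsers(c: &mut Criterion) {
    // Criterion runs each closure for many iterations and reports
    // statistically significant differences between runs.
    c.bench_function("parse_split", |b| {
        b.iter(|| parse_split(black_box("key=value")))
    });
    c.bench_function("parse_find", |b| {
        b.iter(|| parse_find(black_box("key=value")))
    });
}

criterion_group!(benches, bench_parsers);
criterion_main!(benches);
```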
We can't always do that in uv. We often do more of what I would call macro-benchmarking (I sort of make these words up; I don't know if that's a word). I usually use hyperfine: I will compile two uv release builds, and I will run some operation over and over and try to see if I can detect a difference. I'll typically try to minimize IO as much as I can, but maybe it has to read from disk for the on-disk cache. And there, a change either needs to be really obvious or very consistent in order for you to catch anything. That's one form of benchmarking: if it made everything 10% faster, then it's very obvious that it made things faster, and you can catch it in that kind of benchmark.
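That workflow might look something like this, with hypothetical build paths and warmup runs so the on-disk cache is populated before measuring:

```sh
hyperfine --warmup 3 \
  './uv-before/uv pip compile requirements.in' \
  './uv-after/uv pip compile requirements.in'
```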
The other pieces are a little bit harder to do direct comparisons with, but we do a fair amount of profiling. There's a great tool called samply, which is a sampling-based profiler. You can just run a uv command; you basically prefix it with samply, and then it opens up a flame graph in your browser with everything that got called and where you spent time.
That's a little bit harder, because you can't always tell just from a flame graph whether you made your whole program faster. You can tell if you ran the flame graph, found something that was taking up a lot of time, made changes, and then it's gone. That's good: it means you got rid of that time. But that alone is not enough to tell that your program got faster; it just means you removed that thing you were spending time on, and maybe it went somewhere else. So we'll often use it as a way to diagnose issues. If something's slow, we'll say: let's run it under samply and see where we're spending time. So there are certain tools you want for diagnosing issues, and then certain tools that we use to confirm findings or to understand whether there are regressions or anything like that.
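Invoking it really is just a matter of prefixing the command, assuming a release build (debug builds distort profiles):

```sh
# samply samples the running process, then opens an interactive
# flame graph in the Firefox Profiler UI in your browser.
samply record ./target/release/uv pip install flask
```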
In my mind, when you said sampling profiler and flame graphs, I was sort of hoping that the tool would update the chart as the program runs. I'm not sure if that's the behavior of that program.
I don't think samply does that. There might be a way to compare two flame graphs; CodSpeed tries to do this, they try to diff the profiles. It works okay. Sometimes it's hard for them; it seems like an inherently very hard problem, because they have to try to align which two function calls are the same, and that's not necessarily trivial, because the code is changing. But there's some stuff like that.
Just for context, because someone might not know: a sampling profiler. How would you describe that in a sentence or two?
I mean, my understanding is that it basically watches your program execute and samples at some rate. So it takes a bunch of samples as your program is running to try to get a representation of where you're spending time. Just for stupid numbers, let's say it takes 100 samples, and in 10 of those you're in this function. Then it would say: hey, we were running this function in 10% of the samples, and here is where it was being called from, and here's how you're spending time. So it basically watches your program execute and tries to figure out, probabilistically, by sampling the behavior, where you're spending your time. That's my understanding. I never built one of these, so it might actually work totally differently than that.
That's also my understanding.
Which is a little different than Valgrind, for example. I think rustc does this too: you can do profiling based on instruction counting, where you actually look at the instructions that are generated or executed. You still run the program, but you look at a different thing: you're not focused on wall time, you're looking at which instructions are being executed and how many. So it gives you a little bit more of a quantitative look, I guess, at where you might be spending time. And I believe rustc has continuous benchmarks that look at... actually, I could be totally wrong, but there's something that I've been shown before in the Rust ecosystem that does continuous benchmarking based on instruction counting.
I was kind of thinking that cargo flamegraph would do that. In comparison: I never used samply before, but my go-to tool for flame graphs is always cargo flamegraph. I'm not sure if you've used that one.
I think I have, but it's been a while.
And you find samply to be more ergonomic?
I found it to be easier, yeah. It's just what we tend to use on the team, but I'm sure that there are lots of good options. It just works seamlessly across macOS and Linux, which is really nice. With a lot of these tools, it can be hard to get them to work on one or the other. Not always, and I don't know if that's true of cargo flamegraph; I just mean that with things we've used in the past, sometimes it's like: oh, this works great on Linux but not really on macOS. And I think samply tends to work really well on both.
cargo flamegraph gives you an SVG file that you can open in your browser, and it's sort of interactive.
Yeah, yeah. Pretty basic as well. I don't know, there's probably support for the Chrome profiler, which also supports flame graphs, but I'm not sure.
Yeah, the thing that samply gives you in the browser: you can click into the flame graph, you can filter traces by name, and stuff like that. It's pretty nice. But again, some things are just really hard to understand in a flame graph. uv is very async, and we have lots of different stuff going on, so sometimes it's hard to tell where time is actually being spent. Things that spend lots of CPU time are always nice, because they're obvious and you can find them and fix them. But things where the scheduling is slightly off, or where we're waiting on something, those tend to be harder, more pernicious bugs to find.
By that definition, isn't it true that profiling or benchmarking Ruff is easier than uv? Because Ruff is inherently CPU-bound; you do a lot of computations, I'm assuming.
Yeah, I think it's much easier. Ruff does have a fair amount of IO, in the sense that we're reading files from disk to analyze them, but that IO is a lot more stable and more minimal than in uv, and it also sort of happens up front: we read the files and then we analyze them. Whereas in uv there's just constantly IO happening, whether we're reading stuff from disk from the user's project, or reading stuff from our on-disk cache, or making small HTTP requests or large HTTP requests, or downloading and unzipping files. There's just constantly IO happening, so it tends to be a bit harder to benchmark. Often we'll benchmark by actually running uv with the cache, because at least then it's a lot more stable: we're doing the work, but we're reading from the cache every time. Whereas if you want to benchmark anything that requires network IO, it's very hard, because the amount of variance that you'll get will usually dwarf anything that you would see from CPU code changes.
That is true, but at the same time you have to be careful not to change your targets. You need to be sure that you're benchmarking the right thing.
Yes, absolutely, yes. And we do make a distinction there. We think of it as warm performance versus cold performance: performance when you have stuff in the cache versus when you don't. And we do look at both. There are some things you can do, like setting up a network link conditioner (that's what it's called on macOS, at least). You can intentionally throttle your own network connection to try to bring it down to something that would hopefully be a bit more consistent. But again, that's also different, because it measures something, but it measures something slightly different than what you would get on your machine typically. I have a very high-speed internet connection, so the bottlenecks that I experience are different when I throttle versus when I don't. When I throttle, the network is slower, and so if we need to do things at the same time, it's easier. But when my network connection is really fast, operations on the CPU, or sorry, on-disk IO, can actually become blocking, because maybe I'm streaming really fast and trying to unzip a file and write it to disk. The bottlenecks are just a little bit different.
The other thing we'll do sometimes, if we do need to look at anything related to the HTTP stack, is just run a local server. Again, it's different, but it gives you some information, some predictability. At least then you get consistency, to a greater degree, in what your measurements look like. The really hard thing with network IO is that it can just be all over the place, and if you're trying to measure a 1 to 5% performance change, it's very hard to do in the presence of lots of network requests.
In that context, I was surprised to learn that you're using a single-threaded tokio runtime. Isn't that what you're supposed to do when you want super amazing high-performance IO in Rust, to always use multiple threads in tokio? Why did you decide against that?
Yeah. I guess the simplest answer is just that we benchmarked it and we looked at it a lot. And we found that... basically, we use a single tokio thread for IO, is the way that I would put it. The theory, I guess, is that it reduces synchronization costs, and that we don't perform enough IO for multiple threads to be worth it. What is "a lot of IO", right? If you think about a web server that's trying to be super high-throughput, doing thousands of requests per second, that's pretty different from what we're doing, because for us, maybe we're downloading 20 packages at the same time. That's 20 different requests happening; it's a pretty different amount of throughput.
So we found that using a single thread for IO works, and then being really careful about compute work, trying to make sure that we run compute work off that main thread, is also really important. For example, we have a solver in uv; we have to solve for dependencies. We get a big graph of the things that people depend on, and then we look at the things that those packages depend on, and we have to solve this big constraint-satisfaction problem. That solver runs on its own thread; we move that compute off to a different thread. So we have to be a little bit careful about what we do on which thread and how we orchestrate it, but we did actually find in practice that it was faster to just use the single-threaded runtime. There are some other nice things to it, too: we can use Rc instead of Arc in some places, some minor quality-of-life things like that. But yeah, we just make a lot of small network requests, and we found that using the single-threaded runtime was empirically faster for our program.
There was also a long conversation when we started uv about whether we should use async Rust at all, which is sort of another topic. The thinking there was: do we actually have enough IO to demand async Rust, or some kind of multithreaded runtime? Because that's maybe what we would see as the main benefit: we get all this thread coordination and stuff. I was pretty into it at the time; other people on the team sort of tried to talk me out of it and were more like, okay, why don't we just manage our own threads for doing this kind of IO? I felt like doing it with async Rust would be more like doing it right, and moving slightly more in the direction of the arc of what the ecosystem wants. I don't actually know if that's proven to be true. I think we could have built uv without using async Rust and it would have been fine; maybe I'll just put it that way.
At the same time, I actually find working with async Rust to be fine, and I think it's actually improved fairly dramatically even since we started the project. The Rust team just keeps shipping; things just keep getting stabilized. Async closures, async fn in traits: those didn't exist when we started the project, and we had to use the async-trait macro, for example. There are just a lot of things in async Rust that have been stabilized and improved in the last year, and I really don't find myself having to work around async or "fight async", quote-unquote, very much. I'm not saying that it's worth it for every project, but I actually don't think it has been a big challenge for us. I think the main challenge has been this stuff around scheduling: trying to understand how we schedule really efficiently, and maybe feeling like we have slightly less control than if we were hand-rolling our own approach to threading.
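A minimal sketch of that shape: a current-thread tokio runtime for IO, with CPU-heavy work pushed off the async thread. This is illustrative only, not uv's actual code:

```rust
use tokio::runtime::Builder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Current-thread runtime: all async tasks share one thread,
    // avoiding cross-thread synchronization for IO-bound work.
    let runtime = Builder::new_current_thread().enable_all().build()?;

    runtime.block_on(async {
        // CPU-bound work (say, a dependency solver) runs on the
        // blocking pool so it never stalls the IO event loop.
        let solver = tokio::task::spawn_blocking(|| {
            // ... solve the constraint-satisfaction problem ...
            "resolution"
        });

        // Meanwhile, this single thread can drive many concurrent
        // network requests.
        let resolution = solver.await?;
        println!("solved: {resolution}");
        Ok::<_, tokio::task::JoinError>(())
    })?;
    Ok(())
}
```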
I mean, in this situation, there's sort of a middle ground as well: instead of having a global tokio runtime, say by annotating your main function with tokio::main, you could also use structured concurrency, where the core of it is sync, and when you need a lot of IO, you can start your own little runtime, even within a function, say, or within a struct. Did you consider that approach as well?
Yeah, I think that's probably roughly what we would have done if we hadn't done this. And again, I think that could have been totally fine, but this also worked, so it's been good. There are challenges with async, right? The things that I mentioned before are becoming less of a pain, but there's still some pain. Async closures, things having to be Send and Sync... it's slightly infectious, right? Whereby if we need to call one async function from another function, suddenly the async propagates upward.
And there were some challenges like that. Just as a random example: our Git implementation I originally basically vendored from Cargo. Not exactly, but I looked at Cargo's Git implementation and I was like, okay, Cargo is good at Git; how do they deal with Git? I sort of started with what they had, and then we changed it pretty significantly over time. But that wasn't async, and we were calling into it from an async runtime. That ended up actually causing kind of a lot of problems that were pretty annoying to debug. For example, it needs to make network requests, and it was using, I think, curl or something, and I was like, okay, but we use reqwest. I only want to use one networking stack, so let's replace it with reqwest.
But okay, if we want to replace it with reqwest, and it's going to be sync, then we have to use reqwest's blocking interface. And for a period of time (I can't remember how we fixed this) it was actually impossible, because reqwest's blocking client, I believe, uses async internally. So we were starting a tokio runtime within a tokio runtime, and that would just error, right? We were trying to take some code that wasn't async and use it in an async context, and the two kind of butted heads a little bit. So you do run into stuff like that by buying into async. But in general, I think the ergonomics of it are actually quite good, and I don't think we pay much of a cost for using it. And hopefully we'll get more and more out of it over time; that's a little bit of how I think about it.
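The runtime-within-a-runtime clash is easy to reproduce: reqwest's blocking client drives an async client internally, so calling it directly from async code panics. One common workaround, sketched here (not necessarily how uv resolved it), is to move the sync call onto tokio's blocking thread pool:

```rust
async fn fetch(url: String) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    // Calling `reqwest::blocking::get(url)` directly here would panic,
    // because the blocking client starts its own runtime internally.
    let body = tokio::task::spawn_blocking(move || {
        reqwest::blocking::get(url)?.text()
    })
    .await??; // outer `?`: thread join error; inner `?`: reqwest error
    Ok(body)
}
```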
Exactly. So I would assume that the majority of the problems with async Rust that you mentioned don't really affect you, because you don't think libraries-first: your user-facing interface is the CLI. But if you were to build a library, and it used tokio as part of its public interface, so it was async to begin with, then that can cause some headaches with integration.
Because 100% Yeah, sorry. This is actually a great point. That's actually maybe
the thing that's most painful about using async is like there are all these
crates that we depend on that we have to depend on basically like the async
versions of those crates or the async interfaces.
And if I look at all those crates, they have to actually maintain like a tokio
interface and a sync interface and maybe like an async standard interface like sometimes.
And for example, we have tar-rs, a really popular, common crate for creating tarballs and untarring them. We wanted an async tar implementation, and it turns out there's something called, and I'm going to get the exact names of these things wrong because they're all so similar, async-tar, which is an async-std port of tar-rs: they took tar-rs and made it async with async-std. Well, we can't really use that, because we're using tokio and not async-std, and we don't want to have these two huge dependencies on different async runtimes. Okay, so it turns out that got forked into something called tokio-tar. So it went from tar-rs to async-tar to tokio-tar, and then that crate actually got forked, like, two more times by different people, because it wasn't really maintained. And eventually we forked it ourselves to fix some bugs, and now we fully maintain it. That's actually a public crate that we published outside of uv, where we've upstreamed, or downstreamed, I guess, a lot of things from tar-rs, and we fixed some bad bugs that users were hitting.
And the whole ecosystem problem, I actually have no idea how to solve. It seems like a huge pain for crates to have to maintain all these different interfaces. And then for us, we have two different zip crates: we use zip-rs, and we also use async_zip, because we have slightly different contexts in which they run. That's maybe an us problem, but that is, I think, probably the most challenging piece: the touch points with the rest of the ecosystem, when you want to pull in async versions of things, or people have to maintain async versions of things. Like, okay, we're going to maintain this async tar crate forever, because the tar-rs crate is sync. That's kind of a bad outcome, but I have no idea how to solve it. I don't know if that's what you were getting at; you're probably talking about it more from the library-maintainer perspective of having to expose tokio interfaces. A lot of crates will have a tokio feature, right, that pulls that stuff in. But yeah, it's a little painful.
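To illustrate that "tokio feature" pattern, here's a rough sketch of how a crate might gate an async surface behind a Cargo feature. This is a hypothetical crate, not the actual API of tar-rs, async-tar, or async_zip.

```rust
// lib.rs of a hypothetical archive crate that serves both worlds.

/// Sync interface: always available, no async runtime required.
pub fn unpack(data: &[u8]) -> std::io::Result<()> {
    let _ = data; // ... a real sync implementation would go here ...
    Ok(())
}

/// Async interface: only compiled when the user enables the crate's
/// `tokio` feature in Cargo.toml.
#[cfg(feature = "tokio")]
pub async fn unpack_async(data: &[u8]) -> std::io::Result<()> {
    let data = data.to_vec();
    // Simplest possible bridge: run the sync implementation on the
    // blocking thread pool. Real crates often re-implement the I/O on
    // top of the async runtime's traits instead.
    tokio::task::spawn_blocking(move || unpack(&data))
        .await
        .expect("blocking task panicked")
}
```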
I don't know what the way out of it is here, because in reality, well, the ecosystem is still evolving. async-std is essentially discontinued, so that's off the table. But I do believe there's merit in having a sync interface and then an async interface on top of it as a separate crate, and the crate ecosystem allows for that. But yeah.
Yeah, yeah. I actually did this once, maybe just to illustrate how confusing this stuff is and how little I know about it. Early on, I did actually pull in a crate that used async-std. I wasn't really thinking about it very hard; I just saw I had an async interface, and I needed to call .compat() on a couple of things or something, and I was like, okay, cool, this works. And then, obviously, it massively bloated our dependency graph and everything, and people on my team were like, did you do this intentionally? And at that point in time, I was like, I don't really know what the difference is between these things. They're just async, right? Blah, blah, blah. So it's super confusing. It is helpful that things have centralized more on tokio, I think. But it's a very easy mistake to make, and it's not at all clear how these things relate to one another, honestly.
Yeah, it had a ripple effect on the ecosystem, which we still deal with today. But it's a step in the right direction that the Future trait is now in the Rust prelude, so it feels like, as you say, things are progressing.
Right. Now, another thing that I wanted to touch on, which is also kind of interesting because you diverge from the norm a bit, is parser generators.
Oh, yeah.
I think you decided to switch to a handwritten parser in ruff.
And I wonder, first off, what was the decision process there?
And second, how do you handle the complexity that comes with it?
Yeah, great question. So, originally, the parser in ruff came from a project called RustPython, which is, I guess, in some ways an even more ambitious project, because it's a whole Python interpreter: they're trying to build a whole runtime in Rust. As part of that, though, it has to parse code. So I took that parser, and we depended on it as a library. And that parser was based on a parser generator called LALRPOP, L-A-L-R-P-O-P. Again, I don't really know how to pronounce anything, because I spend all my time just on the internet.
Same. But basically, you create a LALRPOP file. It's a DSL, but it can also include Rust code verbatim at different points, so you have something that kind of looks like the grammar.
But the thing we found when we started using it, the first sign of trouble, I guess, was that RustPython didn't support several newish syntax features in Python from the last couple of years, like match statements: Python has support for pattern matching, which was added in 3.10, I think, and it didn't support it. And it turns out that's because, oh my god, I'm going to mess this up, in Python 3.9, I think, they switched their parser: they moved from an LL(1) parser, this is not important if you don't know what these things are, to a PEG parser. And basically, it meant that the grammar got more flexible, so they were able to have things that are called soft keywords. So, for example, match is a valid variable name: you can do match = 1. But it's also a keyword, because you can do match, an object, a colon, and then patterns, right? So it's both a valid variable name and a keyword, and whether it's a keyword depends on the context around it. So the parser has to be able to support both those use cases.
I'm just thinking, I think async, now I can't remember, async might be a soft keyword as well. But largely it's for things that were added where they don't necessarily want to make changes that are backwards-incompatible, in the sense that there's a lot of existing Python code that needs to run on new versions of Python and might use match as a variable name, and that code should continue to work.
So the grammar got more complicated and they added a more powerful parser to support that.
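For what it's worth, Rust has a close analogue: `union` is a contextual keyword, legal both as an ordinary identifier and, in declaration position, as a keyword. A small sketch:

```rust
// In declaration position, `union` introduces an untagged union type...
union Bits {
    int: u32,
    float: f32,
}

fn main() {
    // ...but it is also a perfectly legal variable name, much like
    // `match` in Python after the PEG switch.
    let union = 1;
    println!("{union}");

    let bits = Bits { int: 0x3f80_0000 };
    // Reading a union field is unsafe: the compiler can't know which
    // field is currently stored.
    unsafe { println!("{}", bits.float) } // prints 1 (the f32 bits of 0x3f800000)
}
```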
Now, the parser we were using, based on LALRPOP, couldn't support that. There were ambiguities in the grammar that were very hard to represent in LALRPOP, and we had to get increasingly good at LALRPOP in order to do it: the precedence of the statements, the way you do things. Basically, we were learning this tool really deeply and having to put more and more work into actually supporting the Python grammar.
So what it felt like at the time was there were a few properties that we wanted,
that we thought we could get out of a new parser.
One, we thought we could make it a lot faster, just out of the box. Two, we thought it would be much easier for us to optimize further, because it's very hard to optimize a parser generator: the code is generated, as the name suggests, so there's only so much you can do to optimize the generated code; it's kind of out of your control. And the third was error recovery: we wanted much better control over error recovery, especially because we're building tooling that's designed to work in an editor. So if you type a syntax error, like if you type def, space, and you're starting to type out a function, we still want to be able to parse as much of the file as we can, even though it's not syntactically valid anymore. And that's pretty hard. There are probably parser generators that support that to different degrees, but a lot of it requires fairly bespoke handling of what happens when you hit an error and what the different fallback cases could be.
So there were things we wanted out of a new parser, which were performance and error recovery. And then we also found that we were spending a lot of time just trying to get the parser generator to work; it was supposed to save you time, but ultimately we were spending a lot of time trying to get it to work for the grammar in the first place. So we wanted to rewrite the parser; we were pretty sure about that.
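To make the error-recovery point concrete, here's a toy sketch, and only a sketch: this is not how ruff's parser actually works. On a bad token, a handwritten parser can record a diagnostic, skip ahead to a synchronization point, and keep going, so the rest of the file still gets parsed.

```rust
#[derive(Debug)]
enum Stmt {
    Pass,
    Error, // placeholder node so later statements still get parsed
}

fn parse(tokens: &[&str]) -> (Vec<Stmt>, Vec<String>) {
    let mut stmts = Vec::new();
    let mut errors = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        if tokens[i] == "pass" {
            stmts.push(Stmt::Pass);
            i += 1;
        } else {
            errors.push(format!("unexpected token `{}`", tokens[i]));
            // Recovery: skip ahead to the next statement boundary
            // instead of giving up on the whole file.
            while i < tokens.len() && tokens[i] != ";" {
                i += 1;
            }
            stmts.push(Stmt::Error);
        }
        if i < tokens.len() && tokens[i] == ";" {
            i += 1; // consume the statement terminator
        }
    }
    (stmts, errors)
}

fn main() {
    let (stmts, errors) = parse(&["pass", ";", "def", "oops", ";", "pass"]);
    println!("{stmts:?}");  // [Pass, Error, Pass]
    println!("{errors:?}"); // ["unexpected token `def`"]
}
```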
And actually, a contributor came to us and said they wanted to write a handwritten parser; if I recall correctly, it was actually for a master's project. They were building around the ruff AST, and they asked: if I can get it to pass the ruff test suite, would you want to use it in ruff? And we said, yeah, absolutely, and we'll pay you for it. So this contributor brought us this parser, and then there was a fair amount of last-mile work.
Because we're being used by all these big projects, shipping a new parser is a huge change, and we need to make sure it's right. So we did end up investing a lot of time in the final 10%: making sure that it works in all cases, that it won't panic for people, and that it behaves exactly as we expect.
It's the kind of thing that's pretty easy to test at very large scale, because we can compare, for example, the diagnostics that we get on very large projects. There are arbitrarily large projects out there, and we can run ruff before and after on those projects and make sure the diagnostics are completely unchanged. So we did a lot of large-scale testing, and it was actually an incredibly smooth release: we got maybe one bug report, which is amazing. I didn't even work on this, so I can brag about it a lot, but it's amazing that that happened.
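A rough sketch of that before/after idea, with hypothetical binary names and paths; Astral's actual test harness is certainly more involved.

```rust
use std::process::Command;

// Run a given build of the tool over a project and capture its diagnostics.
fn diagnostics(binary: &str, project: &str) -> String {
    let output = Command::new(binary)
        .arg("check") // ruff's lint subcommand
        .arg(project)
        .output()
        .expect("failed to run binary");
    String::from_utf8_lossy(&output.stdout).into_owned()
}

fn main() {
    let project = "path/to/some-large-project"; // hypothetical
    let before = diagnostics("ruff-old", project); // hypothetical binaries
    let after = diagnostics("ruff-new", project);
    assert_eq!(before, after, "diagnostics changed between parser versions");
    println!("no diff: the parser change is observationally equivalent");
}
```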
And ultimately it means we now have our own parser. It's completely handwritten, and it's been way easier for us to modify over time. It is work, for sure; the grammar gets extended. But we have complete control over how the parser works, and the idea of adding new syntax to that grammar is so much less daunting than adding it to the parser generator. And that's really not meant to be a knock on the parser generator: I think that's a great project, and I think parser generators can definitely be great; it just depends on what you're doing. But for us, with Python, there's just a lot in the grammar, there are lots of ambiguities, and it's evolving, and it's something we have to change, so we felt more comfortable doing it ourselves. And it was also way faster: it sped up all of ruff by something like 20 to 40 percent.
That's pretty impressive. And it sort of paid off for you to take ownership of this entire parsing part, because it's such an integral part of what you're building at Astral in general. You can probably use that for uv and for ruff, I'm assuming? I don't know.
We don't use it in uv today, the Python parser, but we could. We have other parsers in uv, thankfully: the version parser, the version-specifier parser, the requirements.txt parser.
Yeah, a lot of the work that you do is parsing, right?
I guess so, yeah. I mean, often we need to implement standards, things that have been specified for Python but really only have Python implementations. Versions would be a good example. It sounds like a simple thing, 1.0.0, right? But they actually get fairly complicated. It's not that the version parser itself is incredibly complicated, but it does run, I have no idea what the number would be, we're probably analyzing billions and billions of versions a day, right? Think about how many times that version parser is parsing versions. So, you know, we think a lot about how we make that fast, and also how we make sure it's fully standards-compliant. So yeah, we do build a lot of parsers.
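For a flavor of why "simple" version parsing isn't so simple, here's a toy release-segment parser. A real PEP 440 parser like uv's also has to handle epochs, pre-, post-, and dev-releases, local version labels, and more; this is purely illustrative.

```rust
// Parses only the dotted release segment ("1", "1.0", "1.0.0", "2024.1").
fn parse_release(s: &str) -> Option<Vec<u64>> {
    s.split('.')
        .map(|part| part.parse::<u64>().ok())
        .collect() // Option<Vec<u64>>: None if any segment fails to parse
}

fn main() {
    assert_eq!(parse_release("1.0.0"), Some(vec![1, 0, 0]));
    assert_eq!(parse_release("2024.1"), Some(vec![2024, 1]));
    // Pre-releases like "1.0rc1" are valid PEP 440 but not handled here,
    // which is exactly where the real complexity starts.
    assert_eq!(parse_release("1.0rc1"), None);
    println!("ok");
}
```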
To me, that was probably one of the highlights of your Rust talk last year, where you shared that story about version parsing. I had such a good laugh. If someone hasn't seen it yet, we will link it in the show notes. It's a really fun example.
Really great, yeah. It was such a fun exercise.
I think you could make an entire course around that.
Probably a PhD. But yeah, that's another topic.
But anyhow. Still, it's different when you build something like that as a fun hobby side project versus when you do it as a business, with multiple employees, a larger code base, multiple crates, and so on. So I just wanted to take the opportunity to talk about day two with Rust. As it stands today, what's the verdict?
I love it, and I think it's been such a good choice for our projects. We can build extremely stable, extremely fast software that also has the benefit of being memory safe, for, you know, thousands and thousands, or millions, whatever, of Python users. And the day-to-day is excellent. I've always thought that the secret behind Rust, and it's not that much of a secret, is the tooling. I've never really written any C++, and people can just think I'm a moron, which is fine, but whenever I look at a C++ project, it's super intimidating to figure out how I even get this thing to build or run, or what am I doing? And I don't know that I ever would have become a quote-unquote systems programmer through C++, or I think it would have been a lot harder for me. That's my prediction; maybe I'm wrong, I don't know. But Rust is just such a high-confidence experience.
You install this thing with rustup, you run cargo run, cargo build, and that's how you build a project. I can clone any Rust project and feel fairly confident that I know how to run it, how to test it, how to build it, and how to understand it. So for me, the tooling story is excellent. I think the only things I really have complaints about are things I'm always going to complain about no matter what: obviously, I'd like compile times to be faster. That would be nice, because as you build a bigger and bigger project, it becomes more and more of an issue; it's just a tax on development. But even if they were faster, I'd probably be saying the same thing: I just want them to be faster. That's the main thing for me. One thing I'm really impressed by in Rust, too, is the rate at which the language and the tooling keep improving. Every Rust release has something I want, something I've been waiting for, which is kind of crazy. And I'm not even one of the old hands; there are people on our team who have been doing Rust since before 1.0, right? I started writing Rust in, like, 2022. I haven't been writing Rust for that long, but with every Rust release there's just so much progress, and it's awesome to feel like you're part of this community that's building and growing all the time. It feels so stable and so mature, but the rate of change, the rate of progress, the rate at which things are being addressed, I think is great. So my only complaint is compile times, but, you know, I'll take what I can get, really.
Same. Let's make it faster, definitely. Yeah, we need more crates in the workspaces. But when you refactor across crates, or maybe even within crates, how does that feel to you? What's that experience like? Do you refactor with confidence? Is it something that you look forward to, or is it more of a, you know, dread?
No, I like refactoring in Rust a lot. Maybe it's the nature of what we're building. The thing that people often say about very well-typed languages, or even functional languages, is: if it compiles, you know it will work. I don't know if I believe that about Rust; there are still ways that your code can be wrong. But I do feel like I'm constantly guided by the compiler. And actually, more and more, I think the way I write code is that I try to make it such that I will be guided by the compiler in the future. For example, I think about this with, okay, let's say I'm passing a struct into a function, and I need to do something with every field. I want to make sure that if I add a new field to that struct, I'm reminded that I need to handle that field in that function. So instead of referencing all the fields inline, like object.a, object.b, I will often destructure the struct at the top of the function. It's a very small thing, but it means the compiler will remind you that you need to look at it if you add a field. So more and more I find myself trying to find ways to be guided by the compiler, because it's such a powerful thing, and I just think it makes working on large, complex projects so much easier.
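A small sketch of that pattern, with illustrative names rather than anything from uv's code base:

```rust
struct Settings {
    host: String,
    port: u16,
}

fn connect(settings: &Settings) {
    // Exhaustive destructuring instead of `settings.host` / `settings.port`
    // inline: if someone later adds a field to `Settings`, this line stops
    // compiling ("pattern does not mention field ..."), so whoever adds the
    // field must decide how this function should handle it.
    let Settings { host, port } = settings;
    println!("connecting to {host}:{port}");
}

fn main() {
    connect(&Settings {
        host: "localhost".to_string(),
        port: 8080,
    });
}
```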
The destructuring part I've never heard in such detail.
Yeah, it's kind of a silly little thing, but you know what I'm saying, right?
No, I totally get it. I will totally steal that idea.
I mean, hopefully you don't have structs that are so complicated that they need that much. But, you know, I just more and more try to think about how the compiler will make sure that I do this correctly in the future.
I knew about that tip in a serialization context, when, for example, you want to ensure that all the fields get serialized and deserialized properly; I know some people destructure for that. But it never crossed my mind that you can do that in just, you know, normal function code. You could even destructure it right there in your arguments; you don't even have to do it in the first line of the body. You can do it in the arguments, because it's just a, you know, pattern match, essentially.
Or it's a destructuring pattern, yes.
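The same trick works in argument position, as just mentioned: the parameter itself is a destructuring pattern. An illustrative example:

```rust
struct Point {
    x: f64,
    y: f64,
}

// The destructuring happens directly in the function signature.
fn magnitude(Point { x, y }: &Point) -> f64 {
    x.hypot(*y)
}

fn main() {
    println!("{}", magnitude(&Point { x: 3.0, y: 4.0 })); // 5
}
```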
Yeah. Do you have more tips like that? Where can people learn more about idiomatic Rust and best practices? Where did you learn it?
I mean, a lot of it I learned from having great teammates, which is sort of a bad answer, because it depends on your life situation; you don't have that much control over that. You have some control. But it is a real thing: I started working on ruff on my own, and then, as we grew the team, I thankfully ended up hiring people who knew a lot more about Rust than me. Micha, who was the second employee to join the company, taught me so much about Rust. And then later we hired Andrew Gallant, BurntSushi, and I will often just send him random Rust questions rather than Googling them. I mean, not in a way that's exploitative of that relationship or overly burdensome on him; he loves being the elder statesman at the company who can help people with hard Rust questions or problems. So finding great people to learn from is maybe the slightly higher-level lesson, but I know that's not always easy.
The other thing that I did is read a lot of code. ChatGPT and LLMs are great, but you should also remember that GitHub code search exists, and all these amazing code bases are open source. So, for example, something I'll often do now: if I look at a crate and I want to use it, maybe it has great examples; okay, great. If it doesn't have great examples, and I don't want to read all the documentation myself, I will actually just go into GitHub code search and search for the struct name or the function name, and in one second I will find real examples of real projects using that crate. So you can just go read code; all this stuff is accessible to some degree. That's how I've tried to pick things up. I looked at Cargo a lot when we were building uv and tried to understand how they do certain things. I don't know how to implement Git? Let me go look at what Cargo does, and then let me read about their design decisions, because it's all documented in their PRs, and you can understand the trade-offs and why they did things a certain way. You know, you can also go hunt people down and talk to them about this stuff, but there's plenty you can find without doing that.
Fully agree. Before there were LLMs, there was BurntSushi, but unfortunately you hired him, so there's just the one of him. You can still read his open source code, though. Whenever someone asks me what an idiomatic Rust crate to read would be, I always point them to ripgrep.
Oh yeah, I liked that a lot too, when we were figuring out how to structure our crates, how to manage workspaces, our release pipeline, and all that stuff. There's just so much good code out there, so, you know, go read it.
Yeah, I always cry when I open the ripgrep code. Out of joy, of course.
I really like reading it. It's amazing.
Yeah, unfortunately, we have to come to an end. But I wonder if you have any
final statement to the Rust community.
Yeah. I mean, how do I say this correctly? It's kind of amazing, I think, that I only started writing Rust a few years ago, and now we've shipped, along with a great team, two of these tools that are having, I think at least, a huge impact on Python, which is the most popular, or the second most popular, programming ecosystem on earth. So if you think about it, in a lot of ways Rust is kind of powering Python, at least if I have a say about it. And, I don't know, I never considered myself to be a systems programmer, quote-unquote. For most of my career I was writing TypeScript and Python; I mean, I did some Java professionally, but except for a course in college I had never done any C, and I really hadn't done any C++. And in the span of a few years, I learned to build this kind of software. So, I don't know, I've just had great experiences with the community, being welcomed into it and learning the language. And I think that should continue to be a very important part of Rust: welcoming people in and helping them learn. Because the impact that we can have by building this kind of stuff is just huge, even outside of Rust.
Perfect. I couldn't have said it better. Both languages are really close to my heart, and I really like to see that synergy. Charlie, your presence was much appreciated. Thank you so much for taking the time today.
Thank you so much for having me, and for all the great questions. It was really fun.
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.