Rust in Production

Matthias Endler

uv with Charlie Marsh

About improving the Python ecosystem with Rust

2025-05-15 75 min

Description & Show Notes

Up until a few years ago, Python tooling was a nightmare: basic tasks like installing packages or managing Python versions were a pain. The tools were brittle and did not work well together, mired in a swamp of underspecified, implementation-defined behaviour.

Then, apparently suddenly, but in reality backed by years of ongoing work on formal interoperability specifications, we saw a renaissance of new ideas in the Python ecosystem. It started with Poetry and pipx and continued with tooling written in Rust like Rye, which later got incorporated into Astral.
Astral in particular contributed a very important piece to the puzzle: uv – an extremely fast Python package and project manager that supersedes all previous attempts. For example, it is 10x-100x faster than pip.
 
In this episode I talk to Charlie Marsh, the Founder and CEO of Astral. We talk about Astral’s mission and how Rust plays an important role in it.


About Astral

Astral is a company that builds tools for Python developers. What sounds simple is actually a very complex problem: Python's ecosystem is huge, but fragmented and often incompatible. Astral’s mission is to make the Python ecosystem more productive by building high-performance developer tools, starting with Ruff. In their words: "Fast, unified, futuristic."

About Charlie Marsh

Charlie is a long-time open source developer and entrepreneur. He has an impressive CV: he graduated with highest honors from Princeton University, then worked at Khan Academy and others before founding Astral in 2022. Charlie is an engaging speaker and a great communicator.

Proudly Supported by CodeCrafters

CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. 
Start for free today and enjoy 40% off any paid plan by using this link.

Transcript

It's time for another episode of Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. I'm Matthias Endler from corrode and today's guest is Charlie Marsh from Astral. We talk about improving the Python ecosystem with Rust. Charlie, thanks for being a guest. Can you say a few words about yourself and about Astral, the company you work for?
Charlie
00:00:26
Yeah, of course. Thanks so much for having me, first of all. My name is Charlie Marsh. I run a company called Astral. We build high-performance developer tools for the Python ecosystem. We're best known for two tools: Ruff, which is a sort of combined linter, formatter, and code transformation tool. You could think of it kind of like rustfmt and Clippy together, but for Python code, and written in Rust. And then uv, which is our project manager, Python package manager, Python toolchain manager. You could think of it a little bit like Cargo and rustup: it will bootstrap Python for you and then help you manage your dependencies, install things, lock them into lock files and reproducible versions, and all that kind of stuff. Everything we've built so far is open source and written in Rust. We're a team of about 15 people: we have people in Pacific time, Central time, and Eastern time in the U.S., we have one person in the UK, we have four people in CET, in Germany, Switzerland, and the Netherlands, and then one person in India. So not only are we remote, we're very distributed, kind of like a lot of open source. And we spend all our time basically writing Rust to try and make Python better.
Matthias
00:01:47
I heard about Astral the first time when you published Ruff, but it took off with uv, I would say. Like...
Charlie
00:01:56
There was.
Matthias
00:01:57
Definitely a huge sympathy for what you do in the packaging space as well. Did you see that as well from a company perspective, that there was a huge growth?
Charlie
00:02:08
I think so, yeah. I mean, we started with Ruff. I started working on Ruff before the company existed; it was just something I was building, for a lot of reasons, but largely because I saw these were problems that I had experienced in my own projects. And I was like, what would it be like if I wrote Python tooling in Rust instead? That was sort of the genesis of Ruff. And Ruff grew extremely quickly: we had decades-old projects like SciPy adopting it while it was still what I would consider to be very unstable. So it was clear that people wanted something in this arena of faster Python tooling. Ruff grew very fast, and we were seeing what was happening with it, and we were like: well, as a company, in terms of the problems we're trying to solve, we're not trying to build just that. We want to build a Python toolchain, effectively. We want to solve the hard problems in Python, and for me, packaging was kind of... if you want to be the Python tooling company, which, aspirationally I guess, we'd like to be, then you have to work on packaging. It's the thing that everyone has trouble with, and the thing that so many people have tried to solve, and done good work on, but I don't think anyone would consider it solved. So I kind of knew from early on that we wanted to do something in packaging, and before we released it, I definitely felt nervous, in that Ruff had grown well and people really liked it, and it felt like it was going to be a tough act to follow. We didn't want to be a one-hit wonder with the tools that we built. I wanted to build something that hopefully would be as exciting, grow as well, and have as much of an impact as Ruff. And I think uv has actually surpassed that in a lot of ways; the impact it's had on Python has probably been more significant. Not that I don't love Ruff; I mean, it was my first big project, and half the people on the team are still working on Ruff and the static analysis tooling. It's a big focus area for us. But I've been really amazed by how quickly uv has had the impact that it's had. We released it in February of last year, so it just turned one year old, and in that time it's grown to, I think last I checked, a little over 12.5% of all requests to the Python index coming from uv, which is over 200 million requests a day. Which is wild, right? And I spend a lot of my time talking to companies. There are tons of $1 billion, $10 billion, even $100 billion companies that are using this thing in production and have been for a long time. The way it grew really surpassed my expectations, and I feel like we now have a platform to hopefully keep making Python better. But I think you're right that with uv we shifted into another gear. The company was just Ruff before, and now we are Python tooling, and we try to solve a much wider surface area of problems.
Matthias
00:05:27
You said that Ruff was your first major Rust project, maybe the first actual project that you tried after learning Rust; I'm not sure about this, but it certainly was a big milestone in your Rust journey. But then, comparing it with uv: did you already learn a few things with Ruff that you wanted to avoid with uv? And also, did you set up the uv code base early on for such growth?
Charlie
00:05:58
Yeah. So I would say I really learned Rust in the process of writing Ruff. I had done some Rust at my previous job, where someone else on the team, a really great engineer, introduced Rust. But when I was contributing to that code base, I was mostly trying to get in and out as quickly as possible, because I didn't really know Rust. I was like: I need to fix a bug, how do I get this to compile? I wasn't really taking the time to learn it; I was just using it when I had to. So I had some Rust exposure, but part of why I worked on Ruff in the first place was that I felt like I wanted to really learn Rust, and I thought that in order to do that, I had to build something from scratch. That's just the way that I like to learn. Even if that requires wasting a ton of time trying to understand lifetimes, like maybe I'll burn two days trying to get this to compile, that's how I learn: I have to build things and fight through the problems. And I wanted to learn the ecosystem and the tooling and everything. So I really did learn Rust in the process of writing Ruff. I think that showed, and probably still shows a little bit today. Especially early on, there were a lot of things I was doing that I would now look at and laugh about and say, that's obviously not the right way to do it, whether it was silly performance things or the way I was structuring the code or anything else. This is part of personal growth, right? By the time we started working on uv, I felt a lot more comfortable in Rust, and I'd also learned more from looking at other projects. And we had evolved Ruff a little bit over time. For example, we have a fairly wide crate structure: we just create a lot of crates. If you go and open up Ruff or uv, in the crates subdirectory of uv there are at least 20 crates, maybe 30. That was something I started to do in Ruff because our build and compile times got really bad. If I just put everything in one crate, the build and compile times got worse and worse, and the development loop was painful. At the time I was reading a lot about how crates are the atomic unit for rustc, how you can parallelize across crates, and how you need to think about what your crate graph looks like so that there's a lot of fan-out. And I was like: okay, we're going to start creating lots more, smaller crates, and we're going to change the atomic unit of the code base from the module to the crate. So we started carving out lots of little crates, and we did basically the same thing in uv.

For example, in Ruff, the core linter crate doesn't depend on clap; it doesn't depend on any of the CLI stuff. The CLI stuff is in a separate crate that depends on the linter crate, and you just try to keep that structure and organization. Or the parser and the AST: those are all their own crates, so if you just need to test the parser, it's really fast to compile and build. Or if other people just need to pull in the parser as a library, which some people do, it's much easier. So by the time we got to uv, we had ironed out a lot of things, and we just put a lot of stuff in place from the start. We had a better separation in terms of what went into which crates, and we had ironed out a lot of our workflows: what's our Clippy configuration, how do we pin the Rust toolchain version, how do we install Rust in CI. There were all these little things we were able to just copy over from Ruff, and I think that made things a lot easier. At the same time, there were still a lot of design decisions in uv that were architecturally pretty different, because the things a package manager and a linter need to do are very different. For example, uv has a whole networking stack: it needs to make lots of HTTP requests, and the linter doesn't need to do any of that. So suddenly we had to think about reqwest, basically the whole HTTP stack. We had to think about OpenSSL; we had to think about all these system dependencies. Git was another one: we support Git dependencies, so suddenly you have to think about how you depend on Git. There was just a lot of complexity that came with building a package manager that we didn't have to encounter when we were working on the linter, which is good, because I kind of encountered those problems over time.
Matthias
00:10:34
Do I understand you correctly, to summarize: you didn't have a rough start with uv, because you could copy over a lot of the, let's say, templates, a lot of the best practices from the previous project?
Charlie
00:10:48
At least what we thought were best practices, yeah. And I guess to some degree we still think they're best practices.
Matthias
00:10:55
When you started with uv, did you already compartmentalize some of the functionality into smaller crates, somewhat subconsciously? Did you say, oh yeah, this definitely goes into a separate crate now? Or did you still start with a single large crate?
Charlie
00:11:15
No, I think we were compartmentalizing stuff basically from the start. For example, we had one crate for parsing Python versions: there's a whole spec around versioning with a lot of intricacies to it, so that's its own crate. We had one crate for parsing requirements.txt files; that's another thing that's common in Python, and there's actually no spec for that, it's implementation-defined, so we have one crate for that. We have a crate for creating virtual environments, we had a crate for the CLI, we had a crate for the resolver. So yeah, we broke things down pretty early on, and that's generally served us very well; I'm a big fan of that structure. The trade-off is really different when you're publishing a library, though. We don't publish anything in Ruff or uv as a library; we don't publish any of the stuff that we build to crates.io right now. The public API for our stuff tends to be the command line, the CLI. If we had to publish things, I think we would have to think a lot more carefully. There are all sorts of considerations that come with publishing that you don't have to think about if you aren't publishing. But one is: if we had a really granular crate structure, there's a lot more of a maintenance burden, and it's also harder for people to use and compose your things.
Matthias
00:12:39
But you don't go and introduce a crate specifically for types? Because that's one thing that I see some people, some companies do: they have a types crate where they put everything related to their basic types into one thing. And then, in the end, they don't really get a lot of benefits from using a workspace, because you need that types crate in all of your other crates anyway, and so it kind of goes against what a workspace is about, in my opinion. But I want to hear it from you: what are some anti-patterns for building workspaces?
Charlie
00:13:21
Yeah, we try not to do things like that. I don't know what we would call that pattern, but: creating a crate that's just kind of used everywhere. There's maybe some of that, but we try to generally avoid it. I think the other anti-pattern is making your crate structure too granular, like if you find yourself creating a crate that's just one function. Sometimes you'll find yourself doing that because you have circular dependencies, or something in the dependency chain that's requiring you to do this: I have two crates that need this functionality, and one can't depend on the other, and then you find yourself putting a function into... it's just a crate that's a function at that point. You've probably gone too granular, and you need to rethink the organization, because there is some overhead to having all these crates, and it's not always super fun to maintain. A couple of things we do, though, that are kind of nice for this: we prefix all the crates in the workspace with uv or with ruff, so it's easy to differentiate what's our crate in the workspace versus what's a third-party crate. All of our crates are like uv-virtualenv or uv-resolver or whatever else; that's pretty nice. And then the other thing is, we declare them all in the workspace root, so that every crate that depends on other crates in the workspace can just use workspace = true. This is a little bit nuanced and maybe hard to visualize, but it basically means that in the Cargo.toml for all the crates that are in the workspace, it's very obvious what else is in the workspace. Another thing we do is we actually try to use workspace = true for basically all dependencies in the workspace. So we put everything that we depend on in the root Cargo.toml, the workspace root, and then we use workspace = true everywhere, and that tends to simplify things a lot. It basically means we have one dependency specifier for reqwest, for example, with a common set of default features, or no default features (I almost said extras; extras is the Python analog to features). We have one dependency declaration for everything we need, and then for all the workspace members, their Cargo.tomls are very straightforward: just the things they require, with workspace = true. So we don't have to think about whether we have reqwest dependency specifiers across 10 or 15 different crates; we just have one definition for it. That's also been a really handy thing.
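A minimal sketch of the layout Charlie describes, with illustrative crate names and versions rather than uv's actual manifests: dependencies are declared once under [workspace.dependencies] in the root Cargo.toml, and each member crate opts in with workspace = true.

```toml
# Root Cargo.toml (illustrative)
[workspace]
members = ["crates/*"]

[workspace.dependencies]
# One definition per dependency, feature choices included, shared by all members.
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }
uv-resolver = { path = "crates/uv-resolver" }

# crates/uv-cli/Cargo.toml (illustrative member manifest)
# [dependencies]
# reqwest = { workspace = true }
# uv-resolver = { workspace = true }
```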
Matthias
00:16:04
Workspaces seem to be one of these things that I keep missing in other ecosystems, and I can't remember if there's even another language that has such a feature, off the top of my head.
Charlie
00:16:16
Yeah, I can't think of one. I mean, we've thought about it a fair amount in Python, because uv has workspace functionality that's very much modeled after Cargo. You have a root that defines the members; we actually use very similar members and exclude syntax. And there are nice things like that: you can do uv run -p just like you can do cargo run -p, right? We've copied a lot of things from Cargo, because it's great, and we want to have those things in Python. And for workspaces, I think that's worked really well. It's been very interesting to introduce them, because for a lot of people it's a fairly new concept when they come to Python. Most people who use uv have never used Rust, right? They've never heard of a Cargo workspace before, and why should they, if they're just working in Python? So it's been interesting to try to communicate that and help people understand what it's for, why you might use it, and what an example workspace might look like in practice. There are some things that we miss that are a little bit hard to get without standards. For example: I just spent all this time talking about this incredibly boring workspace = true thing, and all your listeners are probably like, why is he spending so much time talking about that? But we can't really do that in uv, because there isn't really a way to express it in the standards. So there are some things that we either can't support or have to get creative about how we support. But I think the workspace concept is excellent, and I'm really glad that we made it such a first-class thing in uv.
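For illustration, a hypothetical workspace root for a uv project, mirroring Cargo's members/exclude syntax as described; the paths are made up:

```toml
# pyproject.toml at the workspace root (hypothetical layout)
[tool.uv.workspace]
members = ["packages/*"]
exclude = ["packages/legacy"]
```

A member can then be targeted much like cargo run -p, with uv run -p.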
Matthias
00:17:49
Now, I'm pretty sure that we could talk about workspaces for hours, because there's a lot of nuance to it, and I think a lot of people that haven't tried it themselves just don't know what the fuss is all about. But I do believe that there's more to it than just the term. Now, one other thing that you mentioned, which is very close to my heart, is parsing stuff, especially requirements.txt, which is unspecified, I heard. Now, isn't it super simple?
Charlie
00:18:18
Yeah...
Matthias
00:18:20
Isn't it super simple? Like, someone might hear this and say: oh yeah, I just open my editor, I string-split every line on the equals sign, and that's my requirements.txt parser. Why is that not the case?
Charlie
00:18:32
Yeah. So requirements.txt is basically a file format that exists for pip. pip is, how should I describe it, the reference implementation for a lot of things; it's really been the Python package installer for a long time. And requirements.txt is a file format that exists for pip. The way you can think about it, which maybe people don't, is that each line is basically a command line argument, because you can not only put requirements in there, you can also put command line arguments and settings. For example, in pip you can pass an index URL on the command line, which is: what registry should I use for fetching packages? You can actually put that in requirements.txt too; you can just do --index-url. You can also nest them: within a requirements.txt you can do -r and point to a different requirements.txt, and then it gets inlined, roughly. So there's a lot of complexity to the stuff that's in requirements.txt that people don't really think about. And there are also lots of very subtle behaviors where, especially for us, we often have to decide: do we want to be bug-for-bug compatible with pip, or do we want to do things slightly differently? A lot of these are edge cases, but there's just a lot of nuance to it. In a requirements.txt file, you can have what I would call a named dependency: you could say flask and then the version that you want, or you could do flask @ and then the URL that you should fetch it from, or a Git repository or something like that. But you can also just put the URL, or just the Git repository. And it turns out that in pip, there are slightly different behaviors around how those things are parsed: the URL on its own versus the URL after a named dependency. And how whitespace is handled, error recovery, all this stuff is just a little bit different. So over time, basically the only way to know if we're doing the right thing is to see what pip does, and then we try to mimic that to some degree. Or if we think we can do things that are unambiguously clearer, then we'll do that. But yeah, that one's especially hard. It's a little bit easier for other things like Python version specifiers, where there's a clear standard on how these things should be parsed, and if we see something that's not clear, we can actually ask about it through the standards process: how should this be handled?
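To make the "each line is a command line argument" point concrete, here is a toy Rust classifier for the line shapes mentioned above. It is a deliberately naive sketch, not uv's parser, and its comment handling shows exactly the kind of edge case that makes the real thing hard:

```rust
// Toy classifier for requirements.txt line shapes (illustrative only).
#[derive(Debug)]
enum Line<'a> {
    Requirement(&'a str), // `flask==3.0.0`, `flask @ https://...`, or a bare URL
    Include(&'a str),     // `-r other.txt`: a nested file, roughly inlined
    Option(&'a str),      // `--index-url https://...`: a CLI flag inside the file
    Blank,
}

fn classify(raw: &str) -> Line<'_> {
    // Naive comment stripping. This is precisely the kind of bug pip
    // compatibility exposes: a URL requirement may contain `#egg=...`
    // fragments that must NOT be treated as a comment.
    let line = raw.split('#').next().unwrap_or("").trim();
    if line.is_empty() {
        Line::Blank
    } else if let Some(path) = line.strip_prefix("-r") {
        Line::Include(path.trim())
    } else if line.starts_with('-') {
        // Each option line is, in effect, a command line argument.
        Line::Option(line)
    } else {
        Line::Requirement(line)
    }
}

fn main() {
    for raw in ["flask==3.0.0", "-r dev.txt", "--index-url https://example.org/simple", ""] {
        println!("{raw:?} -> {:?}", classify(raw));
    }
}
```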
Matthias
00:21:16
In essence, you make the standards stronger for everyone, which is great. What piqued my interest there: did you have to go and read the pip source code to be able to understand what's going on? Because this is a standard that sort of developed over 30 years; I think Python is 30 years old by now, it's from 1994 as far as I remember. Did you really have to read the source code?
Charlie
00:21:44
Yes, sometimes we definitely have to go read source code, which is fine. I don't really mind that, as long as we can figure out why a certain behavior exists in a certain way and what it's motivated by. But yes, we often have to go read the source code to understand how a tool handles a given case. And also why, right? A lot of the time you're trying to understand why people made certain decisions, not just how it works.
Matthias
00:22:09
Means you don't even you don't only have to read the source code you also have to read the history the.
Charlie
00:22:15
Yes, the version history, yes.
Matthias
00:22:19
But do they have proper tests, at least, where you can go through and maybe implement these tests in your implementation?
Charlie
00:22:26
Yeah, they do have tests. It's not always super straightforward to take their tests and move them over to uv, though. A lot of the time, the way that you write tests in Python and in Rust tends to be quite different, or at least the way that we write tests tends to be quite different. In Python, mocking is very popular, for example, and we have almost no mocking in uv. Almost all of our tests are what you would probably consider integration tests. The whole way that we test uv is we actually run real scenarios against a real package registry, and it's almost all based on snapshot testing: we snapshot the CLI output a lot of the time. That's how we detect whether things are working correctly or not: we snapshot the CLI output, or we snapshot the lock file, or things like that. The vast majority of the way that uv gets tested is integration tests that run real commands on the CLI and snapshot the output; we have a lot of tests, and they all run through that. It's a little different sometimes, because we do have more traditional unit tests for the requirements.txt parser; that's something that's very testable, right? And we do have tests for that. But often, if it's more complex scenarios around how pip behaves versus how we behave, it's a little bit harder to shoehorn them into what we have.
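A minimal sketch of the CLI snapshot-testing style described here, using the insta crate and std::process::Command. The exact command, its flags, and a uv binary on PATH are assumptions for illustration, not taken from uv's real test suite:

```rust
use std::process::Command;

#[test]
fn install_dry_run_snapshot() {
    // Run a real CLI command end-to-end, like an integration test would.
    let output = Command::new("uv")
        .args(["pip", "install", "--dry-run", "flask"])
        .output()
        .expect("failed to run uv");

    let stdout = String::from_utf8_lossy(&output.stdout);
    let stderr = String::from_utf8_lossy(&output.stderr);

    // Snapshot the combined output; `cargo insta review` accepts or
    // rejects the new snapshot whenever behavior (and thus output) changes.
    insta::assert_snapshot!(format!("stdout:\n{stdout}\nstderr:\n{stderr}"));
}
```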
Matthias
00:23:58
Do you always run the full test harness?
Charlie
00:24:01
In CI?
Matthias
00:24:02
Yeah, or maybe locally as well? But I wonder, that's probably too much.
Charlie
00:24:08
Yeah, we almost always run the full test harness. We just skip it for changes that are purely documentation, which we detect with some file filters in GitHub Actions; otherwise we run the full test suite. Every change, we run it on Linux, macOS, and Windows. That's not as critical for Ruff, but it's very critical for uv, because with a package manager, more things tend to differ across platforms, more behaviors differ across platforms. So we always test everything on Linux, macOS, and Windows. I guess one interesting piece is that we build uv for a lot of different targets. When we publish to PyPI, or publish anywhere, or publish to GitHub releases, we're building somewhere between 10 and 20 different build variants, let's say. That's Linux x86 dynamically linked against glibc, Linux x86 musl, Linux ARM glibc, Linux ARM musl. We build for some more obscure platforms that are supported in Python, like PowerPC and S390x and stuff like that. We build for ARM and x86 Windows, ARM and x86 macOS. So we just have a lot of different builds. And, first of all, setting those up to actually build is fairly complex, so if you ever find yourself needing to do that, you should look at our CI, because we've figured out how to build a Rust project across all those targets. A lot of those are cross-compiled, right? We've figured out how to build those across lots of different machines. What is our CI? Oh yeah, just go look at our .github folder and take stuff from it. The other piece is that we rerun all of those builds whenever certain files change. If we add a new dependency, we rebuild across all those machines, because otherwise we've run into situations where we add a new dependency, then we go to release, and then, I don't know, the ARM musl build fails for some reason we don't fully understand, or the Windows ARM build fails for some reason we don't fully understand. So I guess the only nuance is that we do rebuild everything on every platform if we, for example, change our dependencies.
Matthias
00:26:51
I think this is kind of where the Rust build system also has its limits at this point in time, especially when you talk about feature flags and maybe optional dependencies, dev dependencies. There are a lot of loose ends, and sometimes you can't really say specifically and exactly which dependencies you want to enable for which platform. And on top of that, you have various system dependencies and the build.rs files to go with them, and sometimes you don't want to have a system dependency on a certain target, and so on. It's very hard to work around these limitations.
Charlie
00:27:34
Yeah. I mean, a lot of the complexity has come from things like accelerators... sorry, not accelerators, allocators. We use, I don't even know how to pronounce these things, jemalloc, right? On most platforms. But it doesn't compile on some platforms, so we have to have a bunch of build configuration around that. The other one that caused a lot of trouble was our zlib implementation, which is for decompression. By default you get the pure-Rust miniz_oxide implementation if you just use flate2 and reqwest; if you just use the reqwest crate to decompress stuff, you get that pure-Rust implementation by default, I think. But there's this zlib-ng version that you can use, which is a lot faster, but it adds a CMake dependency and it needs to be built; it's very hard to build. We could never get that to build on certain platforms, so we had to have configuration around where we enable it and where we don't. We actually tore that out recently and moved over to zlib-rs, the pure-Rust implementation of a lot of these zlib optimizations, which in our benchmarking at least is actually both faster and way easier to build. When you get something that's pure Rust, it simplifies things dramatically: we tore out this CMake dependency and got rid of all that configuration, because now we can just use the faster, easier-to-build, more portable Rust version on all platforms. So a lot of it comes from trying to do things that are more customized or bleeding-edge, like using really fast system dependencies or allocators; you end up running into these kinds of configuration problems. But we've tried to make that simpler over time.
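The allocator configuration he alludes to typically looks something like the following sketch; the crate choice (tikv-jemallocator) and the cfg condition are illustrative assumptions, not uv's actual setup:

```rust
// Cargo.toml would gate the dependency the same way, e.g.:
// [target.'cfg(not(target_env = "musl"))'.dependencies]
// tikv-jemallocator = "0.6"

// Use jemalloc where it builds; targets that are cfg'd out silently
// fall back to the system allocator.
#[cfg(not(target_env = "musl"))]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    let v: Vec<u8> = Vec::with_capacity(1024);
    println!("allocated {} bytes via the configured allocator", v.capacity());
}
```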
Matthias
00:29:28
Have been such a relief to pull out that cmake.
Charlie
00:29:31
Oh, it's amazing, it's amazing. I actually tried to do it a few months ago, and then we realized that they were using compile-time feature detection for a lot of the optimizations, which isn't great for us, because it meant that on x86, things were actually a lot slower: we have to build for a common CPU target. And they recently re-released with runtime detection, and I was like, okay, cool, we're doing it again. So I redid the change twice, and redid all the benchmarking on my Windows machine and on macOS for ARM and all that kind of stuff. But yeah, when you can get rid of stuff like that, it's immensely helpful.
Matthias
00:30:14
I realize, whenever I read one of your posts or whenever I see you talk, that you care a lot about performance, and you benchmark a lot, and meticulously. Do you have any tips for people that want to make their Rust code even faster? From doing it so often, there must be patterns that evolved over time. What is a...
Charlie
00:30:40
Good benchmark?
Matthias
00:30:41
What's a bad benchmark? What are some good libraries out there? In general, what are best practices? What don't you measure, for example? That sort of...
Charlie
00:30:49
Yeah, there's so much to share. Actually, learning the tooling around benchmarking is itself a skill. There are certain things that I only know how to do on macOS and don't really know how to do on Linux or Windows, and there are other people on my team who know how to do those. I know how to use Instruments, for example, on macOS, and that's its own set of things. For us, there are a bunch of different forms of benchmarking. One is micro-benchmarking, which is generally the kind of thing you could do with criterion or a similar crate, where you're running typically fairly well-isolated code on test inputs, running thousands and thousands or millions of iterations of it, and trying to detect very small changes. That is an incredibly useful thing if you can isolate your code in that way. If I'm trying to figure out, for example, which of two different ways of writing a function that parses some simple string is faster, then I'll do something like that: I'll isolate the two methods and then use criterion or something similar. We also do micro-benchmarking in CI: we use a tool called CodSpeed, which does continuous benchmarking, continuous profiling. That again is really good, and they run it under Valgrind, because if you just try to do this kind of benchmarking on a GitHub Actions machine, it tends to be extremely unreliable; you have to have a very high error tolerance. Very early on in Ruff, when we started adding continuous benchmarking, we did it on GitHub Actions, and we would regularly see 5 to 10% fluctuations even for no-op changes, because those machines are very noisy. CodSpeed solves this problem because they run all your stuff under Valgrind, so they get a much cleaner snapshot of what's actually changing in terms of performance. There's some nuance to it, but basically: I/O can make benchmarking really messy in general, but if you can write benchmarks that don't have any I/O, that are pure CPU, you can fit them into that kind of micro-benchmarking framework, and it's incredibly useful. Especially in Ruff, where we have a lot of that. We just run that in CI, and it catches things all the time; real positive changes show up and get flagged as positive changes. It's a fairly high signal. So if you can do that, that's great.

We can't always do that in uv. We often do more, and I just sort of make these words up, macro-benchmarking. I don't know if that's a word, but I usually use hyperfine: I will compile two uv release builds, run some operation over and over, and try to see if I can detect a difference. I'll typically try to minimize I/O as much as I can, but maybe it has to read from disk for the on-disk cache. And there, the change either needs to be really obvious or very consistent in order for you to catch anything. That's one form of benchmarking: if it made everything 10% faster, then it's very obvious, and you can catch it in that kind of benchmark. The other pieces are a little bit harder to do direct comparisons with. We do a fair amount of profiling with a great tool called Samply, which is a sampling-based profiler. You can just run a uv command, prefix it with samply, and it opens up a flame graph in your browser with everything that got called and where you spent time. You can't always tell just from a flame graph whether you made your whole program faster, right? If you ran the flame graph, found something that was taking up a lot of time, made changes, and then it's gone, that's good; that means you got rid of that time. But that alone is not enough to tell that your program got faster; it just means you removed the thing you were spending time on, and maybe it went somewhere else. So we'll often use that to diagnose issues: if something's slow, we'll run it under Samply and see where we're spending time. There are certain tools you want for diagnosing issues, and certain tools we use to confirm findings or understand whether there are regressions, things like that.
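A minimal criterion micro-benchmark in the spirit of "isolate the function and compare"; the version-parsing function here is a hypothetical stand-in, not code from Ruff or uv:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical function under test: parse a `major.minor.patch` version.
fn parse_simple(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

fn bench_parsers(c: &mut Criterion) {
    // black_box keeps the compiler from constant-folding the input away,
    // so the measured iterations actually exercise the parser.
    c.bench_function("parse_simple", |b| {
        b.iter(|| parse_simple(black_box("1.2.3")))
    });
}

criterion_group!(benches, bench_parsers);
criterion_main!(benches);
```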
Matthias
00:35:24
Yeah. And in my mind, when you said sampling profiler and flame graphs, I was sort of hoping that the tool would kind of update the chart as the program runs. I'm not sure if that's the behavior of that program.
Charlie
00:35:35
I don't think Samply does that. I think there might be a way to compare two flame graphs, and CodSpeed tries to do this: they'll try to diff the profiles. It works okay; sometimes it's hard for them. I mean, inherently that seems like a very hard problem: they have to try to align which two function calls are the same, and that's not necessarily trivial, because the code is changing. But there's some stuff like that.
Matthias
00:36:01
Just for context, because someone might not know: a sampling profiler, how would you describe that in a sentence or two?
Charlie
00:36:10
I mean, my understanding is that it basically watches your program execute and samples at some rate. It takes a bunch of samples as your program is running to try to get a representation of where you're spending time. Just for stupid numbers, say it takes a hundred samples, and in 10 of those you're in this function; then it would say, hey, we were running this function in 10% of the samples, and here is where it was being called from, and here's how you're spending time. So it watches your program execute and tries to figure out, probabilistically, by sampling the behavior, where you're spending your time. That's my understanding. I never built one of these, so it might actually work totally differently, but that's my understanding.
Matthias
00:36:55
That's also my understanding.
Charlie
00:36:56
Which is a little different than Valgrind, for example. I know rustc, I think, does this too: you can do profiling based on instruction counting. You actually look at the instructions that are generated or executed. You still run the program, but you look at a different thing: you're not focused on wall time, you're looking at what instructions are being executed and how many. So it gives you a little bit more of a quantitative look, I guess, at where you might be spending time. And I believe rustc has continuous benchmarks that look at... actually, I could be totally wrong, but there's something that I've been shown before in the Rust ecosystem that does continuous benchmarking based on instruction counting.
Matthias
00:37:50
I was kind of thinking that cargo flame graph would do that in comparison to sampling i never used sampling before but my go-to tool for flame graphs is always cargo flame graph i'm not sure if you've used that one.
Charlie
00:38:01
I think I have, but it's been a while, yeah.
Matthias
00:38:05
And you find Samply to be more ergonomic?
Charlie
00:38:07
I found it to be easier, yeah. It's just what we tend to use on the team, but I'm sure there are lots of good options. It just works seamlessly across macOS and Linux, which is really nice. With a lot of these tools, it's maybe hard to get them to work on one or the other. Not always, and I don't know if that's true of cargo-flamegraph; I just mean that with things we've used in the past, sometimes it's like, oh, this works great on Linux but not really on macOS. And I think Samply tends to work really well on both.
Matthias
00:38:35
cargo-flamegraph gives you an SVG file that you can open in your browser, and it's sort of...
Charlie
00:38:39
Interactive, yeah. Pretty basic as well.
Matthias
00:38:42
I don't know, there's probably support for the Chrome profiler, which also supports flame graphs, but I'm not sure.
Charlie
00:38:50
Yeah. With the thing that Samply gives you in the browser, you can click into the flame graph, you can filter traces by name and stuff; it's pretty nice. But again, some things are just really hard to understand in a flame graph. uv is very async, and we have lots of different stuff going on, so sometimes it's hard to tell where time is actually being spent. Things that are spending lots of CPU time are always nice, because they're obvious, and you can find them and fix them. But things where the scheduling is slightly off, or we're waiting here but... those tend to be harder, more pernicious bugs to find.
Matthias
00:39:31
By that definition, isn't it true that profiling or benchmarking Ruff is easier than uv? Because Ruff is inherently CPU-bound: you do a lot of computations, I'm assuming.
Charlie
00:39:43
Yeah, I think it's much easier. Ruff does have I/O, right, in the sense that we're reading files from disk to analyze them, but that I/O is a lot more stable and more minimal than in uv, and it also happens up front: we read the files and then we analyze them. Whereas in uv, there's constantly I/O happening, whether we're reading stuff from disk from the user's project, or reading from our on-disk cache, or making small HTTP requests or large HTTP requests, or downloading and unzipping files. There's just constantly I/O happening, so it tends to be a bit harder to benchmark. Often we'll benchmark by actually running uv with the cache, because at least then it's a lot more stable: we're doing the work, but we're reading from the cache every time. Whereas if you want to benchmark anything that requires network I/O, it's very hard, because the amount of variance that you'll get will usually dwarf anything that you would see from CPU code changes.
Matthias
00:41:00
Is true but at the same time you have to be careful not to change your targets you need to be sure.
Charlie
00:41:06
That you're benchmarking.
Matthias
00:41:07
The right thing.
Charlie
00:41:08
Yes, absolutely, yes. And yeah, we make a distinction there: we think of it as warm performance versus cold performance, performance when you have stuff in the cache versus when you don't, and we do look at both. There are some things you can do, like setting up a network link conditioner, that's what it's called on macOS at least. You can intentionally throttle your own network connection to try to get it to be a bit more consistent. But again, that's also different, because it measures something, but something slightly different from what you would typically get on your machine. I have a very high-speed internet connection, so the bottlenecks I experience are different when I throttle versus when I don't. When I throttle, the network is slower, so if we need to do things at the same time, it's easier. But when my network connection is really fast, operations on the CPU, or sorry, on-disk I/O, can actually become blocking, because maybe I'm streaming data really fast and trying to unzip a file and write it to disk. The bottlenecks are just a little bit different. The other thing we'll do sometimes, if we need to look at anything related to the HTTP stack, is just run a local server. Again, it's different, but it gives you some information, some predictability, because at least then you get consistency, to a greater degree at least, around what your measurements look like. The really hard thing with network I/O is that it can just be all over the place, and if you're trying to measure a one-to-five-percent performance change, it's very hard to do in the presence of lots of network requests.
Matthias
00:42:54
And in that context, I was surprised to learn that you're using a single-threaded Tokio runtime. Isn't using multiple threads what you're supposed to do when you want super amazing high-performance I/O in Rust? Do you always use multiple threads in Tokio? Like, why did you decide against that?
Charlie
00:43:14
Yeah, I mean, the simplest answer is just that we benchmarked it and we looked at it a lot. We use a single Tokio thread for I/O is the way that I would put it. The theory, I guess, is that it reduces synchronization costs, and that we don't perform enough I/O for multiple threads to be worth it. Because what's "a lot of I/O", right? If you think about a web server, something super high-throughput trying to do thousands of requests per second, that's pretty different from what we're doing, because for us, maybe we're downloading 20 packages at the same time. That's 20 requests happening; a pretty different amount of throughput. So we found that using a single thread for I/O, and then being really careful about compute work, trying to make sure we run compute work off that main thread, is really important. For example, we have a solver in uv; we have to solve for dependencies, right? We get a big graph of the things that people depend on, then we look at the things that those packages depend on, and we have to solve this big constraint satisfaction problem. That solver runs on its own thread: we move that compute off to a different thread. So we have to be a little bit careful about what we do on which thread and how we orchestrate it, but we did actually find in practice that it was faster to just use the single-threaded runtime. There are some other nice things to it, too: we can use Rc instead of Arc in some places, minor quality-of-life things like that. But yeah, we just make a lot of small network requests, and we found that using the single-threaded runtime was empirically faster for our program. There was also a long conversation when we started uv about whether we should use async Rust at all, which is sort of another topic. The thinking there was: do we actually have enough I/O to demand async Rust, or some kind of multi-threaded runtime? Because that's maybe what we would see as the main benefit, all this thread coordination. I was pretty into it at the time; other people on the team sort of tried to talk me out of it and were more like, okay, why don't we just manage our own threads for doing this kind of I/O? I felt like doing it with async Rust would be more like doing it right, and moving slightly more in the direction of the arc of what the ecosystem wants. I don't actually know if that's proven to be true. I think we could have built uv without using async Rust, and it would have been fine; maybe I'll just put it that way. At the same time, I actually find working with async Rust to be...

And I think it's actually improved fairly dramatically, even since we started the project. The Rust team just keeps shipping, you know? Async closures, async functions in traits: those didn't exist when we started the project; we had to use the async-trait macro, for example. There are just a lot of things in async Rust that have been stabilized and improved in the last year, and I really don't find myself having to work around async or "fight async", quote-unquote, very much, for whatever reason. I'm not saying that it's worth it for every project, but I actually don't think it has been a big challenge for us. I think the main challenge has been this stuff around scheduling, trying to understand how we schedule really efficiently, and maybe feeling like we have slightly less control than if we were hand-rolling our own approach to threading.
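A minimal sketch, not uv's actual code, of the pattern described: a single-threaded Tokio runtime driving I/O, with CPU-heavy work such as a resolver pushed onto its own OS thread. It assumes the tokio crate with its runtime features enabled:

```rust
use tokio::runtime::Builder;

// Stand-in for the CPU-bound constraint solver described above.
fn resolve_dependencies() -> Vec<String> {
    vec!["flask==3.0.0".to_string()]
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One thread drives all async I/O: no work-stealing, less synchronization.
    let runtime = Builder::new_current_thread().enable_all().build()?;

    runtime.block_on(async {
        // Compute goes to a dedicated thread so it never starves the
        // single thread that is driving network requests.
        let solver = std::thread::spawn(resolve_dependencies);

        // ... concurrent downloads etc. would be awaited here ...

        let resolution = solver.join().expect("solver thread panicked");
        println!("resolved: {resolution:?}");
    });
    Ok(())
}
```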
Matthias
00:47:11
I mean, in this situation, there's sort of a middle ground as well, which is: instead of having a global Tokio runtime, say by annotating your main function with #[tokio::main], you could also use structured concurrency, where the core of it is sync, and when you need a lot of I/O, you can start your own little runtime, even within a function, say, or within a struct. Did you consider that approach as well?
Charlie
00:47:46
Yeah, I think that's probably roughly what we would have done if we hadn't done this, and again, I think that could have been totally fine. But this also worked, so, I don't know, it's been good. There are challenges with async, right? The things that I mentioned before are becoming less of a pain, but there's still a lot of pain: async closures; things having to be Send and Sync. It's slightly infectious, right? If we need to call one async function from another function, suddenly the async propagates upward. And there were some challenges like that. Just as a random example: our Git implementation I originally basically vendored from Cargo. Not exactly, but I looked at Cargo's Git implementation, and I was like, okay, Cargo is good at Git, how do they deal with Git? I sort of started with what they had, and then we changed it pretty significantly over time. But that wasn't async, and we were calling into it from an async runtime, and that ended up causing a lot of problems that were pretty annoying to debug. For example, it needed to make network requests, and it was using curl or something, and I was like, okay, but we use reqwest, I only want one networking stack, so let's replace it with reqwest. But if we want to replace it with reqwest, and it's going to be sync, then we have to use blocking reqwest. And then, for a period of time, I can't remember how we fixed this, it was actually impossible, because blocking reqwest, I believe, uses async internally. So we were starting a Tokio runtime within a Tokio runtime, and that would just error, right? We were trying to take some code that wasn't async and use it in an async context, and it just kind of butts heads a little bit. You do run into stuff like that by buying into async, but in general, I think the ergonomics of it are actually quite good, and I don't think we pay a high cost for using it. And hopefully we'll get more and more out of it over time, is a little bit of how I think about it.
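The nested-runtime pitfall he describes, and the usual escape hatch, in sketch form. This assumes tokio plus reqwest with its blocking feature, and illustrates the general pattern rather than uv's actual fix:

```rust
async fn fetch_via_sync_client(
    url: String,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    // Calling reqwest::blocking::get directly here would panic: the
    // blocking client spins up its own runtime internally, and a runtime
    // cannot be started from within a runtime. spawn_blocking moves the
    // sync call onto Tokio's blocking thread pool instead.
    let body = tokio::task::spawn_blocking(move || {
        reqwest::blocking::get(url.as_str())?.text()
    })
    .await??;
    Ok(body)
}
```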
Matthias
00:49:58
Exactly. So I would assume that the majority of the problems with async Rust that you mentioned don't really affect you, because you don't think libraries-first: your user-facing interface is the CLI. But if you were...
Charlie
00:50:19
To build.
Matthias
00:50:20
A library, and it used Tokio as a sort of public interface, so it was async to begin with, then that can cause some headaches with integration, because...
Charlie
00:50:34
100%. Yeah, sorry, this is actually a great point. That's maybe the thing that's most painful about using async: there are all these crates that we depend on where we have to depend on basically the async versions of those crates, or the async interfaces. And those crates have to actually maintain a Tokio interface and a sync interface, and sometimes maybe an async-std interface too. For example, there's tar-rs, a really popular, common crate for creating tarballs and untarring them, and we wanted an async tar implementation. It turns out, and I'm going to get the exact names of these things wrong because they're all so similar, there's something called async-tar, which is an async-std port of tar-rs: they took tar-rs and made it async with async-std. Well, we can't really use that, because we're using Tokio and not async-std, and we don't want these two huge dependencies on different async runtimes. Okay, so it turns out that got forked into something called tokio-tar. So it went from tar-rs to async-tar to tokio-tar, and then that crate actually got forked two more times by different people, because it wasn't really maintained. And eventually we forked it ourselves to fix some bugs, and now we fully maintain it. That's actually a public crate that we publish outside of uv, where we've upstreamed, or downstreamed, I guess, a lot of things from tar-rs, and we fixed some bad bugs that users were hitting. So the whole ecosystem problem, I actually have no idea how to solve. It seems like a huge pain for crates to have to maintain all these different interfaces. And for us, we have two different zip crates: we use zip-rs and we also use async-zip, because we have slightly different contexts in which they run, and that's maybe an us problem. But that is, I think, probably the most challenging piece: the touch points with the rest of the ecosystem, when you want to pull in async versions of things, or people have to maintain async versions of things. Okay, we're going to maintain this async tar crate forever, because the tar-rs crate is sync: that's kind of a bad outcome, but I have no idea how to solve it. I don't know if that's what you were getting at; you're probably talking about it more from the library-maintainer perspective, of having to expose Tokio interfaces. A lot of crates will have a Tokio feature, right, that pulls that stuff in. But yeah, it's a little painful.
Matthias
00:53:16
I don't know what the way out of it is here, because, in reality, the ecosystem is still evolving. async-std is sort of dead, so that's off the table. But I do believe there's merit in having a sync interface and then an async interface on top of that, as a separate crate, and the crate ecosystem allows for it. But yeah.
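One possible shape of that layering, as a hypothetical sketch: the crate split and names here are made up, and real async ports like tokio-tar instead reimplement the format against async I/O traits; this version simply delegates a sync core to tokio's blocking thread pool.

```rust
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical sync core, standing in for a crate like tar-rs.
pub fn unpack(archive: &[u8], dest: &Path) -> io::Result<()> {
    // Placeholder body: a real implementation would parse archive
    // entries here instead of dumping the raw bytes.
    std::fs::write(dest.join("archive.bin"), archive)
}

// Hypothetical async facade that would live in a separate crate,
// delegating to the sync core rather than duplicating the logic.
pub async fn unpack_async(archive: Vec<u8>, dest: PathBuf) -> io::Result<()> {
    tokio::task::spawn_blocking(move || unpack(&archive, &dest))
        .await
        .expect("blocking task panicked")
}

#[tokio::main]
async fn main() -> io::Result<()> {
    unpack_async(vec![1, 2, 3], std::env::temp_dir()).await
}
```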
Charlie
00:53:39
Yeah, yeah, yeah. Maybe just to illustrate how confusing this stuff is and how little I know about it: early on, I did actually pull in a crate that used async-std. And I wasn't really thinking about it very hard; I just saw I had an async interface and needed to call .compat() on a couple of things or something, and I was like, okay, cool, this works. And then, obviously, it massively bloated our dependency graph and everything, and people on my team were like, did you do this intentionally? And at that point in time, I didn't even really know what the difference between these things was. They're just async, right? Blah, blah, blah. So it's super confusing. It is helpful that things have centralized more on tokio, I think. But it's a very easy mistake to make, and it's not at all clear how these things relate to one another, honestly.
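For context, one way such a .compat() call can look: tokio's I/O traits differ from the futures-crate traits that async-std-style APIs use, and the tokio-util crate (with its compat feature, assumed here along with tokio) bridges between them.

```rust
use tokio_util::compat::TokioAsyncReadCompatExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // `tokio::fs::File` implements tokio::io::AsyncRead...
    let file = tokio::fs::File::open("Cargo.toml").await?;
    // ...and `.compat()` wraps it so it also satisfies the separate
    // futures::io::AsyncRead trait that async-std-style APIs expect.
    let _compat_file = file.compat();
    println!("wrapped a tokio file for a futures-based API");
    Ok(())
}
```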
Matthias
00:54:29
Yeah, yeah, it had a ripple effect on the ecosystem which we still deal with today. But it's a step in the right direction that the Future trait is now in the Rust prelude, so it feels like, as you say, things are progressing. Right. Now, another thing that I wanted to touch on, which is also interesting because you diverge from the norm a bit, is parser generators.
Charlie
00:54:59
Oh, yeah.
Matthias
00:55:00
I think you decided to switch to a handwritten parser in ruff. And I wonder, first off, what was the decision process there? And second, how do you handle the complexity that comes with it?
Charlie
00:55:15
Yeah, great question. So, originally, the parser in ruff came from a project called RustPython, which is, I guess, in some ways an even more ambitious project, because it's a whole Python interpreter: they're actually trying to build a whole runtime in Rust. As part of that, though, it has to parse code. So I took that parser, and we depended on it as a library. And that parser was based on a parser generator called LALRPOP, L-A-L-R-P-O-P. Again, I don't really know how to pronounce anything, because I spend all my time just on the internet. Same. But basically, you create a LALRPOP file. It's a DSL, but it can also include Rust code verbatim at different points, so you have something that kind of looks like the grammar. The first sign of trouble, when we started using that, was that RustPython didn't support several newish syntax features from the last couple of years, like match statements: Python has support for pattern matching, which was added in 3.10, I think, and it didn't support it. And it turns out that's because, and I'm going to mess this up, in Python 3.9, I think, they switched their parser: they moved from an LR(1) parser to a PEG parser. This is not important if you don't know what these things are, but basically it meant the grammar got more flexible, so they were able to have things that are called soft keywords. For example, match is a valid variable name: you can do match = 1. But it's also a keyword, because you can do match object, colon, and then patterns, right? So it's both a valid variable name and a keyword, and whether it's a keyword depends on the context around it. The parser has to be able to support both of those use cases. I think async might be a soft keyword as well; I can't remember. But largely it's for things that were added where they don't necessarily want to make backwards-incompatible changes, in the sense that there's a lot of existing Python code that needs to run on new versions of Python and might use match as a variable name, and that code should continue to work. So the grammar got more complicated, and they added a more powerful parser to support it.
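To make the soft-keyword problem concrete, here is a hypothetical, heavily simplified sketch of the kind of context check a handwritten parser can do; the token type and one-token lookahead rule are made up for illustration and are not Ruff's actual parser:

```rust
enum Tok<'a> {
    Name(&'a str),
    Eq,
    Colon,
}

// `match` is only a keyword when the surrounding tokens say so;
// `match = 1` must still parse as an assignment to a variable `match`.
fn is_match_statement(tokens: &[Tok]) -> bool {
    matches!(tokens, [Tok::Name("match"), rest @ ..]
        if !matches!(rest.first(), Some(Tok::Eq) | None))
}

fn main() {
    let stmt = [Tok::Name("match"), Tok::Name("obj"), Tok::Colon];
    let assign = [Tok::Name("match"), Tok::Eq, Tok::Name("x")];
    assert!(is_match_statement(&stmt));
    assert!(!is_match_statement(&assign));
    println!("soft keyword disambiguated by lookahead");
}
```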
Now, the parser we were using, based on LALRPOP, couldn't support that. There were ambiguities in the grammar that were very hard to represent in LALRPOP, and we had to get increasingly good at LALRPOP to do it: the precedence of the statements, the way you do things. Basically, we were learning this tool really deeply and having to put more and more work into actually supporting the Python grammar. So what it felt like at the time was that there were a few properties we wanted, that we thought we could get out of a new parser. One, we thought we could make it a lot faster to start, just out of the box. Two, we thought it would be much easier for us to optimize further, because it's very hard to optimize a parser generator: the code is generated, as the name suggests, and there's only so much you can do to optimize generated code; it's kind of out of your control. And the third was error recovery. We wanted much better control over error recovery, especially because we're building tooling that's designed to work in an editor. So if you type a syntax error, like if you type "def", space, and you're just starting to type a function, we still want to be able to parse as much of the file as we can, even though it's not syntactically valid anymore. And that's pretty hard. There are probably parser generators that support that to different degrees, but a lot of it requires fairly bespoke handling of what happens when you hit an error and what the different fallback cases could be.
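For a feel of what that error recovery means in practice, a toy, hypothetical sketch (not Ruff's implementation): on a parse error, record a diagnostic and resynchronize so the rest of the file is still analyzed.

```rust
struct Diagnostic {
    line: usize,
    message: String,
}

// Toy line-based "parser": every non-empty line is a statement, and a
// bare `def` stands in for an incomplete function definition.
fn parse_file(source: &str) -> (Vec<String>, Vec<Diagnostic>) {
    let mut statements = Vec::new();
    let mut diagnostics = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        if line.trim() == "def" {
            diagnostics.push(Diagnostic {
                line: idx + 1,
                message: "incomplete function definition".to_string(),
            });
            continue; // recover: skip this statement, keep parsing the rest
        }
        if !line.trim().is_empty() {
            statements.push(line.to_string());
        }
    }
    (statements, diagnostics)
}

fn main() {
    let (statements, diagnostics) = parse_file("x = 1\ndef\ny = 2\n");
    assert_eq!(statements.len(), 2); // code after the error still parsed
    for d in &diagnostics {
        println!("line {}: {}", d.line, d.message);
    }
}
```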
So there were things we wanted out of a new parser, namely performance and error recovery, and we also found that we were spending a lot of time just trying to get the parser generator to work. A parser generator is supposed to save you time, but ultimately we were spending a lot of time getting it to work for the grammar in the first place. So we wanted to rewrite the parser; we were pretty sure about that. And actually, a contributor came to us and said they wanted to write a handwritten parser. If I recall correctly, it was actually for a master's project. They were building around the ruff AST, and they said: if I can get it to pass the ruff test suite, would you want to use it in ruff? And we were like, yeah, absolutely, and we'll pay you for it. So this contributor brought us this parser, and there was a fair amount of last-mile work, because we're being used by all these big projects, and shipping a new parser is a huge change. We did end up investing a lot of time in the final 10%: making sure it works in all cases, that it won't panic for people, and that it behaves exactly as we expect. It's the kind of thing that's pretty easy to test at very large scale, because we can compare, for example, the diagnostics we get on very large projects. We can take arbitrarily large projects, run ruff before and after on them, and make sure the diagnostics are completely unchanged. So we did a lot of large-scale testing, and it was actually an incredibly smooth release: I think we got maybe one bug report, which is amazing. I didn't even work on this, so I can brag about it a lot, but it's amazing that that happened. Ultimately it means we now have our own parser. It's completely handwritten, and it's been way easier for us to modify over time. It is work, for sure: the grammar gets extended. But we have complete control over how the parser works, and the idea of adding new syntax to that grammar is so much less daunting than adding it to the parser generator. That's really not meant to be a knock on the parser generator; I think LALRPOP is a great project, and parser generators can definitely be great, it just depends on what you're doing. But for us, with Python, there's just a lot in the grammar, there are lots of ambiguities, it's evolving, and it's something we have to change, so we felt more comfortable doing it ourselves. And it was also way faster: it sped up all of ruff by, like, 20 to 40 percent or something.
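The before/after comparison described here is essentially differential testing; a minimal sketch of the idea, with stand-in functions instead of the real linter:

```rust
// Stand-in for the diagnostics the old implementation produces.
fn old_diagnostics(source: &str) -> Vec<String> {
    source
        .lines()
        .filter(|line| line.len() > 20)
        .map(|line| format!("line too long: {line}"))
        .collect()
}

// Stand-in for the rewritten implementation under test.
fn new_diagnostics(source: &str) -> Vec<String> {
    old_diagnostics(source) // must agree with the old one exactly
}

fn main() {
    // Stand-in corpus; in practice, many large real-world projects.
    let corpus = ["x = 1\na much, much longer line of code\n", "short\n"];
    for project in corpus {
        assert_eq!(
            old_diagnostics(project),
            new_diagnostics(project),
            "diagnostics drifted between parser versions"
        );
    }
    println!("no diagnostic changes across {} projects", corpus.len());
}
```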
Matthias
01:02:04
Pretty impressive. And it sort of paid off for you to take ownership of this entire parsing part, because it's such an integral part of what you're building at Astral in general. You can probably use that for uv and for ruff, I'm assuming? I don't know.
Charlie
01:02:22
We don't use the Python parser in uv today, but we could. We have other parsers in uv, thankfully: the version parser, the version-specifier parser, the requirements.txt parser.
Matthias
01:02:37
Yeah, a lot of the work that you do is parsing, right?
Charlie
01:02:39
I guess so, yeah. I mean, often we need to implement standards: things that have been specified in Python but really only have Python implementations. Versions would be a good example. It sounds like a simple thing, right, 1.0.0? But they actually get fairly complicated. It's not that that parser is incredibly complicated, but it does run a lot. I have no idea what the number would be, but we're probably analyzing billions and billions of versions a day, right? Think about how many times that version parser is parsing versions. So, you know, we think a lot about how we make that fast, and also how we make sure it's fully standards-compliant. So yeah, we do build a lot of parsers.
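As a flavor of the problem, a toy sketch of version parsing; real PEP 440 versions get far hairier than this (epochs, pre-, post-, and dev-releases, local segments), so this only handles plain dotted release numbers:

```rust
// Parse "1.2.3"-style release segments; anything else yields None.
fn parse_release(version: &str) -> Option<Vec<u64>> {
    version
        .split('.')
        .map(|part| part.parse::<u64>().ok())
        .collect()
}

fn main() {
    assert_eq!(parse_release("1.0.0"), Some(vec![1, 0, 0]));
    assert_eq!(parse_release("2025.5"), Some(vec![2025, 5]));
    // Pre-releases and friends are out of scope for this sketch:
    assert_eq!(parse_release("1.0rc1"), None);
    println!("parsed some versions");
}
```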
Matthias
01:03:24
To me, that was probably one of the highlights of your Rust talk last year, where you shared that story about version parsing. I had such a good laugh. If someone hasn't seen it yet, we will link it in the show notes. It's a really fun example, really great. Yeah, it was such a fun exercise, and I think you could make an entire course around that. Probably a PhD, even. But anyhow, that's another topic. Still, it's different when you build something like that as a fun hobby side project versus when you do it as a business, with multiple employees, a larger code base, multiple crates, and so on. So I just wanted to take the opportunity to talk about day two with Rust. As it stands today, what's the verdict?
Charlie
01:04:12
I love it, and I think it's been such a good choice for our projects. We get to build extremely stable, extremely fast software that also has the benefit of being memory safe, for thousands and thousands, or millions, whatever, of Python users. And the day-to-day is excellent. I've always thought that the secret behind Rust, and it's not that much of a secret, is the tooling. For me, at least. I've never really written any C++, and people can just think I'm a moron, which is fine, but whenever I look at a C++ project, it's super intimidating to figure out how I even get this thing to build or run, or what I'm doing. And I don't know that I ever would have become a quote-unquote systems programmer through C++; I think it would have been a lot harder for me. That's my prediction; maybe I'm wrong, I don't know. But Rust is just such a high-confidence experience. You install this thing with rustup, you run cargo run, cargo build, and that's how you build a project. I can clone any Rust project and feel fairly confident that I know how to run it, test it, build it, and understand it. So for me, the tooling story is excellent. I think the only things I really have complaints about are things I'm always going to complain about no matter what: obviously, I'd like compile times to be faster. That would be nice, because as you build a bigger and bigger project, it becomes more and more of an issue; it's just a tax on development. But even if they were faster, I'd probably be saying the same thing, that I just want them to be faster. That's the main thing for me. One thing I'm really impressed by in Rust, too, is the rate at which the language and the tooling keep improving. Every Rust release has something I want and something I've been waiting for, which is kind of crazy. And I'm not even one of the people who've been around forever: there are people on our team who have been doing Rust since before 1.0 and have been doing Rust for a long time, whereas I started writing Rust in, like, 2022. I haven't been writing Rust for that long, but now, with every release, there's just so much progress, and it's awesome to feel like you're part of this community that's building and growing all the time. It feels so stable and so mature, but the rate of change, the rate of progress, the rate at which things are being addressed, is also great. My only complaint is compile times, but, you know, I'll take what I can get, really.
Matthias
01:06:57
Same. Let's make it faster, definitely.
Charlie
01:06:59
Yeah, we...
Matthias
01:07:00
...need more crates in the workspaces. But when you refactor across crates, or maybe even within crates, how does it feel to you? What's that experience like? Do you refactor with confidence? Is it something you look forward to and choose to do, or is it more of a, you know, dread?
Charlie
01:07:18
No, I like refactoring in Rust a lot. Maybe it's the nature of what we're building. The thing that people often say about very well-typed languages, or even functional languages, is: if it compiles, you know it will work. I don't know if I believe that about Rust; there are still ways your code can be wrong. But I do feel like I'm constantly guided by the compiler. And actually, more and more, I think the way I write code is that I try to make it such that I will be guided by the compiler in the future. For example, let's say I'm passing a struct into a function and I need to do something with every field. I want to make sure that if I add a new field to that struct, I'm reminded that I need to handle that field in that function. So, as opposed to referencing all the fields inline, like object.a, object.b, I will often destructure it at the top of the function. It's a very small thing, but it means the compiler will remind you that you need to look at it if you add a field. So more and more, I find myself trying to find ways to be guided by the compiler, because it's such a powerful thing, and I just think it makes working on large, complex projects so much easier.
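A minimal sketch of that habit, with a made-up struct: because the destructuring pattern must name every field, adding a field to the struct becomes a compile error here instead of a silently unhandled case.

```rust
struct Config {
    name: String,
    retries: u32,
}

fn summarize(config: &Config) -> String {
    // Destructure instead of using config.name / config.retries inline:
    // if a field is added to Config, this line stops compiling
    // ("pattern does not mention field ...") until we handle it here.
    let Config { name, retries } = config;
    format!("{name}: {retries} retries")
}

fn main() {
    let config = Config {
        name: "example".to_string(),
        retries: 3,
    };
    println!("{}", summarize(&config));
}
```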
Matthias
01:08:43
Destructuring part i never heard in such detail because yeah.
Charlie
01:08:48
That's kind of a silly little thing, but, you know, do you understand what I'm saying, right?
Matthias
01:08:52
No, I totally like it. I will totally steal that idea.
Charlie
01:08:55
I mean, hopefully you don't have structs that are so complicated that they need that much. But, you know, I just try, more and more, to think about how the compiler will make sure that I do this correctly in the future.
Matthias
01:09:05
I know about that tip in a serialization context, when, for example, you want to ensure that all the fields get serialized and deserialized properly. Then, I...
Charlie
01:09:15
...know some...
Matthias
01:09:16
...people destructure that. But it never crossed my mind that you can do that in just, you know, normal function code. You could even destructure it right there in your arguments. You don't...
Charlie
01:09:26
...even have...
Matthias
01:09:26
...to do it in the first line of the body; you can do it in the...
Charlie
01:09:29
...arguments, because it's just a, you...
Matthias
01:09:30
...know, pattern match, essentially. Or, like, it's a destructuring pattern, yes.
Charlie
01:09:36
Yeah.
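Picking up that point: function parameters are patterns too, so the same exhaustiveness trick works right in the signature. A small sketch with a made-up struct:

```rust
struct Point {
    x: f64,
    y: f64,
}

// Destructuring in the argument position: adding a field to Point
// makes this signature fail to compile until the pattern handles it.
fn length(Point { x, y }: &Point) -> f64 {
    (x * x + y * y).sqrt()
}

fn main() {
    println!("{}", length(&Point { x: 3.0, y: 4.0 }));
}
```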
Matthias
01:09:38
Do you have more such tips? Like, where can people learn more about idiomatic Rust and best practices? Where did you learn it?
Charlie
01:09:45
I mean, a lot of it I learned from having great teammates, which is sort of a bad answer, because it just depends on your life situation; you don't have that much control over that. You have some control, but it is a real thing. I started working on ruff on my own, and then as we grew the team, I ended up hiring, thankfully, people who knew a lot more about Rust than me. Like Micha on our team, who was the second employee to join the company; he just taught me so much about Rust. And then later we hired Andrew Gallant, BurntSushi, and I will often just send him random Rust questions rather than Googling them. I mean, not in a way that's exploitative of that relationship or overly burdensome on him; he loves being the elder statesman at the company who can help people with hard Rust questions or problems. So finding great people to learn from is maybe the slightly higher-level lesson, but I know that's not always easy. The other thing that I did is I read a lot of code. ChatGPT and LLMs are great, but you should also remember that GitHub code search exists, and all these amazing code bases are open source. For example, something I'll often do now: if I look at a crate and I want to use it, maybe it has great examples. Okay, great. If it doesn't have great examples and I don't want to read all the documentation myself, I will just go into GitHub code search and search for the struct name or the function name, and in one second I'll find real examples from real projects using that crate. So you can just go read code, you know; all this stuff is accessible to some degree. That's how I've tried to pick things up. I looked at Cargo a lot when we were building uv and tried to understand how they do certain things. I don't know how to implement Git? Let me go look at what Cargo does, and then let me read about their design decisions, because it's all documented in their PRs, and you can understand the trade-offs and why they did things a certain way. You can also go hunt people down and talk to them about this stuff, but there's plenty you can find without doing that.
Matthias
01:12:01
Fully agree. Before there were LLMs, there was BurntSushi. But unfortunately, you hired him, so there's just one of him. But you can still read his open source code, like ripgrep. Whenever someone asks me what idiomatic Rust crate they should read, I always point them to ripgrep, because...
Charlie
01:12:18
Oh yeah, I liked that a lot too, when we were figuring out how to structure our crates, how to manage workspaces, our release pipeline, and all that stuff. There's just so much good code out there, so, you know, go read it.
Matthias
01:12:30
Yeah, I always cry when I open the ripgrep code. Out of joy, of course; I really like reading it, it's amazing. Yeah, unfortunately, we have to come to an end. But I wonder if you have any final statement to the Rust community.
Charlie
01:12:47
Yeah. I mean, how do I say this correctly? It's kind of amazing, I think, that I only started writing Rust a few years ago, and now, along with a great team, we've shipped two of these tools that are having, I think, a huge impact on Python, which is the most popular, or second most popular, programming ecosystem on Earth. So if you think about it, in a lot of ways, Rust is kind of powering Python, at least if I have a say about it. And, I don't know, I never considered myself to be a systems programmer, quote-unquote. For most of my career I was writing TypeScript and Python; I did some Java professionally, but I had never done any C except for a course in college, and I really hadn't done any C++. And in the span of a few years, I learned to build this kind of software. So, I don't know. I've had just great experiences with the community, being welcomed into it and learning the language. And I think that should continue to be a very important part of Rust: welcoming people in and helping them learn, because the impact that we can have by building this kind of stuff is just huge, even outside of Rust.
Matthias
01:14:06
Perfect, I couldn't have said it better. Both languages are really close to my heart, and I really like to see that synergy. Charlie, your presence was much appreciated; thank you so much for taking the time today.
Charlie
01:14:20
Thank you so much for having me, and for all the great questions. It was really fun.
Matthias
01:14:24
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.