Rust in Production

Matthias Endler

NLnet Labs with Arya Khanna and Martin Hoffmann

About securing the internet with Rust

2026-05-07 81 min

Description & Show Notes

Every time you load a website, send an email, or update an app, you're quietly relying on a handful of unglamorous services that route your packets to the right place: DNS to translate names into addresses, and BGP to figure out how to actually get there. When these systems break, or get attacked, the Internet doesn't just slow down but stops working.

For more than 25 years, NLnet Labs has been one of the small, non-profit teams keeping that core infrastructure running. Their software, including the DNS servers NSD and Unbound, the RPKI tools Krill and Routinator, and the new DNSSEC signer Cascade, is deployed everywhere from hobbyist Pi-Hole setups to Let's Encrypt and major Internet operators. And increasingly, it's written in Rust!

In this episode, I talk to Arya Khanna and Martin Hoffmann from NLnet Labs about what it takes to maintain critical Internet infrastructure as a small team, why they bet on Rust for new projects like the domain crate and Cascade, and what the rest of us can learn from a codebase whose users include the people who keep your routes flowing.

About NLnet Labs

NLnet Labs is a non-profit foundation based in Amsterdam that develops open source software and open standards for the core infrastructure of the Internet. Since 1999, the small but dedicated team has built some of the most widely deployed building blocks of the modern web, including the authoritative DNS nameserver NSD, the recursive DNS resolver Unbound, and the RPKI tools Krill and Routinator, which secure global Internet routing. Their work is trusted by operators ranging from hobbyist Pi-Hole users to Let's Encrypt and major Internet service providers. In recent years, NLnet Labs has been steadily moving its new development to Rust, with projects like the domain crate and the Cascade DNSSEC signer leading the way.

Links From The Episode

  • NSD - NLnet Labs' first project
  • lychee - A link checker that receives funding from NLnet (not NLnet Labs!)
  • unbound - A DNS server like BIND, but only for recursive queries
  • Cascade - The new DNSSEC signing solution from NLnet Labs
  • Pi-Hole - A small use case for unbound
  • Let's Encrypt - A big user of unbound with scale and security requirements
  • Asahi Linux - Linux on Apple Silicon, mostly with Rust
  • Binder CVE - A CVE in Rust
  • LDNS - A collection of DNS functions, written in C, now in maintenance mode
  • domain - The new collection of DNS functions, written in Rust
  • tokio - The biggest shared dependency across the Rust ecosystem, first announced in 2017
  • Rust in Production: Helsing with Jon Gjengset - You can take generics too far
  • bytes - Tokio's Arc of bytes
  • Arc Welding - The other type of "fixing"
  • Alejandra González' crate dependency analysis - 46% of published crates depend directly on tokio
  • RPKI - Signing and validating IPs and routing information
  • Routinator - An RPKI validator, one of the first Rust applications in production
  • hyper - The ubiquitous HTTP crate
  • Krill - The RPKI Certificate Authority tool with "fun" shutdown code
  • Roto - Tert's scripting language, used by another NLnet Labs project, Rotonda

Official Links

Transcript

It's time for Rust in production. I'm your host, Matthias Endler from corrode. Today, I talk to Arya Khanna and Martin Hoffmann from NLnet Labs about securing the internet with Rust. Arya and Martin, thanks so much for taking the time. Can you introduce yourselves and NLnet Labs?
Arya
00:00:21
Yeah, hi, I'm Arya. I'm a software engineer here. I've been using Rust for, I think, five or six years at this point. I've been using it since I started university, and then I jumped into NLnet Labs straight from there. I've been working here for about a year and a half or two years now, and it's been really fun.
Martin
00:00:45
I'm Martin. I'm also a software engineer. I've been at NLnet Labs a bit longer, for I think 10 years now, and I've been doing Rust since 1.0, since 2015, which I think also pretty much coincides with me starting here. So I basically brought Rust with me. I started the use of Rust at NLnet Labs more or less guerrilla style, and then later on the organization decided to embrace it, which was a really cool path.
Matthias
00:01:15
There's a funny story to this, because we met at FOSDEM and I mentioned that lychee, a link checker that I help work on, is sponsored by NLnet.
Arya
00:01:32
Yeah.
Matthias
00:01:32
And then your response, Arya, was that NLnet Labs is different than NLnet, the organization that funds the work. So maybe we can clear up that confusion.
Martin
00:01:44
Yes. So NLnet was, I believe, the first commercial ISP in the Netherlands. But they weren't privately owned; they were owned by universities and organizations of that type. Eventually they got sold, and the money from that sale went into a foundation, and that became NLnet. Their task was then to sponsor or finance development for the Internet, which is what they still do: they hand out grants. The money has since run out, but they still work in that field. In 1999 they decided that they wanted to develop a DNS server. This came about because back then everyone was running BIND, and some people who were running the DNS root zone decided, well, that's a bit scary. So NLnet Labs was set up to develop an alternative, which became NSD, and that's sort of the story. Initially, NLnet funded pretty much all of NLnet Labs. Some other organizations chipped in, like SIDN, the .nl people. But over time, we've been shifting more and more towards a support-contract and grants-based financing model, so that we're not solely relying on a single entity to sponsor us.
Arya
00:03:07
But now, after NSD, we've expanded a bit. We also do Unbound, which is a very popular recursive resolver. We've also expanded into the Internet routing field with BGP and RPKI. And I think, Martin, you can expand more, because that is not my department. And, of course, we've been working on a bunch of additional DNS products. Most recently, we started working on Cascade, which is, I guess, a DNSSEC signing solution. It's replacing our previous project, OpenDNSSEC. And we're building now on top of Rust, and it's been a really fun experience.
Matthias
00:03:49
I guess, in summary, for people that are not that much into networking, you build things that run the internet.
Arya
00:03:56
I think we contribute to some of the critical infrastructure that keeps the internet going, and we're really proud of that.
Matthias
00:04:06
What is Unbound?
Martin
00:04:08
Unbound is a DNS recursive resolver. That means if you query for, say, an IP address, you don't go to the servers directly. You first have to find out where the servers live, because DNS is like a giant distributed database. Basically, what Unbound does is figure out where to go and do all the queries. And if lots of people do the same queries, it has a cache, so that you don't have to go all the way all over again. That's a very complicated thing. In Unbound there are like 25 years of experience in corner cases. There are a lot of security things in there. DNS queries can be used very efficiently for amplification attacks, for DDoS attacks, so there's a lot of work in there to make sure that doesn't happen, or to mitigate people trying to do that.
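The caching idea Martin describes can be sketched in a few lines of Rust. This is only a toy illustration, not how Unbound works internally: the `Cache` and `Entry` types, the string-based addresses, and the explicit `now` parameter are all assumptions made for the example. It shows the core rule that an answer may be reused until its time-to-live runs out, after which the resolver has to ask again.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// A cached answer plus the moment it stops being valid.
struct Entry {
    addresses: Vec<String>,
    expires_at: Instant,
}

/// Hypothetical toy cache keyed by query name (not Unbound's design).
struct Cache {
    entries: HashMap<String, Entry>,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new() }
    }

    /// Stores an answer together with its time-to-live.
    fn insert(&mut self, name: &str, addresses: Vec<String>, ttl: Duration, now: Instant) {
        // DNS names compare case-insensitively, so normalize the key.
        let key = name.to_ascii_lowercase();
        self.entries.insert(key, Entry { addresses, expires_at: now + ttl });
    }

    /// Returns the cached answer if it has not expired yet.
    fn lookup(&self, name: &str, now: Instant) -> Option<&[String]> {
        let entry = self.entries.get(&name.to_ascii_lowercase())?;
        if now < entry.expires_at {
            Some(entry.addresses.as_slice())
        } else {
            None // expired: a real resolver would query upstream again
        }
    }
}

fn main() {
    let mut cache = Cache::new();
    let now = Instant::now();
    cache.insert("example.com", vec!["192.0.2.1".to_string()], Duration::from_secs(300), now);

    // A second client asking within the TTL is served from the cache.
    println!("{:?}", cache.lookup("EXAMPLE.com", now + Duration::from_secs(10)));
    // After the TTL has passed, the entry no longer counts.
    println!("{:?}", cache.lookup("example.com", now + Duration::from_secs(301)));
}
```

Passing `now` in explicitly keeps the sketch easy to test; everything that makes a production resolver hard (negative caching, DNSSEC validation, defenses against amplification) is deliberately left out.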
Arya
00:05:02
Yeah, it gets pretty complicated.
Martin
00:05:04
Yeah, it's a really complicated piece of software.
Arya
00:05:06
But yeah, I guess it's just serving as a worldwide cache, depending on where your servers are, but just serves as a cache to help DNS clients fetch information faster, more efficiently.
Matthias
00:05:19
And did that work on Unbound, which is a C-based project, originate at NLnet Labs, or did you sort of inherit that from somewhere else?
Martin
00:05:30
I'm honestly not sure. I remember there was a prototype in Java way back when. I don't know if that happened here or if that happened somewhere else, but the current C implementation definitely started here. I really should have asked.
Arya
00:05:46
We have had other projects that have originated from a more sort of collaborative point. So OpenDNSSEC is one example, which I think we'll talk about more in a little bit. But yeah, most of these projects are NLnet Labs originals, I guess.
Matthias
00:06:06
Now, you mentioned that in 1999, you started to build your own DNS server, NSD. And that is mission-critical infrastructure that cannot fail. Back in the day, we didn't have Rust. That's why you used a different language, C, to build it, with all the baggage that comes along with it. And yet you managed to build a very high-quality, reliable open source tool there. Can you talk about writing C at that level? What does it feel like to build tools that stand the test of time in C?
Martin
00:06:46
I wasn't part of that, so I can't really, unfortunately. I have done some development work in C, but I don't think I've done that at the level of NSD or Unbound. I think a lot of it is experience. There's a lot of testing, a lot of just time. You need to be super careful, obviously. I think really a lot of it is just that they have been around and have been battle-tested for a long time.
Arya
00:07:13
It also, I think, really helps that the developers we have here are just really familiar with the codebase. So they have a really good intuitive understanding of where all the different parts lie. And I think as long as a developer has that entire mental model in their head of what this application is supposed to do, and they've had the time to make sure that their mental model and the actual implementation correspond really well, like the people here have, then it sets you up really well, regardless of what language you're implementing in.
Matthias
00:07:48
Does Unbound still see any feature development, or is it mostly bug fixes by now?
Martin
00:07:55
No, there's quite a lot of features. Well, not quite a lot, but there are still features being added. So DNS, surprisingly, even though it's like, what, 40 years old, there's still a lot of development ongoing. People invent new things, add new things, and those are typically implemented in Unbound when they are being standardized in the IETF. So Unbound, in that sense, also serves as like a platform to try out things.
Arya
00:08:17
I believe one of the features that comes to mind recently was DNS over QUIC. But yeah, it's interesting how, even though DNS is so old, it doesn't feel like we've reached any sort of endpoint. NLnet Labs still participates in a lot of DNS conferences, which there are a surprising number of. And it's really interesting to see what features and what ideas people have and where they're trying to take the protocol. It's also really important work, because DNS was obviously developed at a time when Internet security was a very different thing. And now we're at this point where we have so many additional concerns that the protocol was never designed for. And we're continuously working towards resolving those. A lot of them are to do with the protocol itself and not a particular implementation. So there are plenty of issues that just Unbound or just NSD cannot implement fixes for. We need to talk among different implementers, with the whole community, and look for solutions that everybody can implement together. Because of DNS's distributed nature, you need that sort of collaboration in order to make this protocol stand the test of time. It's a long and arduous and very active process.
Matthias
00:09:46
Can you name a few organizations who use Unbound in production? Yeah.
Martin
00:09:51
Pretty much everyone. So the interesting thing about Unbound is it works from large ISPs using it for the DNS servers for their customers to people using it on their Pi-Hole as the DNS server for doing the queries locally. So the scale is really quite baffling. It's like from just running on a Raspberry Pi to a cluster of these things in a large ISP. An interesting one might be Let's Encrypt, who need DNS to do the verification for when you request a certificate. So they need to have DNS resolving that is fast, that is reliable, and that is secure. So I think that's a really good example. Other examples are a lot of ISPs use Unbound, and then they don't talk about it. But it is in a lot of places.
Matthias
00:10:43
Now, a lot of these organizations use it as a critical layer of their infrastructure. It's foundational work. So it would be sort of misguided to think that it has to be rewritten in Rust by design, just because we can. That's certainly not a good use of our time. What do you think about that? Would you say it makes sense to rewrite Unbound in Rust, or would you say no, we should rather keep it around and maintain it?
Martin
00:11:19
I think, so we've decided to, at least for now, maintain it and keep it. The main motivation is there are 25 years of experience in it. It has been extensively tested, so there have been security audits and all of these things. And obviously, it has been used in production in anger a lot. So I think to get to the same state that Unbound is in right now, if you want to start from scratch, that's a lot of work. That's probably like five years of work, I would sort of guess. And we know it's a good piece of software. We know it has very few issues. So I think we can use our time more efficiently for other things than re-implementing Unbound. We might eventually do it, because you never know what's going to happen in the future. But I think for the near and middle future, probably not.
Matthias
00:12:10
Martin, you mentioned that you've been using Rust since around 1.0. That was around 2015. What initially drew you towards Rust, and what was your first project in Rust?
Martin
00:12:27
A colleague in a previous job suggested Rust to me, saying, you will like this language. And they were right. So I basically gave it a try, and I am someone who learns a language by just implementing stuff. At that time I also started here at NLnet Labs, and I figured I need to learn and understand DNS. So my first project indeed was a DNS library in Rust, which is probably not the best idea, but it was quite a lot of fun. And indeed it turned into our DNS library, called domain, that we're now using internally and that is also publicly available. That was my first project indeed, and it started out as a private project, as a hobby project on the side. But then eventually I talked to two colleagues here and we agreed that it should be adopted by NLnet Labs and that we should build our DNS Rust things on top of that. So that was a very interesting course of events, or path, I guess.
Matthias
00:13:25
Did you have those conversations pretty early on, or was it later in the game?
Martin
00:13:30
It was quite late. It was basically when we started to decide to do more things in Rust. Our first official use of Rust here, I think, was in routing security, in like 2017, 2018, I think we started with that. And that's also when I sort of had to let it go a bit, because I did more work on routing security, so I didn't have the time for domain anymore. And somewhere in that time, I then asked if we could maybe adopt it in the company, also to maybe spend some more time on maintenance than I had at the time.
Matthias
00:14:06
I could imagine that there were a bunch of veterans who didn't want to let go of their prior knowledge in C. Was there a lot of resistance when you started to explore using Rust at NLnet Labs?
Martin
00:14:21
In company, not so much, I think, but maybe also because we didn't sort of immediately switch over. Unbound is still unbound, and there was also never an intention to make that go away. So it always felt a bit like we're doing new things in Rust, but we're not throwing away everything we have and all the experience we have. So I didn't really feel much resistance.
Matthias
00:14:44
Do you think that was a big selling point for Rust, that you didn't have to throw away everything?
Martin
00:14:50
Not sure, because, again, these were completely separate things, right? We started with routing security, which was not something we had done before, so it was an entirely new track. So maybe that was also a smart way to approach this.
Arya
00:15:03
Just try it when there's something new yeah.
Matthias
00:15:06
Yeah, I hear that a lot, and I see it in organizations. They start to adopt Rust in areas that are central to an organization, but not critical, not mission critical. And sometimes, if you can find a greenfield project that maybe you want to communicate with through a network boundary, this is an easy way in. It feels like this is a similar story here at NLnet Labs.
Martin
00:15:33
Yeah, I think also because the argument is, if you start a new project, especially a security-related project, in 2018 or whenever it was, doing that in C is just a little wrong. Obviously there would have been other choices, and we also explored them. We had a short period where another colleague, who also started a project at the time, thought, maybe do this in Go. And we just did, like, yeah, just go and try it. And they very quickly decided that, yeah, let's do Rust.
Matthias
00:16:07
And why was that? Can you remember?
Martin
00:16:09
I think he didn't really like Go. It feels a bit, I don't know, I don't want to say bad things, but like it feels not quite as modern as Rust, I want to say. Like the type system is not nearly as powerful. And maybe also, because I was a bit early, and maybe he also saw what I did and just agreed that maybe this is a better path to go and sort of do things together and not have two different projects in different languages.
Matthias
00:16:38
Do you remember if there was some sort of aha moment where you show people your work and they started to get it, they started to understand that this was a very powerful piece of technology?
Martin
00:16:51
Can't think of any concrete moment now so.
Matthias
00:16:54
It was more of a gradual transition.
Martin
00:16:55
I think it's also that, with the Rust projects that we started, a lot of that was with new colleagues who started here, and quite a few of them also specifically started because they knew they could do things in Rust here. Which is very interesting, because back then the narrative always was, yeah, you can't do Rust because you can't find people. And our experience actually has been the reverse: people are super willing to do Rust in production, but we have a really hard time finding C programmers now who can maintain the projects that we have. So that's maybe also one of the reasons why eventually we might have to move away from Unbound in C. It's just that we won't have the people to maintain it anymore when the current generation retires, well, that's a long time, probably. Or moves on. Or moves on, yeah.
Matthias
00:17:44
It is a real problem because C often doesn't get taught in university anymore.
Martin
00:17:50
Yeah, exactly.
Matthias
00:17:51
And people might be afraid to touch it after graduating on their first job because they've heard a few scary things about C and how it can be misused.
Arya
00:18:04
There's plenty of good C horror stories. I still feel relatively optimistic because, so I finished my, I started my bachelor like almost five years ago at the TU Delft here. And the very first quarter, they make you learn assembly with very little help. And based on that experience alone, I think I'm hopeful that we're going to still have a generation of programmers who are willing to touch this, but they are harder and harder to come by.
Matthias
00:18:39
Well, that's a positive look at things, yeah. But also, the job market in and of itself maybe doesn't really lend itself to C veterans, right? I don't know where I would look for people with that level of expertise right away. It's one thing to maybe get in touch with it; it's another thing to want to work with that language on a daily basis. I certainly wouldn't want to do that, but yeah, that's just me. I'm biased here.
Martin
00:19:12
But it's also like, I kind of don't want to suggest Rust to people, because my own experience is that once I tried Rust, I really don't want to do C ever again. So the people who do the C development, I don't want to suggest to them to try Rust, because we might lose them too.
Arya
00:19:28
Let's just do good, yeah. But really, I think Rust has been so fun to work with, and I also think it's changed the way that I do programming at all. Like, I think having now spent more than five years writing Rust, I would write C code differently than I used to before, even at the same degree of experience with it. And I find that really interesting. It really changes your perspective. And I think that's part of why people who switch to Rust have a hard time trying anything else. Once it changes your mental model in that way, you struggle to use other languages because they don't offer the same features to help you express things.
Matthias
00:20:14
You mean expressing in the type system?
Arya
00:20:17
Yeah, I think one of the... So on the one hand, there are aspects like, in Rust, you can craft very elegant and very precise APIs. Something that you'll often find in a project that's trying to use the type system well is, for example, zero-sized types that prove that something is true, where you can set bounds so it is only constructed under certain conditions. You can't do that in C. Often you can just create newtype wrappers around things to express that, oh, this is a string, but it has certain invariants on top. In other languages there's just a lot more boilerplate, and it's harder to get that point across. So that's definitely one aspect. But I also think of things like borrow checking and the ownership model, which are really fundamental. They're less focused on how you actually write interfaces and more on the actual programming you're doing. And I think that's one place where Rust's model can be substantially different from what you'd expect to do in C or other languages.
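The two patterns Arya mentions, a newtype wrapper that carries an invariant and a zero-sized type that acts as proof, might look roughly like the sketch below. All names here (`dns`, `Label`, `KeysLoaded`, `load_keys`, `sign_zone`) are invented for illustration and are not taken from the domain crate or Cascade; the point is only that private fields force callers through the checking constructors.

```rust
mod dns {
    /// Newtype wrapper: a `String` that is known to be a valid label.
    /// The field is private, so code outside this module cannot skip `new`.
    pub struct Label(String);

    impl Label {
        /// Checks the invariant once, at construction time.
        pub fn new(s: &str) -> Option<Label> {
            let valid = !s.is_empty()
                && s.len() <= 63
                && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-');
            if valid {
                Some(Label(s.to_ascii_lowercase()))
            } else {
                None
            }
        }

        pub fn as_str(&self) -> &str {
            &self.0
        }
    }

    /// Zero-sized proof token: holding one means `load_keys` succeeded.
    pub struct KeysLoaded(());

    /// The only place where a `KeysLoaded` value can be created.
    pub fn load_keys(key_count: usize) -> Option<KeysLoaded> {
        if key_count > 0 {
            Some(KeysLoaded(()))
        } else {
            None
        }
    }

    /// Cannot be called without the proof, so "sign before loading keys"
    /// is rejected by the compiler rather than at run time.
    pub fn sign_zone(_proof: &KeysLoaded, apex: &Label) -> String {
        format!("signed {}", apex.as_str())
    }
}

fn main() {
    let apex = dns::Label::new("Example").expect("valid label");
    let proof = dns::load_keys(2).expect("at least one key");
    println!("{}", dns::sign_zone(&proof, &apex));
    // The proof token occupies no memory at run time.
    println!("{} bytes", std::mem::size_of::<dns::KeysLoaded>());
}
```

In C the equivalent would usually be a comment or a runtime flag; here the precondition is part of the function signature.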
Matthias
00:21:27
But what if there was someone on the team who might be a Rust skeptic, maybe someone who says, it's a skill issue if you can't write production-ready Rust code? What would you tell them? How would you convince people to give it a try?
Arya
00:21:48
It's hard to. I think one of the best examples, one of the points that I've seen recently that really reminded me of how good Rust is to us, was the experience of the Asahi Linux people. If you don't know, Asahi Linux is a project to port Linux to the newer MacBooks, and one of the tasks they undertook was writing a GPU driver, which they did in Rust. And now, to think that somebody is trying to implement a GPU driver for an undocumented architecture, which you would normally have to do in C at the kernel level, the prospect is horrifying. It sounds like a nightmare to deal with. And their experience writing this in Rust was, I believe, that they had two or three bugs, specifically two or three. And I find that just incredible. To be in a place where you don't have to worry about memory safety issues, not in the sense that you never have to worry about them, but they're so contained that you know specifically where you should think about this, and 99% of the time you're not thinking about it. So features like that, and to just have that full experience, reminded me that, yeah, Rust is being really good to us right now. We can now write software much more confidently. Often, if it compiles, if Clippy's happy, then this code just works. And that's a sense of confidence you don't get with other languages, especially with C.
Matthias
00:23:31
I agree with you, but at the same time, it's a bit like explaining to someone how to ride a bicycle who hasn't seen a bicycle yet.
Arya
00:23:39
That's fair.
Martin
00:23:41
That's the tricky bit. This is an experience you have to make yourself. You write code the entire week, you compile it, but you never run it. And then on Friday, you stick it all together and it actually works. That's such a profound experience, especially if you come from C, where you then spend two weeks chasing segfaults. But yeah, you can tell people that that's true, but they will never believe you, because why would they? And it's super hard to convince someone, especially if they're not willing, if they think this is just a fad and it will go away and we're all going to go back to C in five years again. I think that we're past that phase now, just because time has passed and Rust is still here and more popular than ever. So I think we're past this particular phase of, yeah, it's just another fad and next week we have another language that we're all going to chase after. I think Rust has proven to be here to stay. Maybe that's an argument. I don't know. But if someone really doesn't want to try it, then I don't think I would even try to convince them.
Matthias
00:24:46
There was news about a segfault being introduced into the Linux kernel in one particular gnarly area of the binder driver, and people made a headline out of it. And on the same day, there were, I think, hundreds of other CVEs that were disclosed in the C part of the Linux kernel. And I always find it fascinating that you can kind of disregard all of that evidence and maybe point out flaws that, yeah, obviously still exist, but at a much lower level in Rust than in other languages. And people say, why would you rewrite a perfectly working system in Rust if the C library was maintained for decades, right? In your case, for example, Unbound is a case where maybe you decided, no, we don't want to rewrite it in Rust, but there might also still be use cases for Rust at this infrastructure level in distributions in Ubuntu, in the Linux kernel, and so on.
Martin
00:25:55
Yeah, so we're also maintaining a library called LDNS, which is a C library to do DNS things. And for that one, we actually decided it has reached the end of its life, and we are working on a replacement for it in Rust. There are two parts to this. There's the actual C library, which we will maintain, so we will fix bugs, but we will not add new features. But there was also a bunch of binaries coming out of this, which were always intended as examples. But then people took them, and you will find them in all the Linux distributions, and people actually use them for production stuff, which was never intended. Those we are now slowly replacing with my tool set built on top of the domain library in Rust.
Matthias
00:26:44
So is Rust a default for new projects now?
Martin
00:26:46
Yes, it certainly is. Yeah, that's a decision we took in, I don't want to say, like two years ago, that basically we were betting the ship on Rust.
Arya
00:26:56
So about two years ago, I think around the end of 2019, start of 2020. No, wait, no.
Martin
00:27:02
That's not two years ago.
Arya
00:27:03
Sorry.
Matthias
00:27:05
It feels like two years ago.
Arya
00:27:06
End of 2023, start of 2024.
Martin
00:27:09
Yeah.
Arya
00:27:11
We worked with the Sovereign Tech Agency, with their fund, and they funded a large part of our work on domain. So we sort of spent all of 2024 working on taking domain from something that wasn't really developed much past that hobby project phase into something that is much, much more production ready, something that you can actually use. And I think at about that time we had really settled into: Rust is here, and this is how we're going to be moving forward with things.
Matthias
00:27:45
Well, that's fascinating, because domain was what Martin's first Rust side project turned into. Were there any design decisions that in hindsight you would have changed, or, let's say, anything that you regret about the initial design? Or did it pretty much evolve very naturally?
Martin
00:28:06
I think, well, there are always things you would have done differently, but also to a large degree because this is so old. This is like 2015 Rust, and I think there's a lot of stuff in there that we would do differently now, just because we can do it differently. For example, the whole byte slice handling. I think the borrow checker has become a lot smarter, so you can do more things that would have been tricky back then. Obviously, I think this even predates async/await and tokio and these things. So that was all fun. I remember working with my own state machines. Even in the very initial future implementations, where you basically had to implement your own futures, that stuff was also all still, or is still, in there, partially. I think we removed most of it. So I think all the networking bit we rewrote as part of the project that Arya mentioned. And what is currently happening is we actually are rewriting lots of parts of it. So Arya, she did a lot of that. Not necessarily, I hope, because the code is horrible, but just because it's dated, because Rust has moved on, and you can do things in a better way now. And I completely agree that that's true. And yes, it has been my first Rust project, and you'll see that in points and places.
Arya
00:29:29
When I joined NLnet Labs, which was end of 2024, I joined just as this work on domain was coming to a close. I was looking through the API, and knowing that the project had been around for so long, I realized that there were actually a bunch of interesting language features that had evolved since then, which allowed us to simplify the API a lot. At the moment, the current API, which is the same as it's always been, is heavily generic, because a lot of the data that you need in DNS involves byte slices in some way or other. And we would always make everything generic, so you can put a Vec in there or put in a reference to data that's allocated elsewhere. But those generics sort of complicate the API a lot and make it really hard to see how to actually use these things sometimes. And so I really enjoy dynamically sized types. There's a bunch of ways in which we can trim down our API surface, simplify some things, and simultaneously make them more efficient. We started parsing and serializing data in a more zero-copy fashion, so that's also been really helpful as a way to look for nice performance improvements along the way. And that's what a major part of this, not strictly a rewrite, but sort of overhauling of a lot of the APIs, is about.
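To make the mention of dynamically sized types and zero-copy parsing a bit more concrete, here is a minimal sketch of the general technique, the same one the standard library uses for `str` and `Path`. It is not the domain crate's actual API: the `Label` type and the `first_label` helper are assumptions for this example. The type wraps `[u8]` directly, so a `&Label` is just a checked view into the original buffer, with no type parameter and no allocation.

```rust
/// A DNS label as an unsized wrapper around raw bytes.
/// (Hypothetical type for illustration.)
#[repr(transparent)]
pub struct Label([u8]);

impl Label {
    /// Reinterprets a byte slice as a label after checking its length.
    /// Nothing is copied; the result borrows from `bytes`.
    pub fn from_bytes(bytes: &[u8]) -> Option<&Label> {
        if bytes.is_empty() || bytes.len() > 63 {
            return None;
        }
        // SAFETY: `Label` is `repr(transparent)` over `[u8]`, so both
        // reference types have the same layout and the same lifetime.
        Some(unsafe { &*(bytes as *const [u8] as *const Label) })
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Reads the first length-prefixed label out of a wire-format name.
pub fn first_label(wire: &[u8]) -> Option<&Label> {
    let (&len, rest) = wire.split_first()?;
    Label::from_bytes(rest.get(..len as usize)?)
}

fn main() {
    // "www.example.com." in DNS wire format.
    let wire = b"\x03www\x07example\x03com\x00";
    let label = first_label(wire).expect("well-formed name");
    println!("{}", String::from_utf8_lossy(label.as_bytes()));
}
```

Because the unsized type can sit behind `&`, `Box`, or `Arc`, the storage choice moves to the pointer around the type and no longer needs a generic parameter on every struct.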
Matthias
00:31:05
It's funny that you mentioned that, because we recently had Jon from Helsing on the show, and he mentioned that you can totally take generics too far. And sometimes that's a rite of passage that a lot of people have to go through when they write more Rust or they write their first production-grade Rust, because not only do we have to think about the library, but also how the library gets used in the application code. Can you maybe look at it from your lens? Is that true for the domain crate? And can you also talk a little bit more about the dynamically sized types that you mentioned? I'm interested in that part.
Arya
00:31:47
Okay. One of the things that we found in domain was that we had a lot of layers that were trying to be generic over parameters like this, such as what byte slices you're using. For example, under the tokio umbrella project there's the bytes crate, which provides essentially an Arc of a slice of bytes. You can copy it around very efficiently. It's heap allocated. You don't need to worry about having a lifetime. So for us, the choice was often: do you want to use a borrowed slice, do you want to use a Vec, or do you want to use Bytes? Because of this, we often had a bunch of generic parameters everywhere. But when we tried to write code that was generic over that, it was often too complicated to work with the generic parameters themselves. So often we would just copy all of the data out of them into a new allocation where we have a concrete type, like a Vec. And we're like, okay, now this is a Vec, so our code can work with this. We don't need to worry about what generic bounds are involved and how to process this thing. We don't need to worry about whether this thing can be mutated or whatever. We just have a single bound of: it is a byte slice, I can access the bytes. We would copy it out, work with it, and then convert it back into that form. That became a really common pattern in our code, and that's exactly what we were trying to avoid by providing all of these generic parameters. We wanted to be able to use those parameters, but just expressing all those bounds became really hard.
Martin
00:33:28
And so, yeah, it went into higher-kinded lifetimes and that sort of thing. It became really crazy. The reason is that what you basically want to do is take out parts of a byte slice and return that. If you have a slice, then that's basically just a slice, so it has the same lifetime as the slice. If you have a Vec, then you need to fabricate the lifetime for that slice somehow, which is the lifetime of the reference to the Vec that you're holding. And if you have a Bytes, because it's an Arc, you can take out a bit which is then still owned, so you actually don't have a lifetime at all because you have an owned type. Expressing this generically, especially before GATs (what is it, generic associated types), turned into a very interesting exercise in higher-kinded lifetimes, which was really quite great. So you had lots of 'for', 'for'...
Arya
00:34:28
We were basically just stretching the language beyond what it was capable of.
Martin
00:34:32
So you often had like three lines of where clauses for a function. And that all transitively spreads further and further: if you have a function that uses a function, it also needs those trait bounds. That indeed just became a little more crazy than it needed to be.
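The three storage cases Martin lists (borrowed slice, Vec, reference-counted bytes) can be sketched with a generic associated type. This is a minimal illustration, not the domain crate's actual API: the trait `Octets`, the type `Shared` (a std-only stand-in for `bytes::Bytes`), and the function `first_two` are all hypothetical names chosen for this example.

```rust
use std::sync::Arc;

/// Hypothetical trait: storage that can hand out a sub-range of its bytes.
trait Octets {
    /// The sub-range type. It may borrow from `self` (slices, `Vec`) or be
    /// fully owned (reference-counted storage), hence the lifetime parameter.
    type Range<'a>: AsRef<[u8]>
    where
        Self: 'a;

    fn range(&self, start: usize, end: usize) -> Self::Range<'_>;
}

// A borrowed slice hands out another borrowed slice.
impl<'s> Octets for &'s [u8] {
    type Range<'a> = &'a [u8] where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Self::Range<'_> {
        &self[start..end]
    }
}

// A Vec has to "fabricate" a lifetime: that of the reference to the Vec.
impl Octets for Vec<u8> {
    type Range<'a> = &'a [u8] where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Self::Range<'_> {
        &self[start..end]
    }
}

/// Stand-in for `bytes::Bytes`: a cheaply clonable window into shared bytes.
#[derive(Clone)]
struct Shared {
    data: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl Shared {
    fn new(bytes: Vec<u8>) -> Self {
        let end = bytes.len();
        Shared { data: Arc::from(bytes), start: 0, end }
    }
}

impl AsRef<[u8]> for Shared {
    fn as_ref(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }
}

// Shared storage returns an owned value, so the lifetime goes unused.
impl Octets for Shared {
    type Range<'a> = Shared where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Self::Range<'_> {
        assert!(start <= end && end <= self.end - self.start, "range out of bounds");
        Shared {
            data: Arc::clone(&self.data),
            start: self.start + start,
            end: self.start + end,
        }
    }
}

/// Generic code only needs one bound, but every caller inherits it.
fn first_two<'a, O: Octets>(octets: &'a O) -> O::Range<'a> {
    octets.range(0, 2)
}

fn main() {
    let vec = vec![1u8, 2, 3];
    assert_eq!(first_two(&vec), &[1u8, 2][..]);

    let shared = Shared::new(vec![9, 8, 7]);
    let owned = first_two(&shared);
    drop(shared); // fine: the range owns its own reference count
    assert_eq!(owned.as_ref(), &[9u8, 8][..]);
}
```

Even in this small form, every function that slices generically has to carry the `Octets` bound and the lifetime, which hints at how the where clauses pile up once such functions call each other.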
Arya
00:34:50
Yeah. So for example, in the domain library we have a mechanism for essentially building a DNS server, where you implement certain traits that allow you to receive DNS requests and then decide how to handle them, including passing them down to later layers. We use this, for example, to build the DNS servers that we're using in Cascade right now. But these can often involve a lot of generic parameters, and especially because you end up with layers that are nested as generic parameters of other layers, you can get some incredibly long and complicated types. So we've been looking for ways to avoid those cases and make it simpler. Not only does it cause frustration for us when we try to implement around this code and remain generic across everything, but I think it's also complicated for users when they're looking in their LSP or at a compiler error and they see a ginormous type which they didn't really expect to see there. It's just hard for everybody mentally.
Martin
00:36:02
Yeah, the error messages were crazy. Java-like.
Matthias
00:36:06
Well, this is certainly relatable. I've been on both sides of that equation, both as a library maintainer and as a user. The very reason for introducing those generics was to make the code efficient and tailored to the inner type that you want to be generic over. But at the same time, when you copy out the contents and convert them into a Vec, basically creating a new allocation, you're kind of working against that. And ergonomically, it also doesn't sound like the best.
Arya
00:36:44
Yeah.
Matthias
00:36:45
But how do dynamically sized types come into the equation here then?
Arya
00:36:49
We actually went through a period where I began introducing some DSTs into the system, and then we realized that not everybody has really interacted with DSTs before. It requires a reasonable amount of explanation. Normally in Rust, if you have a local variable that's a u64, then that is just a value that you can work with as is, right? You own that value. If you have a reference to a slice, that is a pointer to data that is located elsewhere. The important thing is that all of your local variables have to have a fixed size, because the compiler needs to know how to move them around, and it gets very complicated if it does not have a fixed size to work with. So u64 is easy because it's always eight bytes. But pretty much all of the data types we need to deal with for DNS are variable-sized in some way, shape, or form. A domain name, which is just the most important thing, can be up to 255 bytes in size, or it could be four bytes. And you have to decide, well, how am I going to store this? One option is that you always take the biggest possible size and say, I'm going to define a 255-byte buffer and that is my domain name. Every time I want to think about a domain name, I'm going to use this 255-byte thing. But that is a performance issue, because the compiler is also going to have to copy around 255 bytes wherever you're going, and that's no fun. So you don't want to work with that whole thing. You want something that is variable-sized. In domain, what we have right now is a domain name type, just named Name, and it is generic over how the bytes underneath it are stored. It is just a transparent wrapper around whatever bytes you want. If you have bytes that come from a fixed-size buffer, Name can deal with that.
If you have bytes that are stored elsewhere and referenced through a slice, then Name will just wrap those for you. So you can define, for example, a 16-byte buffer which is holding your domain name, and the Name type from domain can hold that. But that requires you to have generic parameters everywhere. The alternative approach is to really think about how that byte slice works. As I said, the variable-sized option is to have a reference to a slice of bytes. But there are two different components there: there's the reference, and then there's the slice. Rust actually has this first-class concept of a type that does not have a fixed size. If you just write slice of T, so square brackets around T, that is a variable-sized type. You cannot hold that as a local variable on its own, but you can still interact with it indirectly. You can have a Box of slice of T. You can have a reference to a slice of T. And Rust actually allows you to define your own types with that property. So in the new API that we're building, our Name type does not have a generic parameter. Instead, it is a byte slice, essentially. You would hold it by reference, or you can put it in a Box the way that you can a byte slice. That takes the generic parameter away, essentially. So if in the old API you had a Name of a Box of a slice of bytes, in the new API you just have a Box of a Name. That Box of slice of bytes just became a Box of Name.
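The move from a generic `Name<Storage>` to a dynamically sized `Name` can be sketched as follows. This is a minimal illustration under stated assumptions, not the domain crate's real type: the struct, its methods, and `wire_len` are hypothetical, and validation of the wire format is deliberately left out.

```rust
use std::sync::Arc;

/// Hypothetical DST name type: a transparent wrapper around `[u8]`.
/// Like `[u8]` or `str`, it has no fixed size and lives behind a pointer.
#[repr(transparent)]
struct Name([u8]);

impl Name {
    /// Views borrowed wire-format bytes as a `Name` without copying.
    /// A real implementation would validate the bytes first.
    fn from_bytes(bytes: &[u8]) -> &Name {
        // SAFETY: `Name` is `repr(transparent)` over `[u8]`, so the two
        // pointer types have identical layout and metadata.
        unsafe { &*(bytes as *const [u8] as *const Name) }
    }

    /// Converts an owned byte slice into an owned name, reusing the allocation.
    fn from_boxed(bytes: Box<[u8]>) -> Box<Name> {
        // SAFETY: same layout argument as in `from_bytes`.
        unsafe { Box::from_raw(Box::into_raw(bytes) as *mut Name) }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// No generic parameter: callers pass `&Name` whatever the storage is.
fn wire_len(name: &Name) -> usize {
    name.as_bytes().len()
}

fn main() {
    // "www." in DNS wire format: a length-prefixed label plus the root label.
    let buf = [3u8, b'w', b'w', b'w', 0];

    let borrowed: &Name = Name::from_bytes(&buf);
    let boxed: Box<Name> = Name::from_boxed(buf.to_vec().into_boxed_slice());
    let shared: Arc<Name> = Arc::from(Name::from_boxed(Box::from(&buf[..])));

    assert_eq!(wire_len(borrowed), 5);
    assert_eq!(wire_len(&boxed), 5);
    assert_eq!(wire_len(&shared), 5);
}
```

The choice of storage now lives entirely in the pointer (`&Name`, `Box<Name>`, `Arc<Name>`), so functions such as `wire_len` need no type parameters at all.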
Matthias
00:40:37
So in a sense, rather than making the type generic, you separate that and move the generic out of the type.
Arya
00:40:45
Yes, exactly.
Matthias
00:40:47
And then you have a generic pointer to a type that is somewhere and you need to deal with moving that around, the ownership and so on. And it depends on what you do with it. It's not imposed on the type itself.
Arya
00:40:59
Yeah. And now this has some limitations. For example, we can't use the bytes crate with this system anymore. Because the bytes crate always holds normal byte slices. And now these are technically not normal byte slices. So you would have to come up with some custom specialization of the bytes type that works for this, for example. And thus far, our choice has just been to not add that sort of support. We mostly just work with references and with box allocations.
Matthias
00:41:28
But I'm assuming that you have conversions in place for converting that into a thing that bytes understands.
Arya
00:41:35
Yeah, yeah, yeah. You can convert back and forth.
Martin
00:41:37
And you can get quite far if you just use an arc, like stick it into an arc instead of a box, and then you're very close.
Arya
00:41:44
Yeah, exactly. Arc of Name works exactly the same, and it has a bunch of the properties that you would normally need.
Matthias
00:41:52
Would you say that Arc is an underused type in Rust?
Arya
00:41:55
That's...
Martin
00:41:55
Not by me.
Arya
00:41:58
We have a lot of Arc-based code here.
Martin
00:42:00
So I'm trying to establish a term for fixing your lifetime problems by sticking everything in an Arc. I want to call it Arc welding.
Matthias
00:42:07
Oh wow, that's a nice one.
Martin
00:42:09
Yes, I'm trying to make that a thing. But yes, normally you would just use boxes, or strings if you work with strings, but if you have concurrent code then Arcs are extremely handy. Especially if you then also use things like arc-swap, where you have the ability to just take a copy of an Arc and place an Arc in a shared container.
Arya
00:42:33
You can atomically swap the Arc data, yeah.
Martin
00:42:35
In many cases, for things like, I don't know, metrics or stuff, that is very handy. It's very simple to use, and you don't have to worry too much about concurrency. It's super handy.
Arya
00:42:49
I actually think Arc is not going to be an underused type because, as an example, everybody's using tokio, right? I think Alejandra González, who's one of the Clippy developers, was running some tests recently and discovered that tokio is by far the biggest shared dependency across the Rust ecosystem. If you're spawning tokio tasks, those tasks cannot reference data from their caller. They all have the 'static bound. So I think for any application that is seriously working with tokio and sending data across or sharing data between different tokio tasks, Arc is sort of the primary solution that people will reach for. So yeah, I definitely don't think it's an underused type within these contexts.
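The 'static bound Arya mentions is not specific to tokio. As a hedged, dependency-free sketch, `std::thread::spawn` stands in here for `tokio::spawn`, since both require the spawned closure or future to be 'static; the function name `total_len_in_background` is made up for this example.

```rust
use std::sync::Arc;
use std::thread;

/// Sums the lengths of all names using two background workers.
/// Each worker gets its own `Arc` clone, which satisfies the `'static`
/// bound without copying the underlying `Vec`.
fn total_len_in_background(names: Arc<Vec<String>>) -> usize {
    let workers: Vec<_> = (0..2)
        .map(|offset| {
            let names = Arc::clone(&names);
            // Worker 0 takes even indices, worker 1 takes odd indices.
            thread::spawn(move || {
                names
                    .iter()
                    .skip(offset)
                    .step_by(2)
                    .map(|name| name.len())
                    .sum::<usize>()
            })
        })
        .collect();

    workers
        .into_iter()
        .map(|worker| worker.join().expect("worker panicked"))
        .sum()
}

fn main() {
    let names = Arc::new(vec!["example.com".to_string(), "nlnetlabs.nl".to_string()]);
    assert_eq!(total_len_in_background(names), 23);
}
```

Passing `&names` into the closure instead would not compile, because the borrow could outlive the caller. That is the lifetime problem that "Arc welding" sidesteps.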
Matthias
00:43:40
And if you allow me the pun, this is also our arc back to NLnet Labs, because when Martin introduced the domain crate, or first worked on it, async Rust didn't exist, so the domain crate must have been sync. Did that change? Was that something you later changed to being async, or did you leave the core sync for good reason?
Martin
00:44:06
A lot of the networking code was actually only written in that STF project, so in 2024. I think what we had before was a very simple stub resolver. A stub resolver is the thing that typically sits in your C library: you just give it a name and it gives you an address back, and it does all of the DNS shenanigans for you.
Arya
00:44:31
It's pretty much the default. It's the thing that secretly does DNS inside almost every application, and people don't think about it. And yeah, domain initially just did that.
Martin
00:44:43
Yeah, and that was written initially in traditional... I'm not even sure if that part predates futures or not, I'd have to look it up. But it was initially definitely implemented as handwritten futures, which was quite fun.
Matthias
00:45:02
If anyone hasn't experienced the joy of handwriting futures, I highly recommend it. Did you manage to clean that up later on and completely get rid of that and literally lean into the futures ecosystem that came later?
Martin
00:45:17
I think a lot of the new code is indeed just async functions.
Arya
00:45:21
The only problem is that the code does not use async functions in traits. That is part of the new API work, where I think the networking code is largely incomplete right now.
Martin
00:45:34
Yeah, but it also predates async traits. Yeah.
Matthias
00:45:37
And it would also be a breaking change now, would it?
Martin
00:45:40
That's fine because we declared it experimental.
Arya
00:45:43
Yeah, the entire feature is wrapped under...
Martin
00:45:45
So we did a little trick where we define experimental features. They have feature flags named unstable-something, and changing things in there in a breaking way is not a breaking change for the library.
Matthias
00:45:59
To take a quick step back, we're still somewhat around the 2017-18 timeline. What other things did you develop in Rust during that time?
Martin
00:46:11
So in, I think, mid-2017, we started looking at routing security, RPKI, which was not really new. I think that started in 2012 or so, but that was the point where it started to seriously take off. For RPKI you need something that's called a validator, which is basically a thing that collects all of the RPKI information that is out there from various distributed repositories.
Matthias
00:46:42
What does RPKI stand for?
Martin
00:46:43
It stands for Resource Public Key Infrastructure, and "resource" here means Internet resources, so IP address prefixes or AS numbers. The idea is basically to make signed, verifiable statements about these resources. That's also why you need a validator, because someone has to collect all of this information, validate it, and produce validated data sets out of it. At that point, there was an initial prototype implementation in, I want to say, Python that was done during the standardization of the protocols. And there was a Java project that was built by the RIPE NCC, which is one of the organizations that issue these resources. They just made an implementation to further the deployment of this technology. And because this was a Java thing, its memory consumption was pretty terrible, so it would regularly fall over, running out of four gigabytes of memory. So there was a need for an implementation that is more minimal and can run on smaller hardware. That's what became Routinator, which we started in 2017, indeed. And because of the lack of alternatives, when we released it in 2018 it went into production and was widely used in production fairly quickly.
Matthias
00:48:11
That means you must have been one of the first Rust in production users for real.
Martin
00:48:14
I would assume so, yes. Now we have a market share of about 70 percent or something like that, and I think that went fairly quickly. Not too much later there was also an implementation in Go, which has since been abandoned, and there is now also a C implementation by the OpenBSD people, but Routinator is still the most popular RPKI validator. The experience of having this very early Rust project in production was quite interesting, because I expected a lot of pushback from people saying, you just wrote this in this fancy new language, and that can't be good. But there was none. Some people were a bit skeptical or had problems with packaging it. There was interest in packaging it for Debian and that sort of thing, which at that point was still very tricky, so there were a few complaints from that side. But interestingly, nobody was really bothered by having to install rustup and Rust and then just compile it from scratch, which was required at that point. We put a lot of effort into not putting anything in the build that would complicate building, so we wouldn't rely on OpenSSL and these things, because that was and is a pain. It literally built with cargo install, which I think helped. But yeah, I was really surprised that there was basically no pushback, which is also part of why we later found it easy to say, yeah, let's just do everything in Rust.
Arya
00:49:44
Unfortunately, we still support OpenSSL in other Rust projects.
Martin
00:49:48
Yeah, let's not talk about the Rust crypto story.
Arya
00:49:54
There's plenty to say, yeah.
Matthias
00:49:55
Okay, let's not go there. But was that already written in async Rust, or was it still sync Rust in 2018? That's sort of when the entire tokio ecosystem evolved. That's why I'm asking.
Martin
00:50:06
That is still sync at its heart. The initial version also didn't really need async, because the way you collected the data back then was by way of rsync, which means that what we basically did, and still do, is spawn your system's rsync process to collect the data. Then there was also an HTTP-based protocol to do this, which we added later. That one is basically Base64-encoded data in an XML wrapper, which is really interesting. At that time there was no XML parser that you could use with async. I think quick-xml can now do it. What I was using then was reqwest in blocking mode, because I didn't want to read the entire file into memory and then parse it. So I basically just built it that way, and that is why Routinator uses a thread pool with blocking HTTP. And it also blocks until the rsync process, if you're using that, comes back.
Matthias
00:51:19
Yeah, in my mind, I don't really hate that idea. It's certainly predictable: you have a very predictable profile of the application at runtime. That's something that is a bit tricky to get in async Rust. Sometimes it's not very easy to say how async Rust code will behave in production.
Martin
00:51:41
Yeah, it's kind of interesting. So obviously it has an HTTP server now, and that code, all of that is in async. So that is just like tokio with Hyper.
Arya
00:51:53
But I do think there are plenty of good use cases where you don't strictly need an entire async runtime. You can get away with having a few threads, and depending on your application, that can sometimes be the simpler and easier-to-manage model. But we did, I guess, start using async more and more, especially with the domain stuff, where networking was a big concern. And because it's a library, you don't want to impose your restrictions on the users, so we did have a lot of support in there for async things.
Matthias
00:52:30
Do you think people reach for async Rust too early, both with regard to their Rust learning experience and with regard to using it in production? You alluded to cases where a thread pool is good enough.
Arya
00:52:46
I think it's really hard to say. Async Rust gives you the flexibility of deciding later what your threading model is going to be. You can just write code that does not assume how many threads you will have or what your setup is going to be, and you can come to a more concrete decision later. If you're going to rely on entirely synchronous code, then you're baking in an assumption quite early. So if you're working in a problem space where you already know what you have to deal with, that's somewhere a completely sync system can be less of a gamble. For example, one of the things we had to build while developing Cascade was a compatibility bridge to help support hardware security modules. A large number of operators of DNSSEC signers will use hardware security modules, which are essentially just bigger versions of things like YubiKeys, to do signing for them. We need to communicate with these, but there are older protocols that we wanted to support, and so we ended up building a compatibility bridge for this. That's one place where the application is simple enough, and we have a good enough understanding of what our needs are, that it does not need to be async. A large part of it just uses a manual thread pool.
Matthias
00:54:19
One thing I'd like to quickly point out is that you both learned Rust at different points in time. Martin learned it around 2015, when async Rust wasn't a thing. So did I. And Arya, you learned Rust when async was already very well established. I wonder if that shapes the way you think in Rust. To me, it always still feels like an addition, or tacked onto the language. It would be interesting to hear you both think about that for a bit. What are your thoughts on this? Is that true for you too? How do you think about async Rust: is it an extension of Rust, or is it really the same language?
Martin
00:55:00
I think now that we have async functions, it feels more like really part of the language. It felt more library-ish with futures, where basically you had these couple of traits and then runtimes that were implementing stuff. Now, with async functions, it's more integrated. There's still a divide that is definitely there, and I think it probably always will be there, because these things are just different. But I think it feels more integral to the language now.
Matthias
00:55:36
Arya, do you also feel that divide?
Arya
00:55:38
So I really enjoy programming language design, so I think I sometimes see some features from a lower-level perspective than others. I'm also especially concerned about the implementation details for some of these. I've spent a lot of time looking at the trait interface for Future, which has interesting restrictions. I think I do view it as a very specific mode that I will work in when I'm choosing to write async code. Personally, I focus a lot on writing libraries. It's where I feel more comfortable. And for libraries you need to have a better understanding of what you're trying to do, because at the end of the day, if you're going to serve up an async function in a public API, it's the user's responsibility to run your code. There are additional concerns, like cancel safety and understanding what your threading model is, which async code has to be more aware of. So it does feel like writing async code is a distinct mode from writing synchronous code, because you have these additional concerns to think about regarding how this code is going to be used, and that has different implications for async code.
Matthias
00:57:05
Interesting, because it requires a lot of empathy for the user to make the right decision. And you do make a lot of decisions for the user, if you think about it, when you write a library.
Arya
00:57:18
Yeah, totally. I've also recently been thinking about similar issues with, for example, logging. The way that a library logs data is usually not considered part of its API, but it is something that is important to its end users. That's another case where I think we think about this. One issue that we actually have that I think is quite interesting is panicking. And Martin, I'm sure you'll have a lot to say about this. Because we're writing code in a sort of mission-critical area, we really try to avoid spurious panics, because you don't want somebody's name server to crash, for example. But it's also been a battle, because you can have invariants that are guaranteed within your program, even within your library. For example, you might have some type which you've defined to have certain invariants. You might say, I have a string that is not empty, so I can always access the first character. Now, implementing methods on that, trying to access the first character would usually return an Option or panic in some way. And then our dilemma is: do we use .unwrap() there, or .expect()? It's an invariant that our program is guaranteeing to itself, but the language itself does not know that, and so we still end up with code that is technically capable of panicking. If there's an implementation bug on our end, that's capable of causing a panic. That's been a really interesting journey for us, because I don't think we've fully settled on an ideal spot yet.
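Arya's non-empty string example can be written out as a small sketch. The type `NonEmpty` and its methods are hypothetical names for illustration, and the two accessors show both sides of the dilemma rather than a recommendation.

```rust
/// Hypothetical type whose invariant is "the string is never empty".
struct NonEmpty(String);

impl NonEmpty {
    /// The only constructor, so the invariant is checked exactly once.
    fn new(s: String) -> Option<Self> {
        if s.is_empty() {
            None
        } else {
            Some(NonEmpty(s))
        }
    }

    /// Option A: trust the invariant. Convenient for callers, but the
    /// compiler cannot verify it, so an implementation bug elsewhere in
    /// this type would turn into a panic here.
    fn first(&self) -> char {
        self.0
            .chars()
            .next()
            .expect("NonEmpty invariant violated: inner string is empty")
    }

    /// Option B: never panic, and push the impossible case onto every caller.
    fn try_first(&self) -> Option<char> {
        self.0.chars().next()
    }
}

fn main() {
    let name = NonEmpty::new("nlnetlabs".to_string()).expect("literal is non-empty");
    assert_eq!(name.first(), 'n');
    assert_eq!(name.try_first(), Some('n'));
    assert!(NonEmpty::new(String::new()).is_none());
}
```

Neither accessor is free: `first` keeps call sites clean but remains a potential panic, while `try_first` is panic-free at the cost of making users handle a `None` that should never occur.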
Martin
00:59:11
I also don't think there's a good answer there. It's just trade-offs. Yeah.
Matthias
00:59:17
And the trade-offs would be differentiating between internal and external errors, so things that are implementation bugs and things that are user-facing issues.
Arya
00:59:29
Well, you were talking about this recently with bcder, Martin, where you were trying to decide, for example for indexing byte slices, whether to use unwraps or...
Martin
00:59:42
So this comes indeed from the code panicking in certain places, either because you're using unwrap or because you're using slicing or indexing. And we have had CVEs because of that, because this gets fed data from the network, so untrusted data, and panicking on that untrusted data is kind of bad. Specifically in the case of Routinator, which collects a known set of data from the Internet: if you restart it, it will collect the same set of data again, and therefore it will crash again. So just publishing data that makes it crash is a very efficient way of breaking routing security, or this particular kind of routing security, for a lot of people, because Routinator also has a very high market share. If you can trigger this, it will crash in a lot of places. So obviously we want to avoid panics in that code, which means you want to use Clippy for this: you want to ban indexing, you want to ban unwrapping. But you are dealing with byte slices, so you will have to do some sort of sub-slicing and take characters or bytes out at certain positions. That just has to happen. The question is finding a way that is expressive, where you mark this as "I looked at this, this is fine." What I think we're going to do now is allow that particular Clippy lint at that particular position and stick a safety comment on it. That is what I've settled on for now. But it makes the code more cluttered. It would be nicer to have a more compact way to express "I've checked for this." Ideally that would be something that the compiler can also help with, but obviously that is super hard because this is a runtime thing.
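The approach Martin settles on might look roughly like the sketch below. The lints `clippy::indexing_slicing` and `clippy::unwrap_used` are real Clippy restriction lints; the module `parser`, the function `payload`, and the length-prefixed format are invented for this example, and the lints only take effect when the code is checked with Clippy.

```rust
/// Everything in this module handles untrusted input, so indexing and
/// unwrapping are banned by default.
#[deny(clippy::indexing_slicing, clippy::unwrap_used)]
mod parser {
    /// Parses a hypothetical format: one length byte followed by at least
    /// that many payload bytes. Returns the payload, or `None` if the
    /// input is too short.
    pub fn payload(data: &[u8]) -> Option<&[u8]> {
        // Panic-free accessors are preferred where they read well.
        let len = usize::from(*data.first()?);
        let rest = data.get(1..)?;
        if rest.len() < len {
            return None;
        }

        // Checked: `len <= rest.len()` was verified just above, so this
        // slice expression cannot panic. The lint is allowed for this one
        // statement only.
        #[allow(clippy::indexing_slicing)]
        let payload = &rest[..len];
        Some(payload)
    }
}

fn main() {
    assert_eq!(parser::payload(&[2, 0xAA, 0xBB, 0xCC]), Some(&[0xAAu8, 0xBB][..]));
    assert_eq!(parser::payload(&[5, 1]), None);
    assert_eq!(parser::payload(&[]), None);
}
```

The comment plus `#[allow]` is exactly the clutter Martin describes: the check and the justification are both there, but the compiler verifies neither of them.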
Matthias
01:01:41
And at the same time, for the people who might say, oh yeah, Rust also has panics, it's no better than C or whatever we had before: just keep in mind that this is not a segmentation fault. It is not leaking information. It might be a fault, yes, it might panic, true, but it doesn't expose anything. Rust guarantees that your data is still safe and that memory safety is still upheld even if you panic.
Martin
01:02:11
This is also an interesting point: memory safety isn't everything. Yes, you can make sure that buffer overflows and that sort of stuff don't happen, but a denial-of-service problem might still be there. That's something that people quite often overlook when they say, yeah, but it's fine because it's memory safe. That's not all the issues that exist.
Arya
01:02:37
Panic safety is, I guess, the umbrella term for these issues.
Martin
01:02:43
Which I guess comes back to what you said earlier, where people were looking at this one segfault somewhere and then pointing at that. That's sort of the consequence of putting memory safety and these sorts of safety aspects at the center of your language: of course people will be more critical when that doesn't seem to hold.
Matthias
01:03:02
It was even sort of a good thing that the segfault happened in that case, because it was in an unsafe section that dealt with, I think, a data structure like a linked list, and that would have caused a memory safety issue if it hadn't crashed at that point.
Martin
01:03:18
It's always linked lists.
Arya
01:03:22
I don't remember what actually happened in that specific case, whether it was a segfault. But this is certainly an experience we had that we didn't anticipate when we started with this project: that panics are, or can be, a problem. Panic safety is a thing that you need to think about when you're designing things, when you're writing code.
Matthias
01:03:41
Throughout the conversation, we talked about async rust a lot. And we looked at it from various perspectives. What are the common traits that have evolved around asynchronous usage in your projects?
Arya
01:03:59
And I assume you mean trait not in the Rust sense.
Matthias
01:04:04
Precisely, yeah.
Arya
01:04:05
So one thing that we've really noticed that's been interesting to explore is how we use async for daemon processes. Because a lot of our work is in these long-running daemons, where you sort of inevitably end up with high-level state machines, right? Right. Thinking about, for example, right now we've been developing Cascade and Cascade has to worry about, well, it controls multiple DNS zones and then each zone can be moving through multiple states. Right. It could be loading data. It could be signing data. It could be then serving that data. And we often run into the case of, oh, are we just building these state machines? Isn't that what async is supposed to help us avoid? And so we inevitably ask ourselves, can these high-level perspectives of what our program is doing be expressed as async functions? Can we just write our entire application as one big async function? And across four or five projects, we've always eventually drifted to the answer being no. That it's always better to describe our code at the highest level in very explicit ways, even if that sometimes involves writing out state machines yourself. And, you know, writing state machines is not fun in any programming language. And that's been interesting to, it's been interesting to notice when async feels more appropriate and when async feels less appropriate. The sort of rule of thumb that we've reached is that if, you need to perform some action for a short period of time, right? So it's not like a state that your program is in, but just an action is doing temporarily. Then async is usually a very good fit. So if you're, for example, fetching some data from the network, so you're going to perform some HTTP requests, that might be a perfectly fine time to use async code. But in order to describe what the state of your application is, like the highest level states that a user might care about, So we try to avoid using async in those places. 
The main reason for this is that when you implement something as an async function, it's an opaque type, right? We know that there's a state machine underneath this, but you can't examine any of the details about that state machine. And examining details, having that degree of transparency, is really important when you're trying to build reliable software. So for Cascade, we try to offer a lot of ability to inspect what Cascade is doing at any time to see what state these important zones are in. And it's really hard to extract that information when you just have this opaque future object. The downside of this approach, I mean, it helps us make things more explicit. It helps us better understand what our state machines are. But the downside of this approach is that there's a lot of tooling for async functions that we can't use. For example, if we need to perform certain actions after a period of time, for example, when we sign DNS data in Cascade, we need to re-sign that periodically because those signatures expire, they have expiration dates embedded in them. We can't use tokio timers to express all of those expirations when we need to do more stuff. Again, because we need a greater degree of transparency. We want to be able to inspect what those timers were. And tokio doesn't have any way to inspect what timers are scheduled. There's no sort of good way to even try to do that. And also, because our high-level state machine is not an async function, we don't have a single good place where we could use these timers. We also need to, for example, persist those timers across restarts. And that's also, there's just no way to do that with a tokio timer. So we've slowly sort of started building up our own, like, not async tooling, but sort of this explicit state machine-y tooling that helps us deal with these problems. Often very strongly inspired by whatever you'll find in tokio or other async utility libraries. That's been a really interesting shift. 
We've also noticed that happening across all of our projects, right? None of them use a top-level async function to describe what they're doing. They're all these long-running daemon processes, and so all of them end up with some form of state machine at the very top. That's been a really interesting experience. It's not something I've really noticed anybody talking about, either. I wonder how other daemons written in Rust are dealing with this sync-async divide.
Matthias
01:09:05
Would you say you use structured concurrency a lot for the async features?
Arya
01:09:11
That's an interesting question. I don't think we use it significantly. Most of the time when we are actually dealing with async functions where structured concurrency is applicable, we're usually doing very few things, because these async functions happen at the very edges of our application and they don't tend to have that much to do. But I think there would definitely be use cases where we'd see more use of it.
Matthias
01:09:47
And it's helpful in situations where you have a bunch of IO operations and you can sort of run all of them concurrently and then cross the boundary to, say, another part of the application, which might or might not be async. So this way you can have a clear separation between the different tasks that you're handling within your application.
Arya
01:10:12
So the way that we've dealt with this a lot in Cascade, at least, which is what I've been working on, is whenever we have some action that needs to happen in the background, per se, we will spawn up a tokio task for it. And that results in the tokio join handle object, right? And that lets you rejoin this task whenever. And we just save that within our state machines. So that's a very explicit way for us to track that. Oh, yeah, we have that background task going on. There's only one such background task going on. And if we ever need to, we can take that out, inspect it, or just wait on it. But usually what we then see happen is at the end of that tokio task, it would unlock that state machine itself and then go make those modifications. So it will sort of clean up after itself. And because of this model, there are relatively few times where we have multiple operations going on at the same time. But Cascade's also not that IO heavy.
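The pattern of saving the handle of a background job inside the state machine can be sketched as follows. To keep the example free of external crates, it swaps the tokio task and its join handle for `std::thread::spawn` and `std::thread::JoinHandle`. The `Zone` and `Phase` types and both functions are hypothetical names for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical phases of a zone for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    Signing,
    Signed,
}

pub struct Zone {
    pub phase: Phase,
    /// At most one background job. Keeping the handle here makes it
    /// explicit that a job exists and lets us wait on it later.
    pub background: Option<JoinHandle<()>>,
}

/// Starts the background job unless one is already tracked.
/// Returns `false` if a job was already recorded.
pub fn start_signing(zone: &Arc<Mutex<Zone>>) -> bool {
    let mut guard = zone.lock().unwrap();
    if guard.background.is_some() {
        return false;
    }
    guard.phase = Phase::Signing;
    let shared = Arc::clone(zone);
    guard.background = Some(thread::spawn(move || {
        // The actual signing work would happen here.
        // At the end, the job takes the lock on the state machine and
        // records its own result, cleaning up after itself.
        shared.lock().unwrap().phase = Phase::Signed;
    }));
    true
}

/// Takes the handle out of the state machine and waits for the job.
pub fn wait_for_background(zone: &Arc<Mutex<Zone>>) {
    // The lock is released at the end of this statement, before joining,
    // so the job can still take the lock to finish its work.
    let handle = zone.lock().unwrap().background.take();
    if let Some(handle) = handle {
        handle.join().expect("background job panicked");
    }
}
```

With tokio, the field would hold a `tokio::task::JoinHandle` instead, and waiting on it would be an `.await`, but the bookkeeping stays the same.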
Martin
01:11:13
Yeah, so we have a different project, Krill, which is the CA for this RPKI world. Basically a CA implementation. And that one is more input-driven. So it's more: user gives command, command leads to certain things happening. And the user-gives-command part is an HTTP server, so this comes in as a REST API. It used to be async, but a lot of the code was written as if it were sync. So there are some slightly dubious things in there, like long-running CPU operations that are actually within tokio tasks, which is tricky. So I've just rewritten that, or am in the process of rewriting it, which was quite the adventure, doing exactly what you said. Basically the HTTP server is async, and whenever it has processed the input and verified it, like authentication and all of these things, it will then queue a task for a sync core that runs on a thread pool, with however many threads you have, and that processes these things. And I think that is a really good model for this, because it gives you more visibility into what is happening, rather than having lots of tasks where you don't really know how many tasks you have, with all of these commands coming in from all over the place. And you get back pressure and all of these things kind of for free, which is nice.
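The queue between an async front end and a sync core can be sketched with the standard library alone. This is not Krill's code: `SyncCore`, `Job`, and the parameters are made-up names, and the async HTTP side is left out. The point is only that a bounded queue gives back pressure, because `submit` never blocks and hands the job back when the queue is full.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work for the sync core (hypothetical type for this sketch).
pub type Job = Box<dyn FnOnce() + Send + 'static>;

pub struct SyncCore {
    queue: SyncSender<Job>,
}

impl SyncCore {
    /// Starts `threads` worker threads that share one bounded queue
    /// holding at most `queue_len` waiting jobs.
    /// Clean shutdown is left out of this sketch.
    pub fn start(threads: usize, queue_len: usize) -> Self {
        let (queue, rx) = sync_channel::<Job>(queue_len);
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..threads {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Hold the lock only while waiting for the next job,
                // not while running it.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(job) => job(),
                    Err(_) => break, // all senders are gone
                }
            });
        }
        SyncCore { queue }
    }

    /// Queues a job without blocking, so it is safe to call from an async
    /// request handler. If the queue is full (or closed), the job is handed
    /// back and the caller can report that the server is busy.
    pub fn submit(&self, job: Job) -> Result<(), Job> {
        self.queue.try_send(job).map_err(|err| match err {
            TrySendError::Full(job) | TrySendError::Disconnected(job) => job,
        })
    }
}
```

An HTTP handler would validate the request, call `submit`, and turn an `Err` into something like a "try again later" response instead of piling up an unknown number of tasks.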
Matthias
01:12:49
Yeah, it's a thing that I also haven't heard many people talk about, which is using channels to decouple parts of your application more, and really dealing with back pressure and introspection and so on. It feels like when you run long-running processes, you need that level of introspection, you need that level of control, and maybe just using tokio tracing is not enough. You want to control it more on a language level or on a runtime level, so to say.
Martin
01:13:25
What was interesting was that I had to implement my own thread pool and all of this communication myself. Because you have these queues and you have all sorts of synchronization features, but there isn't really good bridging between sync and async. You can't use Rayon because that doesn't do any async things. On the plus side, it turned out that doing this was surprisingly easy. The tools, the components for this are there. What was great fun, in inverted commas, was the shutdown code, to cleanly shut down. You have to collect all the tasks. You have to collect all the threads. You have to tell everyone that we're now shutting down. Getting that right was quite interesting to do.
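One simple way to handle the shutdown of such a hand-written thread pool is sketched below. It is only an illustration under assumptions, not the code from Krill: `WorkerPool` and its methods are hypothetical, and closing the queue serves as the signal that tells every worker to stop once the remaining items have been processed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

pub struct WorkerPool {
    /// `None` once shutdown has begun.
    queue: Option<Sender<u64>>,
    /// Every thread we started, so none is forgotten at shutdown.
    workers: Vec<JoinHandle<()>>,
    processed: Arc<AtomicUsize>,
}

impl WorkerPool {
    pub fn start(threads: usize) -> Self {
        let (queue, rx) = channel::<u64>();
        let rx = Arc::new(Mutex::new(rx));
        let processed = Arc::new(AtomicUsize::new(0));
        let workers = (0..threads)
            .map(|_| {
                let rx = Arc::clone(&rx);
                let processed = Arc::clone(&processed);
                thread::spawn(move || loop {
                    let next = rx.lock().unwrap().recv();
                    match next {
                        // Real work on the item would happen here.
                        Ok(_item) => {
                            processed.fetch_add(1, Ordering::SeqCst);
                        }
                        // The queue is closed and empty: time to stop.
                        Err(_) => break,
                    }
                })
            })
            .collect();
        WorkerPool { queue: Some(queue), workers, processed }
    }

    /// Queues an item; returns `false` if no worker can receive it.
    pub fn submit(&self, item: u64) -> bool {
        match &self.queue {
            Some(queue) => queue.send(item).is_ok(),
            None => false,
        }
    }

    /// Closes the queue, waits for every worker to drain it and exit,
    /// and returns how many items were processed in total.
    pub fn shutdown(mut self) -> usize {
        // Dropping the only sender tells all workers we are shutting down.
        drop(self.queue.take());
        for worker in self.workers.drain(..) {
            worker.join().expect("worker thread panicked");
        }
        self.processed.load(Ordering::SeqCst)
    }
}
```

A real daemon has more to coordinate, such as async tasks that still hold a sender, which is likely part of why getting this right takes care.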
Matthias
01:14:12
Is that open source somewhere?
Martin
01:14:13
That is definitely open source, yeah. Okay, what's the project name? We can link to it in the show. Yeah, it's Krill, as in... not the fish, not the fish, the sea creature. I get yelled at now, because the colleague who originally wrote this is a biologist by trade, so he will yell at me if he hears this podcast. And all that code is now in a new branch called full sync, which is currently in review because it's huge. Yeah, you've been toiling away at that for a while. Yeah, so I did a lot of experiments. One thing that this has is a bit where it has to go out and do HTTP requests as a client. So I originally didn't want this code to be purely sync, because then you're blocking the thread until the HTTP request comes back, and if that happens too much, then you're blocking all the threads, and that's kind of not nice. But all of these requests are to trusted servers, so I think it's fine to do it this way. Initially, I tried to build a thing where it could go sync, async, sync, async, sort of switch between different modes and have tasks that jump between them. Because these then have to be Send and 'static, you need to clone a lot of stuff, because a task needs to own all the things that it has. Getting this jumping right was extremely hard, and I basically never got it right. So eventually I just decided, you know what, this is just not ever going to work. It's going to be difficult to reason about and to maintain. So let's just do the sync version and maybe look into moving those tasks that do these HTTP requests, because it's not all of them, it's just a few of them, to a separate thread pool, so that they're only blocking that bit and you can have the regular thread pool for tasks that don't need that. So you have to look into that as well. But for now, we just basically shrug and say, yeah, it's probably going to be fine.
Matthias
01:16:10
Yeah, I'm not sure if we can dig too much into that. But what I found was that if you have tasks that are too different in nature inside of your application, and you don't clearly separate those, you're sort of asking for trouble. So you kind of start to work against the framework a little bit.
Martin
01:16:31
Yeah.
Matthias
01:16:31
Because the tasks, they have different runtime requirements, they have different memory usage, different performance profiles, and so on.
Arya
01:16:38
It's hard to manage interactions between them.
Martin
01:16:41
Yeah, also my experience has been that every time you say, yeah, it's probably going to be fine, it definitely isn't going to be. So I should probably look into this.
Arya
01:16:50
But yeah, I think overall async as a feature has been really nice for us. It's been really helpful for getting a lot of this down. And as I said, it lets you be really flexible about what your threading model is going to be. And I think that's really cool.
Martin
01:17:05
Yeah, and writing the TCP or HTTP connection handler or request handler as an async function, that's great. That's fantastic. It makes it so much easier, so much clearer, much easier to understand what's going on and what you're doing. And then you basically do that and use a queue to bridge into, like, an event or a sync core. I think that's a great way of doing these things.
Matthias
01:17:33
As we're closing out, the final question traditionally is always, what's your message to the Rust community?
Arya
01:17:42
I think in general, I don't know, I think programming is fun. If I had one thing to say, it'd be, make sure you have fun. But for the Rust users specifically, I think Rust is such a good language and there's so much fun to be had with it. As I mentioned before, I think Rust's power comes from being able to design really elegant and really intricate APIs that let you get precisely the point across that you want to. And we've had a lot of fun doing this in domain, for example. Also, one of our colleagues here, Terts, has been working on a scripting language which integrates really tightly with Rust, and that's another place where we've had really interesting API discussions about how to make this stuff work. It's called Roto. And it's been fascinating, because you're trying to make this really delicate sculpture, and figuring out where you can draw the lines is something I think Rust is really good at. So I think: make fun APIs, use the type system to your advantage, have fun with it.
Martin
01:18:58
Keep doing what you do and stay where you are, because a lot of what makes Rust great is the people who create Rust, who work on Rust. The attitude, the welcomeness, the openness of the community, I think, is a huge part of why Rust became a staple and wasn't just a fad. And I think if we just keep doing that, then we will have fun. Yeah.
Matthias
01:19:27
Well, I certainly had fun and I can get behind those statements.
Martin
01:19:33
Yeah.
Matthias
01:19:34
And for me, all that is left to say is thank you, Arya and Martin, for taking the time for the interview today.
Arya
01:19:41
Thank you very much. It was great.
Martin
01:19:44
Thank you.
Matthias
01:19:45
Rust in Production is a podcast by Corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.