NLnet Labs with Arya Khanna and Martin Hoffmann
About securing the internet with Rust
2026-05-07 81 min
Description & Show Notes
Every time you load a website, send an email, or update an app, you're quietly relying on a handful of unglamorous services that route your packets to the right place: DNS to translate names into addresses, and BGP to figure out how to actually get there. When these systems break, or get attacked, the Internet doesn't just slow down but stops working.
For more than 25 years, NLnet Labs has been one of the small, non-profit teams keeping that core infrastructure running. Their software, including the DNS servers NSD and Unbound, the RPKI tools Krill and Routinator, and the new DNSSEC signer Cascade, is deployed everywhere from hobbyist Pi-Hole setups to Let's Encrypt and major Internet operators. And increasingly, it's written in Rust!
In this episode, I talk to Arya Khanna and Martin Hoffmann from NLnet Labs about what it takes to maintain critical Internet infrastructure as a small team, why they bet on Rust for new projects like the domain crate and Cascade, and what the rest of us can learn from a codebase whose users include the people who keep your routes flowing.
About NLnet Labs
NLnet Labs is a non-profit foundation based in Amsterdam that develops open source software and open standards for the core infrastructure of the Internet. Since 1999, the small but dedicated team has built some of the most widely deployed building blocks of the modern web, including the authoritative DNS nameserver NSD, the recursive DNS resolver Unbound, and the RPKI tools Krill and Routinator, which secure global Internet routing. Their work is trusted by operators ranging from hobbyist Pi-Hole users to Let's Encrypt and major Internet service providers. In recent years, NLnet Labs has been steadily moving its new development to Rust, with projects like the domain crate and the Cascade DNSSEC signer leading the way.
Links From The Episode
- NSD - NLnet Labs' first project
- lychee - A link-checker that receives funding from NLnet (not NLnet Labs!)
- unbound - A DNS server like BIND, but only for recursive queries
- Cascade - The new DNSSEC signing solution from NLnet Labs
- Pi-Hole - A small use case for unbound
- Let's Encrypt - A big user of unbound with scale and security requirements
- Asahi Linux - Linux on Apple Silicon, mostly with Rust
- Binder CVE - A CVE in Rust
- LDNS - A collection of DNS functions, written in C, now in maintenance mode
- domain - The new collection of DNS functions, written in Rust
- tokio - The biggest shared dependency across the Rust ecosystem, first announced in 2017
- Rust in Production: Helsing with Jon Gjengset - You can take generics too far
- bytes - Tokio's Arc of bytes
- Arc Welding - The other type of "fixing"
- Alejandra González' crate dependency analysis - 46% of published crates depend directly on tokio
- RPKI - Signing and validating IPs and routing information
- Routinator - An RPKI validator, one of the first Rust applications in production
- hyper - The ubiquitous HTTP crate
- Krill - The RPKI Certificate Authority tool with "fun" shutdown code
- Roto - Tert's scripting language, used by another NLnet Labs project, Rotonda
Official Links
Transcript
It's time for Rust in production. I'm your host, Matthias Endler from corrode.
Today, I talk to Arya Khanna and Martin Hoffmann from NLnet Labs about securing the internet with Rust.
Arya and Martin, thanks so much for taking the time. Can you introduce yourselves and NLnet Labs?
Yes, yeah, hi, I'm Arya.
I'm a software engineer here. I've been using Rust
for, I think, five or six years at this point. It's been really fun. I've
been using it since I started university, and then I jumped into NLnet Labs
straight from there. I've been working here for about a year and a half or two
years now. It's been really fun.
I'm Martin. I'm also a software engineer. I've been at
NLnet Labs a bit longer, for I think 10 years
now, and I've been doing Rust since 1.0, since
2015, which I think also pretty much coincides with
me starting here. So I basically brought Rust with me here. I also indeed started
the use of Rust at NLnet Labs, more or less guerrilla style, and then later
on the organization decided to embrace it, which was a really cool path.
There's a funny story to this, because we met at FOSDEM and
I mentioned that Lychee, which is a link checker that I help work on, is sponsored by NLnet.
Yeah.
And then your response, Arya, was that NLnet Labs is different than NLnet,
the organization that funds the work.
So maybe we can clear up that confusion.
Yes. So NLnet was the first, I believe, commercial ISP in the Netherlands.
But they weren't privately owned.
They were owned by universities and
organizations of that type, and eventually they got
sold, and the money from that sale
went into a foundation, and that became
NLnet. Their task was then to sponsor or
to finance development for the Internet, which is
what they still do. They hand out grants. The
money has since run out, but they still work
in that field. And in 1999
they decided that they wanted to
develop a DNS server. This came out of... right, because back then
everyone was running BIND, and some people who were running the DNS root zone
decided, well, that's a bit scary. And so NLnet Labs was set up to develop
an alternative, which became NSD, and that's sort of the story.
Initially, NLnet funded pretty much all of NLnet labs.
Some other organizations chipped in, like SIDN, the .NL people.
But over time, we're sort of shifting more and more towards a support-contract
and grants-based financing model, so that we're not solely relying
on a single entity to sponsor us.
But now we're doing, after NSD, we expanded a bit.
We also do Unbound, which is a very popular recursive resolver.
We've also expanded into the internet routing field with BGP and RPKI.
And I think, Martin, you can expand more because that is not my department.
And, of course, we've been working on a bunch of additional DNS products.
Most recently, we started working on Cascade, which is a, I guess,
a DNSSEC signing solution.
And it's replacing our previous project, OpenDNSSEC.
And we're building now on top of Rust. And it's been a really fun experience.
I guess, in summary, for people that are not that much into networking,
you build things that run the internet.
I think we contribute to some of the critical infrastructure that keeps the
internet going, and we're really proud of that.
What is Unbound?
Unbound is a DNS recursive resolver, which means
if you query for, say, an
IP address, you don't
go to the servers directly. You first have to find out where the servers live,
because DNS is like a giant distributed database. And basically
what Unbound does is it figures out where to
go and does all the queries. And what it
also does is, if lots of people do the same queries, it has
a cache so that you don't have to go all the way all over
again. That's a very complicated thing.
In Unbound there are
like 25 years of experience in corner
cases. There are a lot of security things in there. These DNS
queries can be very efficiently used for amplification attacks, for DDoS
attacks, so there's a lot of work in there to make sure that that doesn't
happen, or to mitigate people trying to do that.
Yeah, it gets pretty complicated.
Yeah, it's a really complicated piece of software.
But yeah, I guess it's just serving as a worldwide cache, depending on where
your servers are, but just serves as a cache to help DNS clients fetch information
faster, more efficiently.
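The caching idea described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not how Unbound (which is written in C) implements its cache: the `Cache` and `Entry` types are hypothetical, names are matched case-insensitively, and each answer is kept until its time-to-live (TTL) has elapsed.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// A cached answer and the moment it stops being valid (hypothetical type).
struct Entry {
    addr: IpAddr,
    expires: Instant,
}

/// A toy resolver cache keyed by lowercased domain name (hypothetical type).
#[derive(Default)]
pub struct Cache {
    entries: HashMap<String, Entry>,
}

impl Cache {
    /// Remember an answer for `ttl`, counted from `now`.
    pub fn insert(&mut self, name: &str, addr: IpAddr, ttl: Duration, now: Instant) {
        let entry = Entry { addr, expires: now + ttl };
        self.entries.insert(name.to_ascii_lowercase(), entry);
    }

    /// Return the cached address if one exists and has not expired yet.
    pub fn get(&self, name: &str, now: Instant) -> Option<IpAddr> {
        let entry = self.entries.get(&name.to_ascii_lowercase())?;
        (now < entry.expires).then_some(entry.addr)
    }
}
```

A real resolver would fall back to the full chain of upstream queries on a miss and would also have to bound memory use and resist cache poisoning, which is where much of the complexity mentioned in the conversation comes from.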
And did that work on Unbound, which is a C-based project, originate at NLnet
Labs, or did you sort of inherit that from somewhere else?
I'm honestly not sure. I remember there was a prototype in Java way back when.
I don't know if that happened here or if that happened somewhere else,
but the current C implementation definitely started here. I really should have asked.
We have had other projects that have originated from a more sort of collaborative point.
So OpenDNSSEC is one example, which I think we'll talk about more in a little bit.
But yeah, most of these projects are NLnet Labs originals, I guess.
Now, you mentioned that in 1999, you started to build your own DNS server, NSD.
And that is mission-critical infrastructure that cannot fail.
Back in the day, we didn't have Rust.
That's why you used a different language, C, to build it with all the baggage
that comes along with it.
And yet you managed to build a very high-quality, reliable open source tool there.
Can you talk about writing C at that level? What does it feel like to build
tools that stand the test of time in C?
I wasn't part of that, so I can't really, unfortunately. I have done some
development work in C, but I don't think I've done that at the level of NSD or Unbound.
I think a lot of it is experience.
There's a lot of testing, a lot of just time. You need to be super careful obviously.
I think really a lot of it is just that they have been around and have been
battle-tested for a long time.
It also, I think, really helps for the developers we have here who are just
really familiar with the codebase.
So they have a really good intuitive understanding of where all the different
parts lie. And I think as long as a developer has that entire mental model
in their head of what this application is supposed to do,
and they've had the time to make sure that their mental model and the actual
implementation correspond really well, like the people here have,
then it sets you up really well, regardless of what language you're implementing in.
Does Unbound still see any feature development, or is it mostly bug fixes by now?
No, there's quite a lot of features. Well, not quite a lot, but there are still features being added.
So DNS, surprisingly, even though it's like, what, 40 years old,
there's still a lot of development ongoing.
People invent new things, add new things, and those are typically implemented
in Unbound when they are being standardized in the IETF.
So Unbound, in that sense, also serves as like a platform to try out things.
I believe one of the features that comes to mind recently was DNS over QUIC.
But yeah, it's interesting how even though DNS is so old, we're still seeing,
it doesn't feel like we've reached any sort of endpoint.
NLnet Labs still participates in a lot of DNS conferences,
which there are a surprising number of.
And it's really interesting to see what features and what ideas people have
and where they're trying to take the protocol.
It's also really important work because DNS was obviously developed at a time when
Internet security was a very different thing.
And now we're at this point where we have so many additional concerns that the
protocol was never designed for.
And we're continuously working towards resolving those.
And you can't resolve, a lot of them are to do with the protocol itself and
not a particular implementation.
So there are plenty of issues that just Unbound or just NSD cannot implement fixes for.
We need to talk among different implementers,
with the whole community, and look for solutions that everybody can implement
together because of DNS's distributed nature, you need that sort of collaboration
in order to make this protocol stand the test of time.
It's a long and arduous and very active process.
Can you name a few organizations who use Unbound in production? Yeah.
Pretty much everyone. So the interesting thing about Unbound is it works from
large ISPs using it for the DNS servers for their customers to people using
it on their Pi-Hole as the DNS server for doing the queries locally.
So the scale is really quite baffling. It's like from just running on a Raspberry
Pi to a cluster of these things in a large ISP.
An interesting one might be Let's Encrypt, who need DNS to do the verification
for when you request a certificate.
So they need to have DNS resolving that is fast, that is reliable, and that is secure.
So I think that's a really good example.
Other examples are a lot of ISPs use Unbound, and then they don't talk about it.
But it is in a lot of places.
Now, a lot of these organizations use it as a critical layer of their
infrastructure. It's foundational work, so it would be sort of misguided to think
that it has to be rewritten in Rust by design, just because we can. That's certainly
not a good use of our time. What do you think about that?
Would you say it makes sense to rewrite Unbound in Rust or would you say no,
we should rather keep it around and maintain it?
I think, so we've decided to, at least for now, maintain it and keep it.
The main motivation is there are 25 years of experience in it.
It has been extensively tested, so there have been security audits and all of these things.
And obviously, it has been used in production in anger a lot.
So I think to get to the same state that Unbound is right now,
if you want to start from scratch, that's a lot of work.
That's probably like five years of work, I would sort of guess.
And we know it's a good piece of software. We know it has very few issues.
So I think we can use our time more efficiently for other things than re-implementing Unbound.
We might eventually do it because you never know what's going to happen in the future.
But I think for the near and middle future, probably not.
Martin, you mentioned that you've been using Rust since around 1.0.
That was around 2015. What initially drew you towards Rust, and what was your first project in Rust?
Then a colleague in a previous job suggested Rust
to me, saying, you will like this language, and they were right. So I
just gave it a try, and I am someone
who learns a language by just implementing stuff. At
that time I also started here at NLnet Labs, and I
figured I need to learn and understand DNS, so my
first project indeed was a DNS library in
Rust, which is probably not the best idea, but it
was quite a lot of fun. And indeed it turned into our
DNS library that we're now using internally and
that is also publicly available, called domain. That was
my first project indeed, and it started out as a private project, as
a hobby project on the side. But then eventually I talked to two colleagues here
and we agreed that it should be adopted by NLnet Labs and that we should build our DNS
Rust things on top of that. So that was a very interesting course of events, or
path, I guess.
Did you have those conversations pretty early on, or was it later in the game?
It was quite late. It was basically when we
started to decide to do more things
in Rust. Our first official use of Rust, I think, was in routing security,
in like 2017, 2018, I think we started with that. And that's also when I sort
of had to let it go a bit, because I did more work on routing security,
so I didn't have the time for domain anymore.
And somewhere in that time, I then asked if we could maybe adopt it in the company,
also to maybe spend some more time on maintenance than I had at the time.
I could imagine that there were a bunch of veterans who didn't want to let go
of their prior knowledge in C.
Was there a lot of resistance when you started to explore using Rust at NLnet Labs?
In company, not so much, I think, but maybe also because we didn't sort of immediately switch over.
Unbound is still unbound, and there was also never an intention to make that go away.
So it always felt a bit like we're doing new things in Rust,
but we're not throwing away everything we have and all the experience we have.
So I didn't really feel much resistance.
Do you think that was a big selling point for Rust, that you didn't have to throw away everything?
Not sure, because, again, these were completely separate things, right?
We started with routing security, which was not something we had done
before, so it was an entirely new track. So maybe that was also a smart way to approach this.
Okay. Just try it when there's something new.
Yeah.
Yeah, I hear that a lot, and I see it in organizations.
They start to adopt Rust in areas that are central to an organization,
but not critical, not mission critical.
And sometimes, if you can find a greenfield project that maybe you want to communicate
with through a network boundary, this is an easy way in.
It feels like this is a similar story here at NLnet Labs.
Yeah, I think also because the argument is, if you start a new project,
especially a security-related project,
in 2018 or whenever it was, doing that
in C is just a little wrong. Obviously there would have been other choices,
and we also explored them. For a short period, another colleague
who also started a project at the time thought, maybe do this in Go, and we just
did like a, yeah, just go and try it.
And they very quickly decided that, yeah, let's do Rust.
And why was that? Can you remember?
I think he didn't really like Go. It feels a bit, I don't know,
I don't want to say bad things, but like it feels not quite as modern as Rust, I want to say.
Like the type system is not nearly as powerful.
And maybe also, because I was a bit early, and maybe he also saw what I did
and just agreed that maybe this is a better path to go and sort of do things
together and not have two different projects in different languages.
Do you remember if there was some sort of aha moment where you show people your
work and they started to get it, they started to understand that this was a
very powerful piece of technology?
Can't think of any concrete moment now.
So it was more of a gradual transition.
So I think it's also
the Rust projects that we started, a lot
of that was with new colleagues who started here,
and quite a few of them also specifically started
because they knew they could do things in Rust here, which
is very interesting, because normally, or back then, the
narrative always was, yeah, you can't do Rust because you can't find
people. And our experience actually has been the
reverse: people are super willing to do Rust in
production, but we have a really hard time
finding C programmers now who can maintain the projects that we have. So
that's maybe also one of the reasons why eventually we might have to move
away from Unbound in C. It's just that we won't have the people to maintain
it anymore when the current generation retires, or, well, that's a long
time, probably. Or moves on. Or moves on, yeah.
It is a real problem because C often doesn't get taught in university anymore.
Yeah, exactly.
And people might be afraid to touch it after graduating on their first job because
they've heard a few scary things about C and how it can be misused.
There's plenty of good C horror stories. I still feel relatively optimistic
because, so I finished my, I started my bachelor like almost five years ago at the TU Delft here.
And the very first quarter, they make you learn assembly with very little help.
And based on that experience alone, I think I'm hopeful that we're going to
still have a generation of programmers who are willing to touch this,
but they are harder and harder to come by.
Well, that's a positive look
at things, yeah. But also
the job market in and of itself maybe doesn't
really lend itself to C veterans, right?
I don't know where I would look
for people with that level of expertise right away. It's one thing to maybe
get in touch with it; it's another thing to want to work with that language
on a daily basis. I certainly wouldn't want to do that, but yeah, that's just me. I'm biased here.
But it's also like, I kind of don't want to suggest Rust to people, because
my own experience is that once I tried Rust, I really don't want to do
C ever again. So the people who do the C development, I don't want to suggest
to them to try Rust, because we might lose them too.
It's just too good, yeah. But really, I think Rust has been so fun
to work with, and also I think it's changed the way that I do programming at all.
Like I think having now spent more than five years writing Rust,
I think I would write C code differently than I used to before,
even at the same degree of experience with it. And I find that really interesting.
Like it really changes your perspective.
And I think that's part of why it feels, I think that's part of why people who
switch to Rust have a hard time trying anything else.
Once it changes your mental model in that way, you struggle to use other languages
because they don't offer the same features to help you express things.
You mean expressing in the type system?
Yeah, I think one of the... So on the one hand, there's aspects like in Rust,
you can craft very elegant and very precise APIs.
You know, something that you'll often find
in a project that's trying to use the type system well is, for example,
zero-sized types that prove that something is true, where you can set bounds
so it is only constructed under certain conditions. You can't do that in C.
Often you can just create new type wrappers around things to express that,
oh, this is a string, but it has certain invariants on top.
And there's just a lot more boilerplate and it's harder to get that point across
in other languages. So that's definitely one aspect.
But I also think things like borrow checking and the ownership model,
which are really fundamental, are less focused on how you actually
write interfaces and more on the actual programming you're doing. And I think that's one place where
Rust's model can be substantially different from what you'd
expect to do in C or other languages.
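The two type-system patterns mentioned here, newtype wrappers that carry an invariant and zero-sized types that act as proof, could look roughly like the sketch below. All names (`Label`, `Zone`, `Unsigned`, `Signed`, `publish`) are hypothetical and chosen only for illustration; they are not taken from the domain crate or from Cascade, and `sign` is a stand-in that does no cryptography.

```rust
use std::marker::PhantomData;

/// Newtype: a string that has been checked to look like a DNS label
/// (non-empty, at most 63 bytes, ASCII letters, digits or hyphens).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Label(String);

impl Label {
    /// The only way to build a `Label`, so the invariant always holds.
    pub fn new(s: &str) -> Option<Self> {
        let ok = !s.is_empty()
            && s.len() <= 63
            && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-');
        ok.then(|| Label(s.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Zero-sized marker types: they hold no data and exist only at compile time.
pub struct Unsigned;
pub struct Signed;

/// A zone whose signing state is tracked in the type parameter.
pub struct Zone<State> {
    apex: Label,
    _state: PhantomData<State>,
}

impl Zone<Unsigned> {
    pub fn new(apex: Label) -> Self {
        Zone { apex, _state: PhantomData }
    }

    /// Stand-in for real signing: consumes the unsigned zone.
    pub fn sign(self) -> Zone<Signed> {
        Zone { apex: self.apex, _state: PhantomData }
    }
}

impl Zone<Signed> {
    /// Only callable on a signed zone; an unsigned one fails to compile.
    pub fn publish(&self) -> String {
        format!("publishing signed zone {}", self.apex.as_str())
    }
}
```

With this shape, calling `publish` on a `Zone<Unsigned>` is a compile error rather than a runtime check, and the marker types add nothing to the size of a `Zone`.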
But what if there was someone on the team who might be a Rust skeptic,
maybe someone who says, it's a skill issue if you can't write production-ready Rust code?
What would you tell them? How would you convince people to give it a try?
It's hard to. I think one
of the best examples, one of
the points that I've seen recently that really reminded
me of how good Rust is to us, was the
experience of the Asahi Linux people. If you don't know, Asahi Linux is a project
to port Linux to the newer MacBooks, and one of the tasks they undertook
was writing a GPU driver, which they did in Rust. And now to think that somebody
is trying to implement a GPU driver for an undocumented architecture,
which you would normally have to do in C at the kernel level,
the prospect is horrifying.
It sounds like a nightmare to deal with. And their experience writing this in
Rust was, I believe they had two or three bugs, specifically two or three.
And I find that just incredible.
To be in a place where you don't have to worry about memory safety issues,
not in the sense that you don't have to worry about them.
They are so much, they're so contained that you know specifically where you
should think about this and 99% of the time you're not thinking about it.
To see, yeah, so features like that and to just have that full experience just
reminded me that, yeah, Rust is being really good to us right now.
We can now write software much more confidently.
Often just, if it compiles, if Clippy's happy, then this code just works.
And that's a sense of confidence you don't get with other languages, especially with C.
I agree with you, but at the same time, it's a bit like explaining to someone
how to ride a bicycle who hasn't seen a bicycle yet.
That's fair.
That's the tricky bit. This is an experience you have to make yourself.
You write code the entire week, you compile it, but you never run it.
And then on Friday, you stick it all together and it actually works. That's such
a profound experience, especially indeed
if you come from C, where you then spend two weeks chasing segfaults.
But yeah, you can tell people that that's true, but they will
never believe you, because why would they? And it's super hard to
convince someone, especially if they're not willing, like if they think
this is just a fad and it will go away and we're all going to go back to C
in five years again. I think that we're past that phase now.
I think just because time has passed and Rust is still here and more popular than ever.
So I think we're past this particular phase that, yeah, it's just another fad
and next week we have another language that we're all going to chase after.
I think Rust has proven to be here to stay.
Maybe that's an argument. I don't know. But if someone who really doesn't want
to try it, then I don't think I would even try to convince them.
There was news about a segfault being introduced into the Linux kernel in one
particular gnarly area of the binder driver,
and people made a headline out of it.
And on the same day, there were, I think, hundreds of other CVEs that were disclosed
in the C part of the Linux kernel.
And I always find it fascinating that you can kind of disregard all of that
evidence and maybe point out flaws that, yeah, obviously still exist,
but at a much lower level in Rust than in other languages.
And people say, why would you rewrite a perfectly working system in Rust if
the C library was maintained for decades, right?
In your case, for example, Unbound is a case where maybe you decided,
no, we don't want to rewrite it in Rust, but there might also still be use cases
for Rust at this infrastructure level in distributions in Ubuntu,
in the Linux kernel, and so on.
Yeah, so we're also maintaining a library called LDNS, which is a C library
to do DNS things, and for that one we decided it has reached the end of its life,
and we are working on a replacement for that in Rust.
So there's two parts to this. There's the actual
C library, which we will maintain: we will fix
bugs but we will not add new features. But also there
was a bunch of binaries coming out of this, which always
were intended as examples, but then people took them, and you will find them
in all the Linux distributions, and people actually use them for production stuff,
which was never intended. And those we are now slowly replacing
with a tool set built on top of the domain library in Rust.
So is Rust a default for new projects now?
Yes, it certainly is. Yeah, that's a decision we took in, I don't want to say like
two years ago, that basically we were betting the ship on Rust.
So about two years ago, I think around the end of 2019, start of 2020. No, wait, no.
That's not two years ago.
Sorry.
It feels like two years ago.
End of 2023, start of 2024.
Yeah.
We worked with the Sovereign Tech Agency with their fund, and they funded a
large part of our work on domain.
And that was, so we sort of spent all of 2024 working on taking domain from
something that wasn't really developed much past that hobby project phase into something that is much,
much more production ready, something that you can actually use.
And I think about that time we had really settled into: Rust is here and this
is how we're going to be moving forward with things.
Well, that's fascinating, because domain was
what Martin's first Rust side project turned into. Were there any design decisions
that in hindsight you would have changed, or let's say anything that you regret
in the initial design, or did it pretty much evolve very naturally?
I think, well, there's always things you would have done differently, but also
to a large degree because this is so old. This is like 2015 Rust, and I think
there's a lot of stuff in there that we would do differently now just because
we can do it differently. For example,
the whole byte slice handling: I think the borrow checker has
become a lot smarter, so you can do more things that would have been tricky back then.
Obviously, I think this even predates async and tokio and these things.
So that was all fun. I remember working with my own state machines.
Even in the very initial future implementations, where you basically had to
implement your own futures, that stuff was also all still, or is still in there,
partially. I think we removed most of it.
So I think all the networking bit we rewrote as part of the project that Arya
mentioned. And what is currently happening is we actually are rewriting lots
of parts of it. So Arya, she did a lot of that.
Just because I think, not necessarily, I hope, because the code is horrible,
but it's just because it's dated, because Rust has moved on,
and you can do things in a better way now.
And I completely agree that that's true.
And yes, it has been my first Rust project, and you'll see that in points and places.
When I joined NLnet Labs, which was end of 2024, I joined just as this work
on domain was coming to a close.
I was looking through the API
and I realized that knowing that the project had been around for so long,
I realized that there were actually a bunch of interesting language features
that had evolved since then, which made it a lot, which allowed us to simplify the API a lot.
So at the moment, the current API, which is the same as it's always been,
is heavily generic because handling a lot of these, a lot of the data that you
need in DNS involves byte slices in some way or the other.
And we would always make everything generic so you can put a Vec in there or
put a reference to data that's allocated elsewhere.
But those generics sort of complicate the API a lot and make it really hard
to see how to actually use these things sometimes.
And so I really enjoy dynamically sized types.
There's a bunch of ways in which we can trim down on our API surface,
simplify some things, and simultaneously make them more efficient.
We have started doing things. We started parsing and serializing data in a more zero copy fashion.
So that's also been really helpful as a way to look for nice performance improvements
along the way. And that's what a major part of this, not strictly a rewrite
but sort of overhauling a lot of the APIs, is about.
It's funny that you mentioned that, because we recently had Jon from Helsing
on the show and he mentioned that you can totally take generics too far, and
sometimes that's a rite of passage that a lot of people have to go through when
they write more Rust or they write their first production-grade Rust,
because not only do we have to think about the library, but also how the library
gets used in the application code.
Can you maybe look at it from your lens? Is that true for the domain crate?
And can you also talk a little bit more about the dynamically sized types that
you mentioned? I'm interested in that part.
Okay. Okay.
One of the things that we found in Domain was that we had a lot of layers that
were trying to be generic over parameters like this, such as what byte slices you're using.
Are you going to use, for example, tokio has, under the tokio umbrella project,
the bytes crate, which provides this.
It's essentially an Arc of a slice of bytes. You can copy it around very efficiently.
It's heap allocated. You don't need to worry about having a lifetime.
And so for us, often the choice was, do you want to use a borrowed slice?
Do you want to use a Vec? Or do you want to use Bytes?
And because of this, we often had a bunch of generic parameters everywhere.
But when we would try to write code that was generic over that,
we would often end up finding it too complicated to work with the generic
parameters themselves.
So often we would just copy all of the data out of them into a new allocation
where we have a concrete type, like a Vec.
And we're like, okay, now this is a Vec, so our code can work with this.
We don't need to worry about what generic bounds are involved and how to process this thing.
We don't need to worry about, can this thing be mutated or whatever.
We just have a single bound of, it is a byte slice, I can access the bytes.
We would copy it out, work with it, and then convert it back into that form.
And that became a really common pattern in our code. And that's
exactly what we were trying to avoid by providing all
of these generic parameters.
We wanted to be able to use those parameters, but just expressing all those
bounds became really hard.
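The copy-out pattern described here can be sketched roughly as follows. This is an illustration only, not code from the domain crate: the function name `to_lowercase` and the choice of bounds are assumptions made for the example.

```rust
/// Hypothetical helper: lowercases the ASCII letters of a name that is
/// stored in any byte container.
fn to_lowercase<Octs>(name: &Octs) -> Octs
where
    Octs: AsRef<[u8]> + From<Vec<u8>>,
{
    // Copy everything into a concrete Vec so the body can ignore the
    // container type entirely...
    let mut copy: Vec<u8> = name.as_ref().to_vec();
    copy.make_ascii_lowercase();
    // ...and convert back at the end, which may cost a second conversion.
    Octs::from(copy)
}
```

The function is generic on paper, but every call allocates, which is the cost the generic parameters were meant to avoid.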
And so, yeah. So it went into higher-kinded lifetimes and that sort of thing.
Yeah. Yeah. It became really crazy. The reason is that what you basically want
to do is take out parts of a byte slice and return that.
And if you have a slice, then that's basically just a slice,
so it has the same lifetime as the slice. If you have a Vec, you need a slice,
but then you need to fabricate the lifetime for that slice somehow,
which is the lifetime of the reference to the
Vec that you're holding. And if you have a Bytes,
because it's an Arc, you can take out
a bit which is then still owned. You actually don't have a
lifetime at all because you have an owned type. And expressing this
generically, especially before GATs, what is it, generic associated types, turned
into a very interesting exercise in higher-kinded lifetimes, which was really
quite great. So you had lots of for, for...
We were basically just stretching the language beyond what it was capable of.
So you often had like three lines of where clauses for a function.
And then all of that transitively gets further and further.
So if you have a function that uses a function, that also has those trait
bounds, and that indeed just became a little more crazy than it needed to be.
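To make the three cases concrete, here is a minimal sketch of how a "give me a sub-range" operation might be written with generic associated types. The trait `Octets`, its `Range` type and the `SharedRange` struct are hypothetical names for this example and are not the domain crate's API. `Arc<[u8]>` stands in for the bytes crate's type.

```rust
use std::sync::Arc;

/// Hypothetical trait: "give me a sub-range of your bytes".
trait Octets {
    /// The type of a sub-range; it may or may not borrow from `self`.
    type Range<'a>
    where
        Self: 'a;

    fn range(&self, start: usize, end: usize) -> Option<Self::Range<'_>>;
}

/// A borrowed slice: the sub-range keeps the original lifetime `'s`.
impl<'s> Octets for &'s [u8] {
    type Range<'a> = &'s [u8] where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Option<Self::Range<'_>> {
        let slice: &'s [u8] = *self;
        slice.get(start..end)
    }
}

/// A Vec: the sub-range borrows from the reference to the Vec.
impl Octets for Vec<u8> {
    type Range<'a> = &'a [u8] where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Option<Self::Range<'_>> {
        self.get(start..end)
    }
}

/// An owned view into shared bytes: no lifetime at all.
struct SharedRange {
    data: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl SharedRange {
    fn as_slice(&self) -> &[u8] {
        // The bounds were validated when the range was created.
        &self.data[self.start..self.end]
    }
}

/// Shared bytes: the sub-range is itself an owned value.
impl Octets for Arc<[u8]> {
    type Range<'a> = SharedRange where Self: 'a;

    fn range(&self, start: usize, end: usize) -> Option<Self::Range<'_>> {
        // Validate once so `as_slice` cannot go out of bounds later.
        self.get(start..end)?;
        Some(SharedRange { data: Arc::clone(self), start, end })
    }
}
```

Before generic associated types were available, the `Range<'a>` family could not be written directly, which is where the extra `for` bounds and long where clauses came from.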
Yeah. We have, so for example,
In the domain library, we have a mechanism for essentially building a DNS server
where you can implement certain traits and then that will allow you to receive
DNS requests and then decide how to handle them,
including passing them down to later layers.
And so we use this to, for example, build the DNS servers that we're using in Cascade right now.
But these can often involve a lot of generic parameters, and especially because
then you'd end up with layers that are nested as generic parameters of other
layers, you can get some incredibly long and complicated types.
And so we've been looking for ways to avoid those cases and try to make it simpler.
Because not only does it cause frustrations for us trying to implement around
this code and try to remain generic across everything,
but it's also, I think, complicated for the users when they're looking in their
LSP or at a compiler error and they see a ginormous type, which they didn't
really expect to see there.
And it's just hard for everybody mentally.
Yeah, the error messages were crazy. Java-like.
Well, this is certainly relatable. I've been on both sides of that equation,
both as a library maintainer and as a user.
And the very reason for introducing those generics was to make the code maybe
efficient and maybe catered to the inner type that you want to have a generic over.
But at the same time, when you copy out the contents and convert it into a vector,
cloning that, like basically creating your allocation, you're kind of working against it a little bit.
And ergonomically, it also sounds not the best.
Yeah.
But how do dynamic size types come into the equation here then?
We actually went through a period where I began introducing some DSTs into the
system and then we realized that not everybody has really interacted with DSTs before.
And it requires a reasonable amount of sort of explanation.
So normally in Rust, if you have something like a u64,
if you have a local variable that's a u64, then that is just
a value that you can work with as is, right? You own that value.
And if you have something like a slice, right, a reference to a slice,
now that is a pointer to data that is located elsewhere.
And the important thing is for all of your local variables, they all have to
have a fixed size because the compiler needs to know how to move them around.
And it gets very complicated if it does not have a fixed size to work with.
So u64 is easy because it's always eight bytes, right?
But pretty much all of the data types we need to deal
with for DNS are variable-sized in some way, shape, or form.
So like a domain name, right, which is just the most important thing,
can be up to 255 bytes in size, or it could be four bytes, right?
And you have to decide, well, how am I going to store this? One option is that
you always take the biggest possible size and you say, I'm going to define a
255 byte buffer and that is my domain name.
And every time I want to think about a domain name, I'm going to use this 255 byte thing.
But that is a performance issue because the compiler is also going to have to
copy around 255 bytes wherever you're going. And that's no fun.
And so you don't want to work with that whole thing.
You want something that is variable-sized. In domain, what we have right now
is a domain name type, just named Name, and it is generic over how the bytes
underneath it are stored.
So it is just a transparent wrapper around whatever bytes you want.
If you have bytes that come from a fixed-size buffer, Name can deal with that.
If you have bytes that are stored and referenced from a slice,
then Name will just wrap those for you.
So you can define, for example, a 16-byte buffer, which is holding your domain
name, and now the Name type from domain can reference that, can hold that.
But that requires you to have generic parameters everywhere.
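As a rough illustration of this design, and of why the generic parameter spreads, consider the following sketch. It is not the real `Name` type from domain. The helper `same_length` is hypothetical, and the bytes `\x02nl\x00` are the four-byte wire form of the name "nl".

```rust
/// Sketch of a name type that is generic over its byte storage.
#[repr(transparent)]
struct Name<Octs>(Octs);

impl<Octs: AsRef<[u8]>> Name<Octs> {
    /// Length of the name in bytes, whatever the storage is.
    fn len(&self) -> usize {
        self.0.as_ref().len()
    }
}

/// Every function that touches a name has to carry the storage
/// parameters along, here one per argument.
fn same_length<A, B>(a: &Name<A>, b: &Name<B>) -> bool
where
    A: AsRef<[u8]>,
    B: AsRef<[u8]>,
{
    a.len() == b.len()
}
```

With a fixed 255-byte buffer as storage, every move of the value copies 255 bytes. With a borrowed slice, only a pointer and a length move around.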
The alternative approach is to really think about how that byte slice works, right?
So as I said, the variable-sized option is to have a reference to a slice of bytes.
But there were two different components there. There's the reference,
and then there's the slice.
So Rust actually has this first-class concept of a type that does not have a
fixed size. So if you just write slice of T, right, square brackets of T,
now that is a variable size type.
You cannot hold that as a local variable on its own, but you can still interact
with it indirectly, right?
You can actually have a box of slice of T. You can have a reference to a slice of T.
And so Rust actually allows you to define your own types with that property.
So in the new API that we're building, our name type does not have a generic parameter.
Instead, it is a byte slice, essentially.
So you would hold it by reference, or you can put it in a box the way that you can a byte slice.
And that sort of takes the generic parameter away, essentially.
So in the old API, you might have had a Name of a box of a slice of bytes.
In the new API, you just have a box of a Name, right? That box of slice of bytes
just became a box of Name.
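A dynamically sized type of this kind can be sketched as below. This is a minimal illustration under assumptions, not the new domain API: the constructors skip all validation of the DNS wire format, and the method names are made up for the example.

```rust
use std::sync::Arc;

/// Sketch of a dynamically sized name: a byte slice under a new type.
#[repr(transparent)]
struct Name([u8]);

impl Name {
    /// Reinterprets borrowed bytes as a borrowed name (no validation here).
    fn from_bytes(bytes: &[u8]) -> &Name {
        // SAFETY: `Name` is `repr(transparent)` over `[u8]`, so both
        // pointers have the same layout and the same length metadata.
        unsafe { &*(bytes as *const [u8] as *const Name) }
    }

    /// Turns an owned byte slice into an owned name without copying.
    fn from_boxed(bytes: Box<[u8]>) -> Box<Name> {
        // SAFETY: as above; ownership of the allocation moves to the new box.
        unsafe { Box::from_raw(Box::into_raw(bytes) as *mut Name) }
    }

    /// Moves a boxed name into an `Arc` so it can be shared cheaply.
    fn into_shared(self: Box<Self>) -> Arc<Name> {
        Arc::from(self)
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// No generic parameter: any pointer to a `Name` derefs to `&Name`.
fn wire_len(name: &Name) -> usize {
    name.as_bytes().len()
}
```

The same `wire_len` accepts `&Name`, `&Box<Name>` and `&Arc<Name>`, so the choice of pointer is made by the caller and no longer shows up in the type.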
So in a sense, rather than making the type generic, you separate that and move
the generic out of the type.
Yes, exactly.
And then you have a generic pointer to a type that is somewhere and you need
to deal with moving that around, the ownership and so on. And it depends on
what you do with it. It's not imposed on the type itself.
Yeah. And now this has some limitations.
For example, we can't use the bytes crate with this system anymore.
Because the bytes crate always holds normal byte slices.
And now these are technically not normal byte slices.
So you would have to come up with some custom specialization of the bytes type
that works for this, for example.
And thus far, our choice has just been to not add that sort of support.
We mostly just work with references and with box allocations.
But I'm assuming that you have conversions in place for converting that into
a thing that bytes understands.
Yeah, yeah, yeah. You can convert back and forth.
And you can get quite far if you just use an arc, like stick it into an arc
instead of a box, and then you're very close.
Yeah, exactly. Arc of Name works exactly the same and it has a bunch of the properties
that you would normally need.
Would you say that Arc is an underused type in Rust?
That's...
Not by me.
We have a lot of Arc-based code here.
So I'm trying to establish a term for fixing your lifetime problems by sticking
everything in an Arc. I want to call it arc welding.
Oh wow, that's a nice one.
Yes, I'm trying to make that a thing. But yes,
I think normally you would just do boxes, or strings if you work with strings,
but if you have concurrent code, then Arcs are extremely handy, especially
if you then also use things like arc-swap, where you have the ability
to just take a copy of an Arc and place an Arc in a shared container.
You can atomically swap, yeah, the Arc data.
In many cases, for things like, I don't know,
metrics or stuff, it's very handy, it's very simple to use,
and you don't have to worry too much about concurrency. Yeah. It's super handy.
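The idea of a shared container holding an Arc that can be replaced can be approximated with the standard library alone. The sketch below uses `RwLock<Arc<T>>` as a stand-in and is not the arc-swap crate, which avoids the lock. The type name `Shared` and its methods are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared slot holding the current snapshot of some data.
struct Shared<T>(RwLock<Arc<T>>);

impl<T> Shared<T> {
    fn new(value: T) -> Self {
        Shared(RwLock::new(Arc::new(value)))
    }

    /// Takes a cheap copy of the current `Arc`; the lock is held only
    /// for the duration of the clone.
    fn load(&self) -> Arc<T> {
        let guard = self.0.read().unwrap_or_else(|e| e.into_inner());
        Arc::clone(&*guard)
    }

    /// Replaces the stored `Arc`; readers holding the old one keep it alive.
    fn store(&self, value: T) {
        let mut guard = self.0.write().unwrap_or_else(|e| e.into_inner());
        *guard = Arc::new(value);
    }
}
```

A reader that called `load` before a `store` keeps seeing its own snapshot, which is what makes this convenient for things like metrics or configuration.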
I actually think Arc is not going to be an underused type because,
as an example, everybody's using Tokio, right?
I think Alejandra González, who's one of the Clippy developers,
was running some tests recently and discovered that Tokio is by far the biggest
shared dependency across the Rust ecosystem.
If you're spawning Tokio tasks, right, those tasks cannot reference data
from their caller. They all have the 'static bound.
And so I think for any application that is seriously working
with Tokio and is sending data across or sharing data between different Tokio tasks, Arc is sort of
the primary solution that people will reach for. So yeah, I definitely don't think
it'd be an underused type within these contexts.
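The effect of a 'static bound on spawned work can be shown without Tokio, because `std::thread::spawn` has the same kind of requirement. The following sketch uses threads as a stand-in for Tokio tasks, and the function `parallel_sum` is a hypothetical example.

```rust
use std::sync::Arc;
use std::thread;

/// Sums shared data on several threads. Each thread gets its own `Arc`
/// clone, because a spawned closure must be `'static` and therefore
/// cannot borrow `data` from this function.
fn parallel_sum(data: Arc<Vec<u64>>, workers: usize) -> u64 {
    let workers = workers.max(1);
    let chunk = data.len().div_ceil(workers).max(1);
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                // Each worker sums its own chunk of the shared vector.
                data.iter().skip(i * chunk).take(chunk).sum::<u64>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .sum()
}
```

Cloning the Arc only bumps a reference count, so the data itself is never copied.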
And if you allow me the pun, this is also our
arc back to NLnet Labs, because when Martin introduced the domain crate, or when
he worked on it, async Rust didn't exist. So the domain crate must have been sync.
Did that change? Was that a thing that you later on
changed to being async, or did you leave the core sync for good reason?
A lot of the networking code actually was only
written in that STA project, so in
2024. I think what we had was a very simple stub resolver. A stub
resolver is the thing that typically sits in your C library: you
just give it a name and it gives you an address back, and it does all of the
DNS shenanigans for you.
That's pretty much the default. It's the thing that secretly does DNS
inside almost every application and people don't think about it. Yeah, and
domain initially just did that.
Yeah, and that was initially written in, I think,
I'm not even sure if that part predates futures or not. I'd have to look it up,
I don't know. But it was initially definitely implemented as handwritten futures, which was quite fun.
If anyone hasn't experienced the joy of handwriting futures, I highly recommend it.
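For readers who have not written one, a handwritten future looks roughly like this. It is a toy sketch using today's `std::future::Future` trait, not the pre-async futures API the early domain code would have used. `Countdown`, `NoopWake` and `poll_to_completion` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A hand-written future that is pending for `remaining` polls.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        // Ask to be polled again; a real future would store the waker
        // and wake it once the underlying I/O is ready.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// A waker that does nothing, just enough to drive the demo by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future in a busy loop until it completes, counting the polls.
fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut fut).poll(&mut cx) {
            return (output, polls);
        }
    }
}
```

All the state that an async function would keep in local variables has to live in the struct, which is what makes larger handwritten futures laborious.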
Did you manage to clean that up later on and completely get rid of that and
literally lean into the futures ecosystem that came later?
I think a lot of the new code is indeed just async functions.
The only problem is that the code does not use async functions in traits.
That is part of the new API work, which I think the networking code is largely incomplete right now.
Yeah, but it also predates async traits. Yeah.
And it would also be a breaking change now, would it?
That's fine because we declared it experimental.
Yeah, the entire feature is wrapped under...
So we did a little trick where we said, we define experimental features.
They have feature flags, unstable-something, and changing things breakingly
in there is not a breaking change for the library.
To take a quick step back, we're still somewhat around the 2017-18 timeline.
What other things did you develop in Rust during that time?
So in, I think, sort of mid-2017, we started looking at routing security,
RPKI, which was not
really new, I think that started in 2012 or so, but that was the point
where it started to seriously take off. And for RPKI
you need something that's called a validator, which is basically a thing that
collects all of the RPKI information that is out there from various distributed repositories.
What does RPKI stand for?
It stands for Resource Public Key Infrastructure, and resource here means
Internet resources, so IP address prefixes or AS numbers. And the idea is basically
to make signed, verifiable statements for these resources.
That's also why you need a validator, because someone has to collect all of
this information and validate it and produce validated data sets out of it.
And at that point, there was an initial prototype implementation in,
I want to say, Python that was done during the standardization of the protocols.
And there was a Java project that was built by the RIPE NCC,
which is one of the people who issue these resources.
And they just made an implementation to sort of further the deployment of this technology.
And because this was a Java thing, the memory consumption of it was pretty terrible.
So it would regularly fall over running out of four gigabytes of memory.
So there was a need for an implementation that is more minimal and can be run on smaller hardware.
And that's what became Routinator, which we started in 2017, indeed.
And because of the lack of alternatives when we released it in 2018, it went
into production and was widely used in production fairly quickly.
That means you must have been one of the first Rust-in-production users for real.
I would assume so, yes, because now we have a market share of
about 70 percent or something like that, and I think that went fairly quickly. There was,
not too much later, also an implementation in Go, which has since been abandoned,
and there is now also a C implementation by the OpenBSD people,
but Routinator is still the most popular RPKI validator.
So the experience of indeed having this very early Rust project in production
was quite interesting, because I expected a lot of pushback from people,
from saying, you just wrote this in this fancy new language, and that can't be good.
But there was none.
Some people were a bit skeptical or had problems with packaging it.
So there was an interest in packaging it for Debian and these sort of things,
which at that point was still very tricky.
So there were a few complaints from that side, but interestingly
nobody was really bothered by having to install rustup and Rust and then
just compile it from scratch,
which was required at that point. We put a
lot of effort into the build system, into not
putting anything in there that would complicate building. So we
wouldn't rely on OpenSSL and these things, because that was and
is a pain. It literally built with cargo install, which I think helped.
But yeah, I was really surprised, and that is also part
of why we then later found it easy to say, yeah, let's just do everything in Rust,
because there was basically no pushback.
Unfortunately we still support OpenSSL in other Rust projects.
And yeah, let's not talk about the Rust crypto story.
There's plenty to say, yeah.
Okay, let's not go there. But was that already written in async Rust or was it
still sync Rust in 2018?
That's sort of when the entire Tokio ecosystem evolved. That's why I'm asking.
That is still at its heart sync.
So the initial version also didn't really need async because the way that you
collected the data back then was by way of rsync,
which means that what we basically did and still do is we just spawn your system's
rsync process to collect the data for that.
Then there was also an HTTP-based protocol to do this, which we added later.
That one is basically base64-encoded
data in an XML wrapper, which is really interesting. And
at that time there was no XML parser that you could use
with async. I think quick-xml can
now do it. Basically what I was then using was
reqwest in blocking mode, because I didn't want to read the entire file into
memory and then parse it. So I basically just built it that way, and that
sort of led to Routinator using a thread pool with blocking HTTP.
And it also blocks until the rsync process, if you're using that, comes back.
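A thread pool that runs blocking work can be sketched with the standard library as follows. This is an assumption-laden illustration and not Routinator's implementation: the jobs here are plain closures, where a real job would block on an HTTP download or on a child rsync process. `Job` and `run_blocking` are hypothetical names.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// A unit of blocking work that reports how many items it processed.
type Job = Box<dyn FnOnce() -> usize + Send>;

/// Runs all jobs on `threads` worker threads and sums their results.
fn run_blocking(jobs: Vec<Job>, threads: usize) -> usize {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (result_tx, result_rx) = mpsc::channel::<usize>();

    let workers: Vec<_> = (0..threads.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only while taking the next job.
                let next = match job_rx.lock() {
                    Ok(queue) => queue.recv(),
                    Err(_) => break,
                };
                match next {
                    // The job itself may block for as long as it needs.
                    Ok(job) => {
                        let _ = result_tx.send(job());
                    }
                    // The queue is closed: no more work.
                    Err(_) => break,
                }
            })
        })
        .collect();
    // Drop the original sender so the result channel closes once
    // every worker has finished.
    drop(result_tx);

    for job in jobs {
        if job_tx.send(job).is_err() {
            break;
        }
    }
    // Closing the job queue lets idle workers exit their loop.
    drop(job_tx);

    let total: usize = result_rx.iter().sum();
    for worker in workers {
        let _ = worker.join();
    }
    total
}
```

The number of threads puts a hard upper bound on how much blocking work runs at once, which is part of what makes the runtime profile predictable.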
Yeah, in my mind, I don't really hate that idea.
It's certainly predictable, or you have a very predictable profile of the application at runtime.
That's a thing that, for example, in async Rust is a bit tricky to do.
Sometimes it's not very easy to say how async Rust code will behave in production.
Yeah, it's kind of interesting. So obviously it has an HTTP server now,
and that code, all of that is in async.
So that is just, like, Tokio with Hyper.
But I do think there are plenty of good use cases where you don't strictly need
an entire async runtime.
You can get away with having a few threads, and often that will,
depending on your application, sometimes that can be the simpler and easier to manage model.
But we did, I guess, start using async more and more, I think especially
with the domain stuff, where networking was a big concern. And because it's a
library, you don't want to enforce our restrictions on the users, and so we did
have a lot of support in there for async things.
Do you think people reach for async Rust too early, both in regard to their Rust learning
experience and in regard to using it in production? You alluded to cases where
a thread pool is good enough.
I think it's really hard to say. Async Rust gives you the flexibility of deciding
what your threading model is going to be later.
So you can just write code that does not assume how many threads you will have
or what your setup is going to be.
And you can come to a more concrete decision later.
And if you're going to rely on entirely synchronous code, then you're baking
in an assumption quite early.
So if you're working in sort of a problem
space where you already know what you have to deal with,
that's somewhere where a completely sync system can be less of a gamble.
For example, one of the things
we had to build while developing Cascade was a signing, well,
a compatibility bridge to help support hardware security modules.
Because for DNSSEC signers, a large number of operators will often use hardware
security modules, which are essentially just bigger versions
of things like YubiKeys, to do signing for them.
And we need to communicate with these, but there's older protocols that we wanted to support.
And so we ended up building a compatibility bridge for this.
And that's one place where the application is simple enough and we have a good
enough understanding of what our needs are that it does not need to be async.
And a large part of it does just use a manual thread pool.
One thing I'd like to quickly point out is that you both learned Rust
at a different point in time:
Martin learned it around 2015, when async Rust wasn't a thing. So did I. And Arya,
you learned Rust when async was already very well established.
And I wonder if that shapes the way you think in Rust.
To me, it always still feels like an addition, or tacked onto the language.
It would be interesting to hear you both think about that for a bit.
What are your thoughts on this? Is that true for you too? How do you think
about async Rust: is it an extension of Rust or is it the same language, really?
I think now that we
have async functions, it feels more like
it's really part of the language. It felt
more, well, library-ish with
futures, where basically you had these couple of traits and
then runtimes that were implementing stuff.
I think now with async functions it's more integrated. There's still
a divide that is definitely there, and I think that probably
always will be there because these things are just different, but I think
it feels more integral to the language now.
Arya, do you also feel that divide?
So I really enjoy programming language design. So I think sometimes I see,
some features from a lower level perspective than others. I'm also especially
concerned about like the implementation details for some of these.
I've spent a lot of time looking at the trait interface for Future,
which has interesting restrictions.
And I think I do view it as a, it's a very specific mode that I will work in
when I'm choosing to write async code.
But I personally, I
think I focus a lot on writing libraries. It's where
I feel more comfortable. And for libraries you need
to have a better understanding of what you're
trying to do, because at the end of the day, if you're going to serve up an async
function in a public API, it's the user's responsibility to run your
code, and there are additional concerns, like cancel safety and understanding
what your threading model is,
which async code has to be more aware of.
And so it does feel like writing async code is a distinct mode than writing synchronous code,
because then you have these additional concerns to think about with how is this
code going to be used, and that has different implications for async code.
Interesting, because it requires a lot of empathy for the user to make the right
decision, and you do make a lot of decisions for the user, if you think about
it, when you write a library.
Yeah, totally. I've also recently been thinking about similar issues with, for example, logging, where
the way that a library logs data is usually not considered part of its API,
but it is something that is important to its end users.
And that's another case where I think we think about this.
One issue that we actually have that I think is quite interesting over here is panicking.
And Martin, I'm sure you'll have a lot to say about this. But because we're
writing code that is in sort of a mission-critical area,
we really try to avoid having spurious panics because you don't want somebody's
name server to crash, for example.
But it's also been a battle because you can have invariants that are guaranteed
within your program, even within your library.
For example, you might have some type which you've defined to have certain invariants.
You might say, I have a string that is not empty,
right? I can always access the first character.
And now implementing methods on that, for example, trying to access the first
character would then usually return to you an option or panic in some way, right?
And then our dilemma is, do we use .unwrap there or .expect?
Because it's an invariant that our program is guaranteeing to itself,
but Rust does not, the language itself does not know that, and so we still end
up with our code that is technically capable of panicking.
If there's an implementation bug on our end, that's capable of causing panic.
And that's been a really interesting journey for us because I don't think we've
fully settled on an ideal spot yet.
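The non-empty string example can be written down as a short sketch. The type `NonEmpty` and its methods are hypothetical and only serve to show where the dilemma appears.

```rust
/// A string that is guaranteed to contain at least one character.
struct NonEmpty(String);

impl NonEmpty {
    /// The only constructor, so the invariant is checked exactly once.
    fn new(s: String) -> Option<NonEmpty> {
        if s.is_empty() {
            None
        } else {
            Some(NonEmpty(s))
        }
    }

    /// Cannot fail as long as the invariant holds, but the compiler does
    /// not know that, so the `Option` still has to be dealt with somehow.
    fn first(&self) -> char {
        self.0
            .chars()
            .next()
            .expect("NonEmpty invariant violated: string is empty")
    }
}
```

The `expect` call is unreachable unless the library itself has a bug, yet it still counts as code that is technically capable of panicking.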
I also don't think there's a good answer there. It's just trade-offs. Yeah.
And the trade-offs would be differentiating between internal and external errors,
so things that are implementation bugs and things that are user-facing issues.
Well, you were talking about this recently with bcder, Martin,
where you were trying to decide, for example, for indexing byte
slices, whether to use unwraps or...
So this comes indeed from
the code panicking in certain places, either
because you're unwrapping or because you're using slicing or indexing. And we
have had CVEs because of that, because this gets fed data from
the network, so untrusted data, and of course panicking from
that untrusted data is kind of bad.
And specifically in the case of Routinator, which collects a known set of data from the internet,
So if you restart it, it will collect the same set of data again,
and therefore it will crash again.
So it's like feeding it data that makes it crash.
Just publishing data that makes it crash is a very efficient way of breaking
routing security or this particular kind of routing security for a lot of people,
because Routinator also has a very high market share.
So if you can trigger this, it will crash in a lot of places.
So obviously, we kind of want to avoid that.
So we want to avoid panics in that code, which then
means you want to use Clippy for this. You want to ban indexing,
you want to ban unwrapping. But then, obviously, because you are dealing
with byte slices, you will have to do some sort of sub-slicing and taking
characters or bytes out of it at certain positions. That just has to happen.
And it's about finding a way that is expressive,
where you mark this as: I looked at this, this is fine.
I think we're now going to do that by basically allowing
that particular Clippy lint at a particular position and sticking
a safety comment on it. I think this is what I've settled on
for now, but that makes the code more cluttered. It would be nicer
to have a more compact way to express this, this "I've checked for this". Ideally
that would be something that the compiler can also help with, but obviously that
is super hard because this is a runtime thing.
True.
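The approach of banning the lints and allowing them at one reviewed position might look like the sketch below. The lint names `clippy::indexing_slicing` and `clippy::unwrap_used` are real Clippy lints, while the function `split_label` and the exact comment style are assumptions, not code from Routinator or bcder.

```rust
/// Reads one length-prefixed label from untrusted input.
/// Returns the label and the remaining bytes, or `None` if truncated.
#[deny(clippy::indexing_slicing, clippy::unwrap_used)]
fn split_label(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = input.split_first()?;
    let len = usize::from(len);
    if rest.len() < len {
        return None;
    }
    // Checked just above: `rest` holds at least `len` bytes, so neither
    // slice below can panic.
    #[allow(clippy::indexing_slicing)]
    let parts = (&rest[..len], &rest[len..]);
    Some(parts)
}
```

An alternative that needs no allow is `rest.get(..len)`, at the price of handling an `Option` that the preceding check has already ruled out.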
And at the same time, for the people who might say,
oh yeah, Rust also has panics, it's no better than C or whatever we had
before: just keep in mind that this is not a segmentation fault. It is not
leaking information. It might be a fault, yes, it might panic, true, but
it's not really exposing anything. Exactly.
Rust guarantees that your data is still safe and that memory safety is still upheld even if you panic.
This is also an interesting point: memory safety isn't everything.
Yes, you can make sure that
buffer overflows and that sort of stuff don't happen, but a denial of
service problem might still be there. That's something that people quite
often overlook when they say, yeah, but it's fine because it's memory safe.
That's not all the issues that exist, yeah.
Panic safety is, I guess, the umbrella term for these issues.
Which I guess comes back to what you said earlier, where people were sort of looking
at this one segfault somewhere and then pointing at that. That's sort
of the consequence of putting memory safety and these sorts of safety aspects
at the center of your language: then of course people will be more critical
when that actually doesn't seem to hold.
It was even sort of a good thing that the segfault happened in that case, because
it was in an unsafe section that dealt with, I think, a data structure like a
linked list, and that would have caused a memory safety issue if it didn't panic at this point. Yeah.
It's always linked lists.
I don't remember what actually happened in that specific case, whether it
was a segfault. But this is certainly an experience that we had that we didn't
think of when we started with this project: that panics are or can be a problem.
And panic safety is a thing that you need to think of when you're designing
things, when you're writing code.
Throughout the conversation, we talked about async rust a lot.
And we looked at it from various perspectives. What are the common traits that
have evolved around asynchronous usage in your projects?
And I assume you mean trait, not in the Rust sense.
Precisely, yeah.
So one thing that we've really noticed that's been interesting to explore is
how we use async for daemon processes.
Because a lot of our work is in these long-running daemons, where you sort of
inevitably end up with high-level state machines, right? Right.
Thinking about, for example, right now we've been developing Cascade and Cascade
has to worry about, well, it controls multiple DNS zones and then each zone
can be moving through multiple states.
Right. It could be loading data. It could be signing data. It could be then serving that data.
And we often run into the case of, oh, are we just building these state machines?
Isn't that what async is supposed to help us avoid? And so we inevitably ask
ourselves, can these high-level perspectives of what our program is doing be
expressed as async functions?
Can we just write our entire application as one big async function?
And across four or five projects, we've always eventually drifted to the answer being no.
That it's always better to describe our code at the highest level in very explicit
ways, even if that sometimes involves writing out state machines yourself.
And, you know, writing state machines is not fun in any programming language.
And that's been interesting to, it's been interesting to notice when async feels
more appropriate and when async feels less appropriate.
The sort of rule of thumb that we've reached is that if,
you need to perform some action for a short period of time, right?
So it's not like a state that your program is in, but just an action is doing temporarily.
Then async is usually a very good fit. So if you're, for example,
fetching some data from the network, so you're going to perform some HTTP requests,
that might be a perfectly fine time to use async code.
But to describe what the state of your application is,
like the highest-level states that a user might care about, we try to avoid
using async in those places.
The main reason for this is that when you implement something as an async function,
it's an opaque type, right?
We know that there's a state machine underneath this, but you can't examine
any of the details about that state machine.
And examining details, having that degree of transparency, is really important
when you're trying to build reliable software.
So for Cascade, we try to offer a lot of ability to inspect what Cascade is
doing at any time to see what state these important zones are in.
And it's really hard to extract that information when you just have this opaque future object.
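An explicit top-level state machine, as opposed to one hidden inside an async function, can be sketched like this. The states follow the ones named earlier (loading, signing, serving), but the `Event` type, the transition rules and the `status` output are assumptions for the example and are not Cascade's actual design.

```rust
/// Hypothetical high-level states of a zone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneState {
    Loading,
    Signing,
    Serving,
}

/// Hypothetical events that move a zone between states.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    Loaded,
    Signed,
    ReloadRequested,
}

struct Zone {
    name: String,
    state: ZoneState,
}

impl Zone {
    fn new(name: &str) -> Zone {
        Zone { name: name.to_string(), state: ZoneState::Loading }
    }

    /// Applies an event. Returns false and leaves the state untouched
    /// if the event is not valid in the current state.
    fn handle(&mut self, event: Event) -> bool {
        let next = match (self.state, event) {
            (ZoneState::Loading, Event::Loaded) => ZoneState::Signing,
            (ZoneState::Signing, Event::Signed) => ZoneState::Serving,
            (_, Event::ReloadRequested) => ZoneState::Loading,
            _ => return false,
        };
        self.state = next;
        true
    }

    /// The state is plain data, so it can be reported at any time.
    fn status(&self) -> String {
        format!("{}: {:?}", self.name, self.state)
    }
}
```

Nothing comparable to `status` can be asked of the future returned by an async function, because its internal state is not accessible from outside.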
The downside of this approach, I mean, it helps us make things more explicit.
It helps us better understand what our state machines are. But the downside
of this approach is that there's a lot of tooling for async functions that we can't use.
For example, if we need to perform certain actions after a period of time, for example,
when we sign DNS data in Cascade, we need to re-sign that periodically because
those signatures expire, they have expiration dates embedded in them.
We can't use Tokio timers to express all of those expirations when we need to do more stuff.
Again, because we need a greater degree of transparency.
We want to be able to inspect what those timers were. And Tokio doesn't have
any way to inspect what timers are scheduled.
There's no sort of good way to even try to do that.
And also, because our high-level state machine is not an async function,
we don't have a single good place where we could use these timers.
We also need to, for example, persist those timers across restarts.
And there's just no way to do that with a Tokio timer.
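Timers that can be inspected and persisted are essentially deadlines kept as plain data. The sketch below is one possible shape under assumptions: `ResignSchedule`, its text format and the use of Unix timestamps are all hypothetical and do not describe Cascade's tooling.

```rust
use std::collections::BTreeMap;

/// Re-sign deadlines as plain data: Unix timestamps mapped to zone names.
#[derive(Default)]
struct ResignSchedule {
    deadlines: BTreeMap<u64, Vec<String>>,
}

impl ResignSchedule {
    fn schedule(&mut self, at: u64, zone: &str) {
        self.deadlines.entry(at).or_default().push(zone.to_string());
    }

    /// Removes and returns every zone whose deadline is at or before `now`.
    fn take_due(&mut self, now: u64) -> Vec<String> {
        // Everything with a key greater than `now` stays scheduled.
        let later = match now.checked_add(1) {
            Some(first_later) => self.deadlines.split_off(&first_later),
            None => BTreeMap::new(),
        };
        let due = std::mem::replace(&mut self.deadlines, later);
        due.into_values().flatten().collect()
    }

    /// One "timestamp zone" line per entry; this can be shown to an
    /// operator or written to disk before a restart.
    fn dump(&self) -> String {
        self.deadlines
            .iter()
            .flat_map(|(at, zones)| zones.iter().map(move |zone| format!("{at} {zone}\n")))
            .collect()
    }

    /// Rebuilds a schedule from `dump` output, skipping malformed lines.
    fn restore(text: &str) -> ResignSchedule {
        let mut schedule = ResignSchedule::default();
        for line in text.lines() {
            if let Some((at, zone)) = line.split_once(' ') {
                if let Ok(at) = at.parse() {
                    schedule.schedule(at, zone);
                }
            }
        }
        schedule
    }
}
```

A main loop would call `take_due` with the current time and sleep until the earliest remaining key, which is the part a runtime timer would otherwise hide.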
So we've slowly sort of started building up our own, like, not async tooling,
but sort of this explicit state machine-y tooling that helps us deal with these problems.
Often very strongly inspired by whatever you'll find in Tokio or other async utility libraries.
That's been a really interesting shift. We've also noticed that happening across
all of our projects, right?
I don't think any of our projects use, yeah, none of them use a top-level async
function to describe what they're doing.
They're all these long-running daemon processes. And so all of them end up with
some form of state machine at the very top.
That's been a really interesting experience. It's not something I've really
noticed anybody talking about, either.
I wonder how other daemons written in Rust are dealing with this sync-async divide.
Would you say you use structured concurrency a lot for the async features?
That's an interesting question.
I think we do use structured concurrency, but I don't think we use it significantly. Most of the time when we are actually
dealing with async functions where structured concurrency is applicable,
we're usually doing very few things because these async functions happen at
the very edges of our application and they don't tend to have that much to do.
But I think there would definitely be use cases where we'd see more use of it.
And it's helpful in situations where you have a bunch of IO operations and you
can sort of run all of them concurrently and then cross the boundary to, say,
another part of the application, which might or might not be async.
So this way you can have a clear separation between the different tasks that
you're handling within your application.
So the way that we've dealt with this a lot in Cascade, at least,
which is what I've been working on, is whenever we have some action that needs
to happen in the background, per se, we will spawn up a tokio task for it.
And that results in a tokio JoinHandle object, right? And that lets you rejoin this task whenever.
And we just save that within our state machines.
So that's a very explicit way for us to track that. Oh, yeah,
we have that background task going on.
There's only one such background task going on. And if we ever need to,
we can take that out, inspect it, or just wait on it.
But usually what we then see happen is at the end of that tokio task,
it would unlock that state machine itself and then go make those modifications.
So it will sort of clean up after itself.
And because of this model, there are relatively few times where we have multiple
operations going on at the same time. But Cascade's also not that IO heavy.
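The pattern just described (keep the handle of the one background job inside the state machine, and let the job lock the state and record its result when it finishes) might be sketched like this. To keep the example dependency-free it uses `std::thread` in place of tokio tasks, which is a substitution made for this sketch; `ZoneState`, `start_signing`, and `wait_for_signing` are hypothetical names, not Cascade's API.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical per-zone state machine (illustrative only).
#[derive(Default)]
struct ZoneState {
    signed_serial: Option<u32>,
    /// At most one background signing job, tracked explicitly.
    signing_job: Option<JoinHandle<()>>,
}

/// Start a background job unless one is already tracked.
/// Returns `false` if a job handle is still stored in the state.
fn start_signing(state: &Arc<Mutex<ZoneState>>, serial: u32) -> bool {
    let mut guard = state.lock().unwrap();
    if guard.signing_job.is_some() {
        return false;
    }
    let shared = Arc::clone(state);
    guard.signing_job = Some(thread::spawn(move || {
        // The expensive work would happen here, without holding the lock.
        // At the end the job locks the state machine and records its result.
        let mut guard = shared.lock().unwrap();
        guard.signed_serial = Some(serial);
    }));
    true
}

/// Take the handle out of the state machine and wait for the job.
fn wait_for_signing(state: &Arc<Mutex<ZoneState>>) {
    // The lock is released at the end of this statement, before joining,
    // so the job can still acquire it to write its result.
    let handle = state.lock().unwrap().signing_job.take();
    if let Some(handle) = handle {
        handle.join().expect("signing job panicked");
    }
}
```

Because the handle lives in the state, "is a job running, and which one?" is answered by looking at the state machine rather than at the runtime.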
Yeah, so we have a different project, Krill, which is the CA for this RPKI world.
Basically a CA implementation. And that one is more input-driven.
So it's more user gives command. Command leads to certain things happening.
And the user gives command thing is an HTTP server. So this comes in as like a REST API.
And there we indeed use... So it used to be more like it was async,
but a lot of the code was written as if it were sync.
So there are some slightly dubious things in there, like long-running CPU
operations that are actually within tokio tasks, which is tricky.
So I've just rewritten that, or am in the process of rewriting it, which was quite the adventure, doing
exactly what you said. So basically the HTTP server is async, and whenever it has processed
the input and has verified the input, like authentication and all of these things,
it will then queue a task for a sync core that runs on a thread pool,
which then just processes these things with however many threads you have.
And I think that is a really good model for this, because it gives you more
visibility into what is happening, rather than having lots of tasks where you don't
really know how many tasks you have, with all of these commands coming in from
all over the place. And you get back pressure
and all of these things kind of for free, which is nice.
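To make the back pressure point concrete, here is a minimal sketch of a front end handing work to a sync core through a bounded queue. It is not Krill's implementation: it uses a single core thread and `std::sync::mpsc::sync_channel` for brevity, and `CoreHandle`, `Submit`, and `start_core` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;

/// A unit of work for the sync core.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// What the front end learns when it tries to queue a job.
#[derive(Debug, PartialEq, Eq)]
enum Submit {
    Accepted,
    /// The queue is full: the caller can reject or retry later.
    Busy,
    /// The core thread has exited.
    CoreGone,
}

/// Handle kept by the front end (for example, an HTTP request handler).
struct CoreHandle {
    queue: SyncSender<Job>,
}

/// Start a sync core that runs jobs one at a time from a bounded queue.
fn start_core(queue_len: usize) -> CoreHandle {
    let (queue, jobs) = sync_channel::<Job>(queue_len);
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for job in jobs {
            job();
        }
    });
    CoreHandle { queue }
}

impl CoreHandle {
    /// Never blocks, so it is safe to call from an async handler.
    fn try_submit(&self, job: Job) -> Submit {
        match self.queue.try_send(job) {
            Ok(()) => Submit::Accepted,
            Err(TrySendError::Full(_)) => Submit::Busy,
            Err(TrySendError::Disconnected(_)) => Submit::CoreGone,
        }
    }
}
```

The bound on the queue is what provides the back pressure: when the core falls behind, `try_submit` reports `Busy` instead of letting pending work pile up without limit.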
Yeah, it's a thing that I also haven't heard many people talk about, which is using
channels to decouple parts of your application more
and really dealing with back pressure and introspection and so on. It feels like
when you run long-running processes, you need that level of introspection, you
need that level of control, and maybe just using tokio tracing is not
enough. You want to control it more
on a language level or on a runtime level, so to say.
What was interesting was that I had to implement my own thread pool and all
of this communication myself.
Because you have these queues and you have all sorts of synchronization features.
There isn't really good bridging between sync and async. You can't use Rayon
because that doesn't do any async things.
On the plus side, it turned out that doing this was surprisingly easy.
The tools, the components for this are there.
What was great fun, in inverted commas, was the shutdown code,
to cleanly shut down. You have to collect all of these.
You have to collect all the tasks. You have to collect all the threads.
You have to tell everyone that we're now shutting down.
Getting that right was quite interesting to do.
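One way to picture the shutdown problem is a small hand-rolled pool whose shutdown closes the queue, lets the workers drain what is left, and then joins every thread. This is a sketch under assumptions, not the code in Krill: `Pool` and its methods are hypothetical, and a real daemon would also have to stop its async tasks and listeners.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical minimal thread pool (illustrative only).
struct Pool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl Pool {
    fn new(threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while waiting for the next job.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        // All senders are gone and the queue is drained.
                        Err(_) => break,
                    }
                })
            })
            .collect();
        Pool { sender: Some(sender), workers }
    }

    /// Queue a job; returns `false` if the pool no longer accepts work.
    fn submit(&self, job: Job) -> bool {
        match &self.sender {
            Some(sender) => sender.send(job).is_ok(),
            None => false,
        }
    }

    /// Stop accepting work, let queued jobs finish, and join every worker.
    /// Returns the number of threads that were joined.
    fn shutdown(mut self) -> usize {
        // Dropping the only sender tells the workers that we are shutting down.
        drop(self.sender.take());
        let mut joined = 0;
        for worker in self.workers.drain(..) {
            worker.join().expect("worker panicked");
            joined += 1;
        }
        joined
    }
}
```

The ordering matters: the queue is closed first, so that workers exit on their own once it is empty, and only then are the threads collected.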
Is that open source somewhere?
That is definitely open source, yeah.
Okay, what's the project name? We can link to it in the show notes.
Yeah, it's Krill, as in... not the fish, not the fish, the sea creature.
I'll get yelled at now because the colleague who originally wrote this is a
biologist by trade, so he will yell at me if he hears this podcast.
And all that code is now... there's a new branch called full sync,
and that's currently in review because it's huge.
Yeah, you've been toiling away at that for a while.
Yeah, so I did a lot of experiments. One thing that this
has is a bit where it has to go out and do HTTP requests as a client.
So I originally didn't want this code to be purely sync, because then you're blocking
the thread until the HTTP request comes back,
and if that happens too much, then you're blocking all the threads, and that's kind of not nice.
But all of these requests are to trusted servers, so I think it's fine to do it this way.
But initially, I tried to build a thing where it could go sync,
async, sync, async, sort of, like
switch, jump between different modes and have tasks that sort of jump.
Because these then have to be Send and 'static. You need to clone a lot of stuff
because it needs to own all the things that it has.
Getting this jumping was extremely hard, and I basically never got it right.
So eventually I just decided, you know what, this is just not ever going to work.
It's going to be difficult to reason about and to maintain. So let's just do
the sync and maybe look into moving those tasks that do these HTTP requests.
Because it's not all of them, it's just like a few of them.
Maybe have a separate thread pool for them so that they're only blocking that
bit and you can have the regular thread pool for tasks that don't need
that. So you have to look into that as well.
But for now, it's basically just, we just basically shrug and say,
yeah, it's probably going to be fine.
Yeah, I'm not sure if we can dig too much into that.
But what I found was that if you have tasks that are too different in nature
inside of your application, and you don't clearly separate those,
you're sort of asking for trouble.
So you kind of start to work against the framework a little bit.
Yeah.
Because the tasks, they have different runtime requirements,
they have different memory usage, different performance profiles, and so on.
It's hard to manage interactions between them.
Yeah, also my experience has been that every time you say, yeah,
it's probably going to be fine, it definitely isn't going to be.
So I should probably look into this.
But yeah, I think overall async as a feature has been really nice for us.
It's been really helpful for getting a lot of this down. And as I said,
it lets you be really flexible about what your threading model is going to be.
And I think that's really cool.
Yeah, and writing the TCP or HTTP connection handler or request handler as an
async function, that's great. That's fantastic.
Yeah. It makes it so much easier, it makes it so much clearer,
much easier to understand what's going on, what you're doing,
and then basically do that and do a queue to bridge into like an event or like a sync core.
I think that's a great way of doing these things.
As we're closing out, the final question traditionally is always,
what's your message to the Rust community?
I think in general, I don't know, like, I think programming is fun.
If I had one thing to say, it'd be, make sure you have fun.
But for Rust, for the Rust users specifically, I think,
I don't know, Rust is such a good language and there's so much fun to be had
with, as I mentioned before, like, I think Rust's power comes from being able
to design really elegant and really intricate APIs.
That let you get precisely the point across that you want to.
And we've had a lot of fun doing this in the domain crate, for example.
We've also, one of our colleagues here, Terts, has been working on a scripting
language, which integrates really tightly with Rust.
And that's another place where we've had really interesting API discussions
about how to make this stuff work. It's called Roto.
And it's been fascinating to
try to see... you're trying to make this really delicate
sculpture, and figuring out where can I draw the lines here is something I think
Rust is really good at. So I think: make fun APIs, use the type system to your
advantage, have fun with it.
Keep doing what you do and stay where you are, because a lot of what makes Rust
great is the people who create Rust, who work on Rust. The attitude,
the welcomeness, the openness of the community,
I think, is a huge part of why Rust became a staple and wasn't just a fad.
And I think if we just keep doing that, then we will have fun. Yeah.
Well, I certainly had fun and I can get behind those statements.
Yeah.
And for me, all that is left to say is thank you, Arya and Martin,
for taking the time for the interview today.
Thank you very much. It was great.
Thank you.
Rust in Production is a podcast by Corrode. It is hosted by me,
Matthias Endler, and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.