Rust in Production

Matthias Endler

Apollo with Nicolas Moutschen

We discuss Rust adoption, Rover CLI, testing, resources for Rust developers, technical aspects, crate challenges, and the inclusive Rust community.

2024-01-11 60 min

Description & Show Notes

In this episode, Nicolas, a staff software engineer at Apollo GraphQL, discusses the company's use of GraphQL API technologies. Apollo GraphQL specializes in open-source libraries for both client and server-side applications, with a focus on integrating Rust into their main offerings: the Apollo router and GraphOS cloud. Nicolas explains how the Apollo router consolidates multiple microservices into a single API, efficiently routing requests to appropriate services.

He delves into GraphQL's role as an effective query language for APIs, highlighting its ability to provide a comprehensive description of API data and its compatibility with existing data systems. The shift from the JavaScript-based Apollo Gateway to the Rust-based Apollo Router is a key topic, with an emphasis on the performance and safety improvements Rust brings to the table.

The conversation covers the use of Rust for the router and GraphQL parser, alongside Kotlin for the management plane and GraphQL for the API. Challenges in stability and reliability are discussed, as well as Rust's advantages in safety and type system consistency. Nicolas shares insights on Async Rust, particularly its impact on productivity and application in CLI tools like Rover.

The episode also addresses learning Rust in stages, from basic language concepts to advanced internal mechanisms. It touches on functional patterns in Rust and strategies for effective dependency management. Closing the discussion, Nicolas highlights the inclusive and supportive nature of the Rust community.

GraphQL is at the core of companies like GitHub, trivago, and Facebook. In this episode, Nicolas, a staff software engineer at Apollo GraphQL, discusses the company's products and how they use Rust in the core of their GraphQL engine: the Apollo router.

About Apollo
Apollo is the industry-standard GraphQL implementation, providing the data graph layer that connects modern apps to the cloud. Apollo is the company behind the open-source GraphQL platform that helps developers build and ship apps faster with open source tools and a cloud service.

About Nicolas Moutschen
Nicolas Moutschen is a Staff Software Engineer at Apollo. He is a Rust enthusiast and has been using Rust for years at Apollo and at AWS where he worked on the serverless infrastructure. He writes about Rust on his blog n14n.dev (https://n14n.dev/).

Links
- Apollo Router - High-Performance Federation Runtime Announcement: https://www.apollographql.com/blog/apollo-router-our-new-high-performance-federation-runtime-is-now-available-in-open-preview
- Learn more about Apollo: https://www.apollographql.com/
- Apollo on Twitter: https://twitter.com/apollographql
- Nicolas Moutschen on Twitter: https://twitter.com/NMoutschen
- Nicolas Moutschen on LinkedIn: https://www.linkedin.com/in/nmoutschen/
- Nicolas Moutschen: https://n14n.dev/

Transcript

Matthias
00:00:19
Nicolas, can you say a few words about yourself and the company Apollo that you work for?
Nicolas
00:00:26
Yeah, so first of all, Matthias, thank you very much for having me today. So I am a staff software engineer at Apollo GraphQL. Just before we get started, a quick introduction to Apollo GraphQL. We're a company that specializes in GraphQL API technologies, as the name indicates. So we have a bunch of quite well-known open-source libraries for clients to reach out to GraphQL APIs, but also server-side technologies. And what's probably more interesting for this podcast about Rust is that we have two main things that are mainly written in Rust. One is the Apollo router, depending on what you prefer. So the idea with the router is that when you start to have multiple microservices that are all using GraphQL as the API technology, you want to combine them into a single API that will group all of them in one go. And so what the router will do is take an incoming request and redirect the traffic to the right microservices. And in my case, I'm part of the Apollo GraphOS Cloud team, and we are responsible for the managed offering of the router. So we host this on behalf of our customers, and they can then send the traffic to their own microservices from there.
Matthias
00:01:45
I guess a lot of listeners already have heard of GraphQL, but maybe they don't know the specifics. Maybe you can give us a quick primer on what it is and how it's different from, say, REST.
Nicolas
00:01:57
Sure thing. So with GraphQL APIs, you define a schema that's basically your entities, your objects, and the fields that you have on them. So you might have, for example, products with names, descriptions, pricing, and so on. And one thing that's very different compared to REST APIs is that the client is able to say exactly which fields it wants. With REST APIs, you often have a problem with either under-fetching or over-fetching. With over-fetching, you're fetching too much data compared to what you need. With under-fetching, you don't fetch enough, and then you need to make subsequent API calls. With GraphQL, you're actually able to make a single call for all the data you need by specifying exactly which information is useful for you. In the case of federation, which is important for the Apollo router, the idea is that we can combine multiple microservice APIs into a single one, and then from your client you can make one call that will actually go to all these different microservices and bring you a single response back.
Matthias
00:02:57
When you explained that, I thought, wow, this is a massive problem. It sounds like a hard GraphQL problem, maybe even an NP-complete problem, and you make something very complicated very simple, or very easy, for frontends. Do you agree? And if you agree, what is your perspective on this, and how complex is it in the backend?
Nicolas
00:03:21
So yeah, yes, I really agree with this, especially on the frontend side. It's really nice to just be able to make one query. You don't have to think about the backend infrastructure, you don't have to think about how many microservices there are and all the different endpoints. Just make one query and fetch the data you need. And indeed, the routing part is a solved problem, right? I mean, otherwise the Apollo router would not exist. But it is still a relatively hard problem to decide what the correct order of operations is. So we call that making a query plan. We plan what queries we need to make to each microservice on the backend. We call them subgraphs, and the federated API is called a supergraph. And then it's about making all the queries in an optimized way to ensure that you get a response as fast as possible, with exactly the data that you need and nothing more.
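To make the idea of a query plan a little more concrete, here is a deliberately simplified Rust sketch. It is not Apollo's planner: the field-to-subgraph table, the subgraph names, and the `plan` function are all hypothetical, and a real planner also has to order fetches that depend on each other. The sketch only shows the first step, grouping the requested fields into one fetch per subgraph.

```rust
use std::collections::BTreeMap;

/// Hypothetical ownership table: which subgraph resolves which field.
fn field_owner(field: &str) -> Option<&'static str> {
    match field {
        "name" | "description" => Some("products"),
        "price" => Some("pricing"),
        "stock" => Some("inventory"),
        _ => None,
    }
}

/// Group the requested fields into one fetch per subgraph.
/// Returns an error naming the first field that no subgraph owns.
fn plan(fields: &[&str]) -> Result<BTreeMap<&'static str, Vec<String>>, String> {
    let mut fetches: BTreeMap<&'static str, Vec<String>> = BTreeMap::new();
    for field in fields {
        let owner = field_owner(field).ok_or_else(|| format!("unknown field: {field}"))?;
        fetches.entry(owner).or_default().push(field.to_string());
    }
    Ok(fetches)
}

fn main() {
    // One client query touching three subgraphs.
    let fetches = plan(&["name", "price", "stock", "description"]).unwrap();
    for (subgraph, fields) in &fetches {
        println!("{subgraph}: {fields:?}");
    }
}
```

The client still sends a single query; the grouping into per-subgraph fetches happens entirely on the router side.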
Matthias
00:04:12
When you started with GraphQL at Apollo, was Rust already mature enough that you could start with Rust, or did you start with a different language and evolve from there?
Nicolas
00:04:24
So to give a bit of history on Apollo itself, I mean, Apollo is over 10 years old, right? So our adventure with Rust is relatively recent. It's a question of a few years now. But also, a lot of what Apollo brings to the table and a lot of what people know about Apollo today is technologies that are specific to certain environments. So on the client side, we have JavaScript libraries, libraries for iOS, for Android. And then on the server side, we have the Apollo server, which is in JavaScript. The first version of a federated API we made, it's called the Apollo Gateway, was actually written in JavaScript. So the Apollo router is mostly a rewrite of this, but into Rust, which was really made for performance reasons at first. The main reason why we started on this journey was to deliver the kind of performance that you would expect from an API at scale that's fronting all your traffic.
Matthias
00:05:32
It feels like you're hinting towards other advantages of Rust, other than performance, in the long run?
Nicolas
00:05:39
Yes, there are. So in the two use cases we have where we are heavily using Rust today at Apollo, there's one where performance is really critical, and there's a second one where safety is very critical, right? So on the cloud side of things, every time we manage infrastructure, every time we update resources on behalf of customers, we basically have distributed transactions. Write safety, order of operations, execution, making sure that everything is happening in the right order with the right information, that nothing is lost, is very important in this context. So I would say these are kind of the two main drivers for that. A third thing that's important too: GraphQL is a strongly typed form of API and Rust is a strongly typed language. So these two are actually quite interesting together as well, to make sure that we align from that perspective.
Matthias
00:06:37
It feels like Rust in general aligns with this problem really well because, if I can just quickly touch on what you said, maybe stability is also a core part of that, because the last thing you want to go down is your API. It always needs to be up. And I don't know if you have any SLAs or something around this, but customers probably expect uptime. And for that you need stability in your code base, and this is where Rust can help, right?
Nicolas
00:07:05
Yeah, this is certainly a very important thing for us if I'm thinking about our managed service, right? So with our GraphOS Cloud offering, our managed cloud router, this is something we hear a lot from customers around SLAs and reliability. And the most important thing in terms of maintaining your SLA is not necessarily the underlying infrastructure. I mean, you need to build on top of a solid foundation, on top of solid cloud infrastructure that's going to work no matter what. But most outages come from application-level changes, right? Not necessarily from the underlying infrastructure. So changes are risky, and asking people to trust us with all that traffic, being the front-end API that's going to ingest all the traffic and then route it to their microservices, is a huge responsibility. Safety and reliability are extremely important for us for that reason.
Matthias
00:08:12
On top of that, if the application code causes most outages, I think this is a good space to be in, because your customers make changes, and it's very normal that from time to time they deploy something which maybe isn't perfect, but the stability of the platform itself is always guaranteed. So at the risk of getting ahead of ourselves, I was really curious about on-call as well. Did you see a change when you moved to Rust, when you transitioned from JavaScript, say, to Rust for your router? Did you see an improvement with regards to on-call and outages?
Nicolas
00:08:53
So there's a few different things here. In terms of the timelines, it's kind of hard, because we really started our cloud journey when we already had the Apollo router, so we were already on our Rust journey. But now, if I take a step back, I'm thinking about what Rust made us think about as we were starting. So we started the cloud journey about a year ago, right? A bit over a year ago now. When we started on this cloud journey and all of the control plane, so the control plane is the part that manages all the customer resources, all the managed routers and so on, this is also all written in Rust. And Rust makes you think about safety a lot, even though the language itself is designed for safety. So there are lots of ways to not shoot yourself in the foot just because you're already using Rust. So all the safety guarantees around mutability, around access to data and so on, that are built into the language are great, but it also shapes the way you think, and then you start to think way more in terms of safety features that are important but that aren't necessarily built into the language, though the language makes it easy to then implement them. So I mentioned, for example, write safety being extremely important for us. To create a cloud router for a customer, we need to provision a bunch of different resources. And we need to avoid creating the same thing twice. We need to keep track of where everything is. And so when we are making changes to our own code base, we need to make sure that we are not accidentally breaking things. So we put a lot of emphasis on write safety and on making sure that the state of our databases, the state of what we know exists in terms of customer resources, is always correct. And so we've encapsulated a lot of write safety into the structs and variants that we use in Rust.
So by using the strong typing of Rust and the way we structure that, we actually get a lot of safety guarantees from that perspective, which is not something that Rust enforces, but that Rust made us think about.
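One way to read "encapsulating write safety into structs" is a typestate-style design, where each lifecycle step consumes the previous value. The sketch below is an assumption about what such a pattern can look like, not Apollo's code: `RequestedRouter`, `ProvisionedRouter`, and the endpoint format are hypothetical names made up for illustration.

```rust
/// A router a customer has asked for; nothing has been created yet.
#[derive(Debug)]
struct RequestedRouter {
    customer_id: String,
}

/// A router whose resources exist. The only way to get one is `provision`.
#[derive(Debug)]
struct ProvisionedRouter {
    customer_id: String,
    endpoint: String,
}

impl RequestedRouter {
    fn new(customer_id: &str) -> Self {
        RequestedRouter { customer_id: customer_id.to_string() }
    }

    /// Takes `self` by value, so the same request cannot be provisioned twice:
    /// a second call on the same value is a compile error (use of moved value).
    fn provision(self) -> ProvisionedRouter {
        // Stand-in for the real resource creation.
        let endpoint = format!("https://{}.example.invalid", self.customer_id);
        ProvisionedRouter { customer_id: self.customer_id, endpoint }
    }
}

fn main() {
    let request = RequestedRouter::new("acme");
    let router = request.provision();
    // let again = request.provision(); // does not compile: `request` was moved
    println!("{} -> {}", router.customer_id, router.endpoint);
}
```

The compiler does not know anything about cloud resources here; the guarantee comes purely from how the types are arranged, which matches the point that this is something Rust encourages rather than enforces.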
Matthias
00:11:11
Does that mean you would get a lot of errors at compile time that you would otherwise get at runtime?
Nicolas
00:11:17
Yeah, exactly. We're trying to push as many of these checks to compile time as possible. But it's also just a thing that... Okay, let's talk a bit about domain models and DDD, right? We had this question at some point: hey, should we have semantic types or should we have safe types? Because these are not always the same. So I'm going to take back my example of a product from earlier. The idea is you have a product, it has a name, description, price, and so on. Well, a product might have a state. Is it for sale or not? And if your product is not for sale, it should not have a price. In a semantic representation, you might just represent all these fields next to each other. You would have a status field, a price field, and so on. But what you can do is actually encapsulate the price of your product in the state, which is not semantic, you know, what's the relationship between a status and a price? But now it's safe, because you only have a price if this product is for sale. That's just one example, but we have these kinds of things. So if a cloud router workload is deployed in a certain state, then it must have this information, for example.
Matthias
00:12:40
How would that be encoded in the type system? I imagine maybe you have something like a normal product and then you have a product for sale or something, so you have two different types for guaranteeing that you have a price for one and not for the other? Or is it completely different?
Nicolas
00:12:56
Oh no, so the way you would do that is you would have a product struct, right? But then the product status type would be an enum with a "not for sale" and a "for sale" variant, and "for sale" would contain a field with the price, for example.
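The enum described here can be sketched as follows. The type and field names (`ProductStatus`, `price_cents`, `label`) are illustrative choices, not taken from Apollo's code base.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ProductStatus {
    NotForSale,
    /// The price lives inside the variant, so it only exists for products on sale.
    ForSale { price_cents: u64 },
}

#[derive(Debug, Clone, PartialEq)]
struct Product {
    name: String,
    description: String,
    status: ProductStatus,
}

/// Every use of the price has to go through a `match`,
/// so the "not for sale" case cannot be forgotten.
fn label(product: &Product) -> String {
    match &product.status {
        ProductStatus::ForSale { price_cents } => {
            format!("{}: {}.{:02}", product.name, price_cents / 100, price_cents % 100)
        }
        ProductStatus::NotForSale => format!("{}: not for sale", product.name),
    }
}

fn main() {
    let mug = Product {
        name: "Mug".to_string(),
        description: "Ceramic mug".to_string(),
        status: ProductStatus::ForSale { price_cents: 1250 },
    };
    println!("{} ({})", label(&mug), mug.description);
}
```

With a flat struct holding both a status field and a price field, a "not for sale" product with a price would be representable; with this layout it is not.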
Matthias
00:13:17
Ah, okay, so it's a very rich type with an enum to encapsulate that. And on the frontend, when you retrieve that, you probably don't even have the field available, so you can't even query it? Or would you get a null?
Nicolas
00:13:26
Yeah, in that case, it's a matter of how different languages think about the type system, right? What we would typically do, for the simplicity of the frontend, would just be that the price would be null if the product is not for sale. So you would then remap it into GraphQL-native objects.
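The remapping step mentioned here can be sketched as a conversion from the "safe" internal type to a flat shape with a nullable price. This is a sketch under assumptions: `ProductView` and its fields are hypothetical, and `Option::None` stands in for what would become `null` in a GraphQL response.

```rust
enum ProductStatus {
    NotForSale,
    ForSale { price_cents: u64 },
}

struct Product {
    name: String,
    status: ProductStatus,
}

/// Flat, frontend-friendly shape: status and price sit next to each other,
/// and the price is optional (it would be `null` in the API response).
#[derive(Debug, PartialEq)]
struct ProductView {
    name: String,
    for_sale: bool,
    price_cents: Option<u64>,
}

impl From<&Product> for ProductView {
    fn from(product: &Product) -> Self {
        let price_cents = match &product.status {
            ProductStatus::ForSale { price_cents } => Some(*price_cents),
            ProductStatus::NotForSale => None,
        };
        ProductView {
            name: product.name.clone(),
            for_sale: price_cents.is_some(),
            price_cents,
        }
    }
}

fn main() {
    let retired = Product { name: "Old mug".to_string(), status: ProductStatus::NotForSale };
    println!("{:?}", ProductView::from(&retired));
}
```

The internal model keeps the invariant; the conversion is the only place where it is relaxed into the looser shape the frontend expects.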
Matthias
00:13:45
Which brings me to my next question, about the interaction between the frontend and the backend. So I guess on the frontend you might have something like TypeScript, and a lot of people use TypeScript in combination with Rust, and it seems like a good collaboration, if I may say so. Do you agree?
Nicolas
00:14:06
Oh, yeah, no. These are certainly languages that have a lot of things in common. So even if you look at strongly typed languages, not all of them have exactly the same features and the same things. There are certainly things that don't necessarily map exactly one to one, and you have to do a bit of conversion between the two if you think about the type system. And I'm just addressing people that might be learning Rust, for example: you might come from TypeScript, and you already know a lot. If you understand the TypeScript type system, you already understand a lot of how Rust will behave, right? Right. Yeah, I'm not a front-end person, and my experience with TypeScript is really great. I cannot complain. I think it makes a lot of those little rough edges in JavaScript much, much easier to handle. Yes, yes, totally. I mean, I'll be honest, I am very spoiled with Rust. I used to be mostly a Python dev before I started really doing Rust seriously. I've been doing Rust for a bit, for three years now. So this was still at the beginning of Python having actual type hints and these kinds of things. Nowadays, I've looked, and there's so much clever stuff you can do in Python, but you still rely on tools to tell you, like, hey, this is right, this is wrong. But Rust's type system has really spoiled me, and it's really hard for me to think about going to a language that doesn't have any kind of strong typing anymore, because it's just so easy in terms of cognitive load. I don't have to think about it. And then, as I mentioned earlier, you can encapsulate so much clever stuff in types, just for your own safety or to match whatever need you have, for example.
Matthias
00:15:57
The other thing I briefly wanted to touch on was the control plane. Is it so that you deploy one completely independent instance for each customer, or do you have shared infrastructure as well for all the customers, maybe to make it a bit more price-efficient? Where do you find the balance here?
Nicolas
00:16:22
Yeah, so it depends. We have two tiers at the moment. We have the first one, which we started one year ago, which is the serverless tier, which uses way more shared infrastructure. We still have strong isolation of workloads and so on. But then we have our dedicated tier, which we announced in private preview a few weeks ago. That tier then has way more dedicated resources. I mean, the point of that tier is to actually give you dedicated resources only assigned to you. There's still going to be some level, if you think about cloud providers and so on, of shared infrastructure. It doesn't make sense, for example, to have one AWS account per workload or one VPC per workload. There are things you can do to save on the number of resources you have to manage, because every time you add more resources, that's a lot of things to manage in the end to make sure that everything is working fine. But then when we talk about the underlying resources that are actually running that workload, in that case they would be dedicated to a single workload.
Matthias
00:17:30
And what about this serverless platform? How does it work? Is it actually using Lambdas or is that an internal name for a platform that you don't have to maintain yourself?
Nicolas
00:17:42
So serverless there is more about what the customer experience is, right? So it's not using Lambda functions under the hood. But it's based on a pricing model where we are closer to a price-per-request model, like you would have with serverless services like Lambda. Whereas with the dedicated tier, you pay for provisioned resources that are always available for you, where you have predictable performance, basically.
Matthias
00:18:07
I see. So to recap, the router is in Rust. Probably parts of the GraphQL parser are in Rust, I assume. GraphOS is proprietary, but it is also written in Rust?
Nicolas
00:18:23
So this one is, because GraphOS is a bit more than just that. There is GraphOS Cloud, which is our managed offering where we run the router for you. But then there is Apollo GraphOS as a whole, which is a management platform for your own resources. For example, let's say I'm a microservice developer and I make an update to my GraphQL API. You then publish that change with Apollo GraphOS, and then we will create a new schema for the federated API, so for the composition of all these microservice APIs into one. We need to update the schema and then push it to your router, whether it's self-hosted or managed by us. That part is mostly in Kotlin, actually.
Matthias
00:19:10
And how do you decide which language to use for new code? Where do you strike the balance here?
Nicolas
00:19:16
So, I mean, once again, a lot of it comes down to history and what we've been using before. So a lot of the management plane platform was written in Kotlin and is written in Kotlin. And if you make a change to this, it doesn't make sense to rewrite it in Rust, because it would be a lot of effort to rewrite everything in Rust. That's not necessarily a compelling case today. And now when we talk about new pieces of software, so new functionalities and so on, it kind of depends a bit. I've seen both. I've seen new internal services that are written in Rust, because we are at the point now as a company where we have plenty of Rust developers internally. We've built experience. We understand the road ahead in terms of what it takes to run a Rust service, and we can just do that. So some new services are going to be written in Rust, but then we might also write some new code in Kotlin to stay consistent, because we might need some libraries we've written and so on, right? So it's very much on a case-by-case basis, even though I'm seeing more and more Rust adoption over time.
Matthias
00:20:29
Another tool that I saw that is written in Rust, apparently, is Rover, which is a CLI tool. Can you quickly say what it is and what you do with it?
Nicolas
00:20:46
Yeah, sure. So I gave you this example where I'm a graph developer, right? I have my microservice, I'm changing the API, and I want to publish that change. So Rover is a CLI tool to interact with Apollo GraphOS. I mean, it's more than that, because it also contains features to help with local development and so on. So it's meant to be a one-stop tool for all your Apollo GraphQL-related needs, basically. You can use it to quick-start a new microservice. You can use it to publish the changes you've just made. So you can integrate it into your CI/CD pipeline: whenever you've updated your microservice, you tell us, like, hey, I've made an API change. Another feature we have on our management plane is the ability to do checks to make sure that, okay, you're making a change, will it actually work? Because if we have two microservices with the same field name, which one has precedence, which one is the right one? Or there might be conventions, so you might have lint checks and so on. So Rover can help you with running all these kinds of checks in CI, for example.
Matthias
00:21:55
And Rust is pretty great for CLI tools, so probably it was a natural decision to write that in Rust.
Nicolas
00:22:01
Oh, yes, yes. It's a very different experience, because the different components... I mean, Rust is a very wide ecosystem. Rust is something that can be used in so many different domains. The way you write Rust for CLI tools is very different from how you write Rust for, you know, the control plane services, because there you use a lot of async stuff and so on that you might not need in a CLI tool, right? But yes, Rust is really great for that, and you have all the compatibility that we need to add features quite easily, which is nice.
Matthias
00:22:33
Do you have a situation in the command line tool where you try to access two resources asynchronously somehow? I can imagine, for example, you want to look up a schema, and maybe you need to fetch information from one resource and from another, and you could do that concurrently somehow?
Nicolas
00:22:54
I'm not the most familiar with the Rover code base. So my knowledge might be rusty, because last time I checked that was not necessarily... I mean, if it was the case, it would be done sequentially. But yes, totally. This is something that you could... I mean, okay, let's take a step back actually, because then we get into the ball of GraphQL, because, well, all our APIs are GraphQL, shockingly enough. So you can also make optimized queries. Instead of having to do two queries, you can make one query that will fetch all the information in a lot of cases, right? So then the question of doing it sequentially is not necessarily the most relevant thing.
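For the concurrent case raised in the question, a CLI tool does not strictly need an async runtime. The following sketch uses only scoped threads from the standard library. It is not Rover's code: `fetch_schema` and `fetch_metadata` are hypothetical stand-ins for two independent network calls.

```rust
use std::thread;

// Hypothetical stand-ins for two independent lookups.
fn fetch_schema(graph: &str) -> String {
    format!("schema for {graph}")
}

fn fetch_metadata(graph: &str) -> String {
    format!("metadata for {graph}")
}

/// Run both lookups at the same time and wait for both results.
fn fetch_both(graph: &str) -> (String, String) {
    thread::scope(|scope| {
        let schema = scope.spawn(|| fetch_schema(graph));
        let metadata = scope.spawn(|| fetch_metadata(graph));
        (
            schema.join().expect("schema fetch panicked"),
            metadata.join().expect("metadata fetch panicked"),
        )
    })
}

fn main() {
    let (schema, metadata) = fetch_both("my-graph");
    println!("{schema}\n{metadata}");
}
```

As noted in the answer, when both pieces of data sit behind the same GraphQL API, a single query that selects both is often simpler than any client-side concurrency.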
Matthias
00:23:35
Yeah. And async Rust is not always something that you want to have in a CLI tool necessarily. It may not even be necessary most of the time. And I'm wondering, what are your experiences with async Rust, especially on the control plane, especially on the backend side with routing? Does it really help you solve a problem? Do you see that without async Rust, it would be way harder to build some of these things? Or would you say it's kind of syntactic sugar on top?
Nicolas
00:24:08
So on that end, the asynchronous part simplifies a lot of things. I will actually go back a step and talk about what I've seen in terms of Rust developer experience, and that will be quite important for this, I promise. So what I've found, if you look at a lot of Rust developers' journeys, is that there are kind of three main phases. There's the first phase where you're starting out, you're still a bit confused by the borrow checker and other things from the compiler, and you clone a lot of things and so on. Then there's a point when you start to be comfortable with a lot of these systems, and you might start using async Rust and you're able to be quite productive. And then there's a third phase when you start to have mastery over some of the more intricate systems, like in-depth knowledge of how Tokio works, for example, and how some of the more complex async stuff works. And so if you're in the second category, if you're able to be proficient in Rust but you don't necessarily know all the details of the libraries and so on, async Rust is already a great tool there. Because you're just able to make something that works, it's going to be highly optimized, and it's going to be able to handle a lot of traffic without you having to put much thought into making it work nicely. Even in the first category, even if you clone a lot of data, it's still working great. There are so many great libraries nowadays that are just available and have a great API. So you can get started in async Rust, and just by using async Rust, you're going to have systems that are very performant when you think about microservices that need to handle potentially thousands of requests per second of traffic or something like this.
But then when you get to the third tier and you start to dive really deep into async, then you can start to do some pretty clever optimizations, some pretty clever thinking in terms of how to optimize your asynchronous communication, to avoid cross-system race conditions, you know, distributed race conditions, to make sure that you actually fetch multiple pieces of data at the same time, and so on, all in one go. So there's kind of this journey of growth into the Rust async ecosystem, where you already get a lot of benefit just from being able to make async calls and having libraries like Axum and so on to write a server that's already going to be able to handle a lot of traffic without you having to think too hard about it.
Matthias
On the third tier, how would Tokio or async Rust in general prevent race conditions on multiple systems?
Nicolas
Okay, so we had a case like this. I'm trying to remember the details. I was off last week, so I'm still kind of getting back into my work mind. But we had a problem where the order of operations with a distributed system was very important, because if we were to call it first to fetch the state and then run a certain operation, there might be a timing issue in the very, very, very narrow gap between the two pieces. We're talking microseconds or even less. I mean, more like milliseconds. Network is not that cheap. But we're talking about a millisecond gap or sub-millisecond gap here where there might be a race condition, where we query the state and then start doing an action. What we could do instead is start the action first, then query the state. And because we benefit from things being async, we can actually optimize things quite well there. And then we could interrupt the future if we realize there's actually no work to do, for example.
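The "start the action first, then query the state, and interrupt the future if there is no work" idea relies on the fact that dropping a Rust future cancels it at its current await point. The sketch below illustrates only that mechanism, with the standard library and manual polling so it runs without a runtime. Everything in it (`action`, `run`, `YieldOnce`, the log messages) is hypothetical; in a real service a runtime such as Tokio would do the polling.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand in this sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Pending on the first poll, ready on the second: stands in for one network
/// round trip. (A real future would also arrange to be woken up.)
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical two-step action; each step is recorded in `log`.
async fn action(log: Rc<RefCell<Vec<&'static str>>>) {
    log.borrow_mut().push("action: request sent");
    YieldOnce(false).await; // waiting on the remote system
    log.borrow_mut().push("action: committed");
}

/// Start the action, then look at the state, and drop the future if no work is needed.
fn run(work_needed: bool) -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(action(log.clone()));
    // First poll: the action is under way before the state is queried.
    assert!(fut.as_mut().poll(&mut cx).is_pending());

    log.borrow_mut().push("state: queried");
    if work_needed {
        // Keep polling until the action finishes.
        while fut.as_mut().poll(&mut cx).is_pending() {}
    } else {
        // Dropping the future cancels it: the code after the await never runs.
        drop(fut);
        log.borrow_mut().push("action: cancelled");
    }

    let entries = log.borrow().clone();
    entries
}

fn main() {
    println!("{:?}", run(false));
    println!("{:?}", run(true));
}
```

Whether it is safe to drop an action half-way depends entirely on what the action has already done remotely, which is the cancel-safety question that comes up later in the conversation.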
Matthias
00:28:06
Kind of a good reminder that Rust does not prevent you from running into race conditions. Even though the type system is very sophisticated and a lot of things around async Rust are maybe wrapped in a Mutex or an Arc, it doesn't necessarily mean that you can avoid these cases, especially if the system becomes more and more complex, right?
Nicolas
00:28:28
Yeah, yeah, it is. You still have to think about the kind of primitives and what you're actually doing. So Rust will do a lot to prevent you from shooting yourself in the foot. But the most beneficial part, when you get into the second tier, third tier of your Rust journey, is how Rust changes your mind and you start thinking: well, I need to think about these things ahead of time. I need to really think about them before I implement them. And sometimes there are also weird side effects that you didn't think about that you suddenly realize exist. So one thing I'm a big fan of doing: we were doing some load testing a few months ago, as we were building our dedicated tier, because with dedicated, the idea is that you provision a certain number of graph compute units, and these compute units basically guarantee you a certain amount of traffic. So we had to make sure that, hey, you can actually do what you want to do, right? That we give you the right amounts. So I did a lot of load tests. And one thing that's important when you do load tests is to go way past the point of goodput, way past what you actually should expect. And we found some odd behavior of Tokio, for example, when you really go into, I'm talking about sending 50x the traffic we were reasonably expecting. But then you start to see weird behavior of Tokio where, oh, some of the internals will actually behave like a queue that you didn't expect, right? So there are still things there that you need to be careful of and that you need to test yourself, and you still need to write tests, right? Nothing is magical, but Rust really helped us think a lot about what could go wrong ahead of time.
Matthias
00:30:11
I once heard, I'm not sure if it's entirely correct, but I heard that at Google they usually plan for 10x the traffic whenever they build a new system, and they want it to reach 100x, which is kind of the stretch goal whenever they build something new. And it feels like you're roughly in the same ballpark here where you mentioned 50x. It feels like you're planning for growth already, which is great.
Nicolas
00:30:36
Yeah, I mean, I worked for five years at AWS as a solutions architect. And there is, if you don't know about it, a really great resource if you're building distributed systems. It's called the Amazon Builders' Library. Not very specific at all, but there is this article by David Yanacek, who was at the time on the AWS Lambda team, around testing what you expect in terms of goodput. And yeah, the general recommendation, as you can see from Google, from AWS, is to go way past what you think is the expected amount of traffic, to make sure that your system can actually handle that and will not crash and burn.
Matthias
00:31:20
Did you run into any issues with cancellation of futures? Did you test any edge cases where you test negative paths? For example, you have a query that triggers a lot of sub-queries and you want to stop that. Is there something in the system, or maybe even inside Rust, in Tokio maybe, that prevents you from running into edge cases?
Nicolas
00:31:52
Yeah, that's part of the thing where, you know, we get into this kind of third-tier stuff where you have to think about what's happening. What happens when you cancel, and what's the cancel safety of the futures, that might depend on what you're doing and so on. So certainly, in our case, it was fine. In that specific case, it was fine. But this is certainly something that you need to think about and explore to make sure that there are no gotchas, or test to make sure there are no gotchas, because you update your library and that might change a certain behavior internally. And this is the kind of thing where, yeah, there might be gotchas as we get into that space.
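To make the cancellation question concrete: in Rust, cancelling a future means dropping it, and everything after the await point it was suspended on simply never runs. The sketch below uses only the standard library and polls a future by hand. `NoopWaker`, `YieldOnce`, `two_steps`, and `run_and_cancel` are hypothetical names introduced for illustration; a real service would rely on a runtime such as Tokio rather than manual polling.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that is Pending on its first poll and Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Two steps separated by an await point.
async fn two_steps(log: &RefCell<Vec<&'static str>>) {
    log.borrow_mut().push("step 1");
    YieldOnce(false).await;
    // Only reached if the future is polled again after the await.
    log.borrow_mut().push("step 2");
}

/// Polls `two_steps` once, then drops it, which is what cancellation is.
fn run_and_cancel() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    {
        let mut fut = pin!(two_steps(&log));
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // `fut` is dropped at the end of this block without a second poll.
    }
    log.into_inner()
}
```

`run_and_cancel()` returns a log containing only "step 1". If "step 2" were a required cleanup or the second half of a two-part update, that gap would be exactly the kind of gotcha to test for.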
Matthias
00:32:27
We really went down this rabbit hole and got a little technical here. But it's sometimes unavoidable if you talk about these very challenging projects. And I want to take a step back and think about the team and what Rust brings to the team and also the Apollo team in general. So can you give us a quick snapshot of what the team looks like right now? Maybe just roughly how big the engineering team is, how many people are Rust developers, and also if people freely move between languages or if they are more focused on one specific stack?
Nicolas
00:33:04
Yeah, so I don't have the details in my mind on how many developers we have. And because of what we do as a company, it's quite spread out. We have a sizable portion of our developers working on open-source technologies, so all the libraries we're making and so on, and these are going to be very language-specific. And then we have people working on the management plane, on GraphOS Cloud. I would say there's quite a strong affinity to languages in general. I mean, it doesn't mean that you never do any other languages, right? So in my personal case, for example, I mostly write Rust, but I had to fix bugs in a Terraform provider, which is in Go. So, you know, you go into Go, right? But in general, there's quite a strong affinity to whatever language you're using, except, as I mentioned, there's a strong part of Kotlin where now some people are starting to do more and more Rust, which is exciting to see. Now, if I look a bit at the two teams that are really 100% involved with Rust, we have the Apollo Router team, where it's 100% Rust, and then my team, the GraphOS Cloud team, where we're also doing that. The background is actually quite different, and that's also because the preoccupations we have are quite different, right? So the team that's actually writing the router itself, they have a lot of concern around performance to make sure that the router itself is going to be as fast as possible. While for us, because we care a lot more about provisioning and managing resources, performance is not necessarily the most important piece. So these very high-level, very complex optimizations with async Rust are not necessarily the first thing we go to. But then we have way more concern about safety. In our case, most of the people on my team actually were not using Rust before coming to Apollo.
Matthias
00:35:06
Interesting. So they learned Rust on the job. Did you use any training material or can you recommend any resources for people starting with Rust?
Nicolas
00:35:18
I think it kind of really depends. It's something I always say. I've helped quite a few people learning Rust, and I'm generally happy to give pointers to people when possible. But it really depends on how you learn best. In my case, I'm an experiential learner. I learn best by actually just doing and writing code myself and doing experiments. And the first thing I'd like to say on that, by the way, when you're starting your Rust journey and you're starting to write code: it's okay. I mentioned the tier one stuff where you do a lot of clone, where the borrow checker is still confusing, and sometimes you get a lot of compilation errors that you don't necessarily understand. It's fine. It's totally fine to get there. By the way, even if you clone resources excessively, you might still be way better in terms of performance than what you would have if you were using JavaScript or Python. I mean, coming from Python, Python does a lot of cloning of values under the hood that you don't realize. Rust just makes things explicit, right? So some people also learn better via books, via tutorials and so on. And there are so many great resources all over the board. It's really hard for me to point to one. In my case, I used the official resources that you find on the Rust website. So there is the Rust book. I think there's the Rust from examples, I think it's called. I don't really exactly remember now.
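As an illustration of the "tier one" cloning described above, here is a small sketch with hypothetical function names. The first function takes ownership, so a caller that still needs the data ends up cloning. The second borrows instead, so no clone is needed. Both are valid Rust; the clone is just an explicit, visible cost.

```rust
/// Tier-one style: takes ownership, so callers often clone to keep their copy.
fn total_len_owned(words: Vec<String>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Later style: borrows a slice, so no clone is needed at the call site.
fn total_len_borrowed(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Calls both versions on the same data and returns both results.
fn compare_styles() -> (usize, usize) {
    let words = vec![String::from("apollo"), String::from("router")];
    // The explicit clone keeps `words` usable afterwards.
    let owned = total_len_owned(words.clone());
    // Borrowing gives the same answer without copying anything.
    let borrowed = total_len_borrowed(&words);
    (owned, borrowed)
}
```

Both calls compute the same total. The only difference is whether the data was copied on the way in.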
Matthias
00:36:44
Rust by example, yeah.
Nicolas
00:36:45
Yeah, yeah, yeah. These ones already helped me so much. And then I was just writing, starting with CLIs, writing some libraries and so on. When I started using Rust before, at AWS, I had a mentor there who was very helpful. We also have, because now we have a sizable number of Rust engineers, a Rust Slack channel where everybody can ask questions, but there's also the official Rust Discord. People are very friendly in the Rust community, so don't hesitate to reach out and ask questions to people. They will usually point you in the right direction. Then as you're getting more experience, I would highly recommend the book Rust for Rustaceans. If you feel like you're getting more comfortable, you rarely get compiler errors nowadays or something like this, go with Rust for Rustaceans. It's a great book to really level up your Rust knowledge. So that's kind of my, yeah, like six to 12-month recommendation journey if you're looking at learning Rust, or where you might feel yourself to be in terms of Rust experience. But to be honest, a lot of people on our team were able to make PRs in Rust after a few weeks, right? Back when I was at AWS, as I was a solutions architect, my job was to talk with a lot of companies, and I had people reaching out to me saying, hey, that company wants to start using Rust, but they're a bit afraid. To be honest, they're a bit afraid of starting, of not finding the talent and so on. Well, that's fine. You can train people to learn Rust and they can start to be productive in a few weeks. It's not that scary of a language.
Matthias
00:38:23
The Rust for Rustaceans book is a book by Jon Gjengset, I guess, and it's about the internals that you mentioned. I really liked your analogy with the three tiers. I think this is really on point. Does it help you get from tier one to tier two, or from tier two to tier three? What do you think?
Nicolas
00:38:47
Yeah, I think it really helps you cement your knowledge in tier two, because it's all about best practices for library design, you know, API design, but Rust API design. Getting to tier three is more complex because it's going to be way more situational. I do async Rust. I don't do that much embedded Rust, and I wouldn't consider myself to be extremely proficient in embedded Rust. I can code some stuff, and I've done some side projects like this, but this is not my area of expertise. And this is the part where what you're actually building starts to matter way more, and then looking into whatever resources are interesting. So the book from Mara Bos around atomics and locks, for example, was quite interesting for me to understand some of the inner behavior when we talk about some aspects of performance optimization, for understanding how mutexes work under the hood. It also helped me understand how I can better implement fairness in the systems we have, to ensure that the system is going to continue to be able to deliver goodput when we send it 100x the amount of traffic it should be able to handle, for example.
Matthias
00:40:11
Quite a few sophisticated resources you have there. It's pretty cool. I guess a lot of people won't even reach that point because they are happy with Rust and just what it provides out of the box. You probably have to work on a distributed system or something more specific to get into these areas, right?
Nicolas
00:40:33
Yeah, no, you very likely won't need that, and it's good. It's actually a very good testament to the Rust language itself that you don't need these things. It just already gives you so much out of the box in terms of performance. As I mentioned, tier one, just doing the cloning stuff and so on, depending on which programming language you come from, you might already see a performance boost just by doing that. And you might see a safety boost because, oh, well, now you have to deal with Options and Results that you were not thinking about before. So you were not thinking about some error classes, and now you have to think about them. So just even there, you're already getting value from using Rust.
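A small sketch of how `Option` and `Result` surface error classes that are easy to ignore elsewhere. The `lookup` and `port` functions and the "key=value" format are assumptions made for illustration. A missing setting and a malformed setting are distinct cases in the return type, and the caller has to decide what to do with each.

```rust
use std::num::ParseIntError;

/// Looks up a key in "key=value" pairs; `None` means the key is absent.
fn lookup<'a>(pairs: &'a [&'a str], key: &str) -> Option<&'a str> {
    pairs.iter().find_map(|pair| {
        let (k, v) = pair.split_once('=')?;
        (k == key).then_some(v)
    })
}

/// Absence (`None`) and a bad value (`Some(Err(_))`) are separate cases
/// that the caller has to handle explicitly.
fn port(pairs: &[&str]) -> Option<Result<u16, ParseIntError>> {
    lookup(pairs, "port").map(|raw| raw.parse::<u16>())
}
```

Here `port(&["host=localhost", "port=4000"])` yields `Some(Ok(4000))`. With no `port` entry the result is `None`, and with `port=abc` it is `Some(Err(_))`.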
Matthias
00:41:17
Can you see differences between the different types of developers that start to adopt Rust? So do you see that this person comes from a JavaScript backend or from a Kotlin backend or from a Python backend, or is it very hard to discern?
Nicolas
00:41:31
Oh, there are some. I've seen a lot of people coming, for example, from Java to Rust, and you see the kind of thinking of object-oriented programming, some patterns that come from Java. Java is a language with a lot of literature in terms of architecture patterns, software paradigms, and so on, that people are just used to adopting and sometimes want to bring to Rust in ways that don't always map one-to-one. The trait system is not a hierarchy, right? It's not inheritance. So these are the kind of things where you see people trying to force certain software architecture patterns into Rust that don't necessarily fit one-to-one. And as I said, Java is probably the one that I see the most, but that's probably because I don't know Java myself. And there are so many Java developers out there. That's probably the one that surprises me the most.
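To illustrate "the trait system is not a hierarchy": in the sketch below, two unrelated structs share behavior through a trait with a default method. There is no base class and there are no inherited fields. All names here are hypothetical and chosen only for the example.

```rust
/// Shared behavior lives in a trait, not in a base class.
trait Describe {
    fn name(&self) -> String;

    /// Default method: implementors get it for free, but there is no
    /// parent struct and no inherited state.
    fn describe(&self) -> String {
        format!("service {}", self.name())
    }
}

struct Router;

struct Gateway {
    version: u32,
}

impl Describe for Router {
    fn name(&self) -> String {
        "router".to_string()
    }
}

impl Describe for Gateway {
    fn name(&self) -> String {
        format!("gateway v{}", self.version)
    }
}

/// Works for any type implementing the trait; the types stay unrelated.
fn describe_all(items: &[&dyn Describe]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}
```

`Router` and `Gateway` can be passed together to `describe_all` even though neither extends the other. The only thing they share is the contract.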
Matthias
00:42:34
Is the trait system underused, in your opinion?
Nicolas
00:42:39
In a way, yes, but it's also a very dangerous slope if you don't necessarily understand what you're doing. In a lot of cases, I try to use traits where it makes sense, but it can also bring a lot of complexity, and especially when you're doing async, you might end up with very complicated trait signatures. I worked on the Rust runtime for AWS Lambda in the past, which has to be extremely flexible, because it's all about you bringing your own code that does what you want, and the library just wraps that so it can communicate with the Lambda internal API. Yeah, the trait bounds were getting very wide. And then you have to use a lot of clever stuff to do conversion and so on. So there's lots you can do with the trait system. I think it's underutilized, but it can also bring a lot of complexity to your code base. There's also static versus dynamic dispatch. I won't go too much into the details, but that also has implications that you might need to understand. It's all about finding this balance, or finding calibration there. But there's one thing about traits, and I think this is something that Rust for Rustaceans helped with, which is understanding what some of the common traits are that are available in the system. For example, I've seen quite a lot of people writing their own from-string implementation for a type, while you have the FromStr trait, which is way more generic and way more flexible and so on. Or Display, for example, right? There are lots of traits that are built into the standard library that you might want to explore and get to know better.
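A short sketch of the point about standard traits, with a hypothetical `Plan` enum. Implementing `FromStr` instead of an ad hoc `from_string` method means the type works with `str::parse`, and implementing `Display` gives it `to_string()` and `{}` formatting for free.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical type used only for this example.
#[derive(Debug, PartialEq)]
enum Plan {
    Serverless,
    Dedicated,
}

impl FromStr for Plan {
    type Err = String;

    /// Enables `"dedicated".parse::<Plan>()`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "serverless" => Ok(Plan::Serverless),
            "dedicated" => Ok(Plan::Dedicated),
            other => Err(format!("unknown plan: {other}")),
        }
    }
}

impl fmt::Display for Plan {
    /// Enables `plan.to_string()` and `format!("{plan}")`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            Plan::Serverless => "serverless",
            Plan::Dedicated => "dedicated",
        };
        f.write_str(text)
    }
}
```

With these two impls, `"dedicated".parse::<Plan>()` and `Plan::Serverless.to_string()` both work, and any generic code bounded on `FromStr` or `Display` accepts `Plan` without extra glue.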
Matthias
00:44:22
Earlier, you mentioned domain-driven design and design patterns in general. Are there any pointers for people that are in tier two that want to learn more about how to build proper Rust code, idiomatic Rust code with proper, let's say, design patterns? Did you learn anything along your journey with regards to design patterns? And where did you learn that?
Nicolas
00:44:50
I mentioned it. I'm an experiential learner. I learn by doing and creating, and I find some cool ideas, and I'm thinking to myself, hey, how can I use that? It's kind of how I do things a lot. But there's also a lot of good literature for that beyond just the Rust community. One thing about patterns: patterns aren't a panacea. It's not like a silver bullet. Nothing is a silver bullet. You don't have to stick to exactly the right terms that this book guides you to use, but there are lots of really good resources in terms of software architecture patterns. I'm just trying to see if... I have one on my bookshelf. I don't remember the name, but it's a very good one. Yeah, so things like 97 Things Every Software Architect Should Know, for example, are good things. We all do a bit of software architecture as software engineers, whether we know it or not. So understanding these kinds of things is quite good in terms of just looking a bit at everything that's possible and so on. Now, not everything will be applicable one-to-one to Rust, because a lot of these books are written by people that use Java or C#. But still, there are lots of interesting things in there. So, for example, just the question about, I mentioned earlier, semantic versus safe types, right? It's a distinction that we had to make that you don't really find often when we talk about DDD.
Matthias
00:46:17
And how functional should Rust code be?
Nicolas
00:46:21
Ooh.
Matthias
00:46:24
Dangerous question.
Nicolas
00:46:25
Dangerous question, because I like it a lot, but it's also like a footgun sometimes, because there's so much power in the composability of Rust, all the map, map_or, and so on, which are very, very nice. And I use them a lot because I really, really like them. But then the question is, how readable is it? We get back to these kind of basic things around maintainability of code. The compiler is going to optimize a lot of things under the hood for you, right? So the question is, if you think about your team, if you think about all the people around you, are people going to understand what this piece of code does? So the map methods are some that I use a lot, for example, for mapping a return type from a Result, things like this, right? And once again, all the map stuff in iterators is really awesome, and I really like it a lot. But you can also go too far, to the point it's really hard to understand. If you start to have a chain of calls that's 30 lines long, ask yourself, hey, can anyone except me understand what's going on here? And will me in six months remember what this does?
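For readers less familiar with the combinators mentioned here, a brief sketch with hypothetical function names: `map` on a `Result`, `map_or` on an `Option`, and a short iterator chain. Each stays readable because it does one thing.

```rust
use std::num::ParseIntError;

/// `map` transforms the success value and leaves the error untouched.
fn parse_doubled(raw: &str) -> Result<i32, ParseIntError> {
    raw.trim().parse::<i32>().map(|n| n * 2)
}

/// `map_or` supplies a default for `None` and maps the `Some` case.
fn label_len(label: Option<&str>) -> usize {
    label.map_or(0, |l| l.len())
}

/// A short iterator chain: keep the inputs that parse, then sum them.
fn sum_valid(inputs: &[&str]) -> i32 {
    inputs
        .iter()
        .filter_map(|s| s.trim().parse::<i32>().ok())
        .sum()
}
```

The readability concern starts when a chain like the one in `sum_valid` grows to dozens of steps; at that point, naming intermediate results or extracting functions tends to help.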
Matthias
00:47:47
If you have a chain of 30 calls in a functional way, maybe that hints at a missing abstraction, maybe a missing struct or some sort of trait or something that is in between that helps you break down the chain.
Nicolas
00:48:04
Yeah, yeah, yeah. I mean, Rust will let you do that, but yeah, you might, for example, have multiple maps that you could just turn into one function or something like this. And remember, when you need to pass a closure, you can actually just pass a function. So you can also decouple that by writing a named function that you can then test in isolation. Because we haven't touched on testing, but testing has been very important in Rust.
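The remark about passing a function where a closure is expected can be sketched like this. `normalize` and `normalize_all` are hypothetical names. The named function can be unit tested on its own, and the iterator chain reads as a list of named steps.

```rust
/// A named function that can be tested in isolation.
fn normalize(name: &str) -> String {
    name.trim().to_lowercase()
}

/// Passes the function itself where a closure would otherwise go.
fn normalize_all(names: &[&str]) -> Vec<String> {
    names.iter().copied().map(normalize).collect()
}
```

Writing `.map(normalize)` is equivalent to `.map(|n| normalize(n))`, but the logic now has a name and its own test surface.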
Matthias
00:48:31
And with regards to structuring your code in general, would you agree that what you described is using functional patterns in, let's say, smaller scopes and structuring your code in a more object-oriented way in a bigger scope, or would you turn it around?
Nicolas
00:48:51
It kind of depends on what you do. If you think about async Rust, I found there are two very big ways to do it, depending on where you end up. If you are building a network service, for example, something like the router, it's a GraphQL API that works like a pipeline that makes sub-requests to microservices and so on. Then we are leaning way more into the Tower ecosystem and the Service trait, and basically composing services together into a pipeline. But now if I'm thinking about our control plane, where we need to interact with 10 different services, all with their own SDKs and so on, then we might need to have some kind of struct that contains references to these different SDKs and these different clients so we can actually make the calls more efficient. And so these are two very opposed architecture patterns of async Rust, for example. And they have their pros and cons. It's a matter of knowing when to use which. But the composability of Tower services is really, really great as well, just to avoid repeating the same thing over and over again. So it's a matter of finding what's the right method based on what you're doing and what you can reasonably manage.
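The pipeline style described here can be sketched with a deliberately simplified, synchronous trait. This is an assumption-laden stand-in and not Tower's actual `Service` trait, which is asynchronous and includes a readiness check. `Echo`, `Trim`, `Tag`, and `build_pipeline` are hypothetical names. The idea it shows is the same: each layer wraps an inner service and adds one behavior.

```rust
/// Simplified, synchronous stand-in for a service abstraction.
/// NOT Tower's real `Service` trait; for illustration only.
trait Service {
    fn call(&self, request: String) -> String;
}

/// Innermost service: produces a response.
struct Echo;

impl Service for Echo {
    fn call(&self, request: String) -> String {
        format!("echo({request})")
    }
}

/// Layer that cleans up the request before passing it on.
struct Trim<S> {
    inner: S,
}

impl<S: Service> Service for Trim<S> {
    fn call(&self, request: String) -> String {
        self.inner.call(request.trim().to_string())
    }
}

/// Layer that decorates the response on the way out.
struct Tag<S> {
    inner: S,
    tag: &'static str,
}

impl<S: Service> Service for Tag<S> {
    fn call(&self, request: String) -> String {
        format!("[{}] {}", self.tag, self.inner.call(request))
    }
}

/// Composes the layers into one pipeline: Tag -> Trim -> Echo.
fn build_pipeline() -> impl Service {
    Tag {
        tag: "router",
        inner: Trim { inner: Echo },
    }
}
```

Because every layer implements the same trait, layers can be reordered or reused across pipelines without changing the innermost service.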
Matthias
00:50:22
That's one of the things I like the most about Rust, that all of these crates develop outside of the standard library. And if you look at error handling, for example, we had multiple iterations of different error crates, and each one improved on what came before. And I think this is really, really powerful. I see the same happening in async Rust and in network services and web services and so on, and I think we should keep that up.
Nicolas
00:50:49
Oh, yes, yes. Going back to the question about traits, it's really where traits are really, really powerful. So, the Tower Service trait is great. You can build something on top of that. Or the question about the Error trait. Actually, I had a case like this, speaking of traits, in the way the hyper library is composed, where under the hood it's using traits, and I was able to use that to make a mock tester, for example, that actually makes calls, but these calls just don't go to the internet, because I was able to plug into this trait system. Or if you do embedded programming in Rust, they also made awesome things with traits, right? No matter what your actual specific hardware is, you have this abstraction layer, which is zero-cost as well.
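The mock-testing idea can be sketched in a few lines. The `Transport` trait below is hypothetical and is not hyper's actual API; it only shows the shape of the technique. Code that depends on a trait rather than a concrete client can be exercised with an in-memory implementation that never touches the network.

```rust
/// Hypothetical transport abstraction; not hyper's real trait.
trait Transport {
    fn get(&self, url: &str) -> Result<String, String>;
}

/// Test double that answers from memory and never opens a connection.
struct MockTransport {
    body: String,
}

impl Transport for MockTransport {
    fn get(&self, _url: &str) -> Result<String, String> {
        Ok(self.body.clone())
    }
}

/// Code under test depends only on the trait, not on a concrete client.
fn fetch_len<T: Transport>(transport: &T, url: &str) -> Result<usize, String> {
    transport.get(url).map(|body| body.len())
}
```

In production, `fetch_len` would receive an implementation backed by a real HTTP client; in tests, it receives `MockTransport`.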
Matthias
00:51:39
Great. I guess we're getting towards the end. The remaining questions would be about the broader Rust community and also taking a bet on Rust from a business perspective. Regarding the community, would you say that you benefited from existing crates or did you have to mostly write everything from scratch for yourself?
Nicolas
00:52:00
So I have a mixed opinion on the crates. One is, yeah, there's so much stuff I could just use from the community. But a small asterisk: my personal philosophy in terms of dependency management is that every time you add a dependency, well, you own that from a vulnerability, from a security point of view, right? So still be careful. And that's actually where it gets into my downside: if you're new to Rust, it's kind of hard to know which crates are actively maintained, which crates are the right ones to use. If you're looking at something a bit niche, then it might also get a bit tricky to find, hey, which one's actually actively maintained? I was working on a small side project for myself with a very specific algorithm, and there were two implementations in two different crates. And I had to do a bit of research, you know, like which one's actually actively maintained and which one hasn't been updated in five years, right? This is the part that's a bit hard to do if you just search for crates, trying to find which one's actually going to do what you need so that you don't have to reinvent the wheel yourself every time. So it's a double-edged sword, it goes both ways. There are things where it will be fine, you know, Tokio, I mentioned Tower, we're using thiserror, for example, for error management. A lot of these crates are really well maintained and so on. But then when you get to the more deceptive stuff, or if you don't really pay attention, you might end up with something that's not maintained anymore.
Matthias
00:53:41
Do you think that the Rust community is underfunded? Or what could be the reason for some maintenance issues?
Nicolas
00:53:51
I mean, I think there are lots of things that come from enthusiasts, which is great. You know, it's still a fairly young language, all things considered, right? So there are lots of crates that are made by enthusiasts, and then, you know, enthusiasm doesn't necessarily last forever. You can't be a single person maintaining the same project for 10 years, right? That's not really reasonable to expect. So things that have actual funding, things that are backed by companies, are great. Where developers from a certain company can actually contribute to a certain project, for example, that's really great. And that's the story of open source in general, right? You need people that can maintain it, and if these people can make a living out of it, if these people can be, you know, compensated for their work and efforts, it's a really great thing. But there are lots of things in Rust where people are just trying things out, learning, and so on, and then they just don't have the energy, the time, or something to maintain it anymore. And that's kind of the tricky part there. So there are some islands of very well-funded projects, but then there are lots of things that are not so well-funded.
Matthias
00:55:06
That means we have to find a transition from enthusiasm to long-term maintenance, and it's an ongoing process.
Nicolas
00:55:14
Yes, it is.
Matthias
00:55:15
And if you wanted to address a company that was on the verge of trying Rust for their own use case, for their own product, with your experience, what would you tell them?
Nicolas
00:55:29
So I'm going to start very businessy on this, but defining a business case, defining clear objectives and the results you want to achieve, is the first thing. It's not something that's specific to Rust, but it's something I've seen a lot of companies struggle with, saying, oh, we need to go with Rust, it's going to be more performant and so on. But then it's still going to be a project in itself. Try to find something small where you can have a high-value impact relative to your expected returns. I'm going to give you two examples that might be relatable to some folks. So the first one is, let's say we have a performance-critical system, and we need to improve its performance. And we think, okay, if we rewrite it in Rust, we will get better performance. First step, is there some part of this that you can actually rewrite in Rust, you know, in isolation? Either a service itself, or nowadays there are so many great interop capabilities with other languages. So Wasm is one, you have PyO3 for Python, you have all the things like this that allow you to write a library in Rust for something that's, for example, very slow. Or if you're using Lambda functions on AWS or anything like this, typically you can rewrite just one function. So try to find something small you can do to measure the impact and gain the experience of building that as well, because if it's, for lots of people, the first project they're doing in Rust, they also need to learn. And so that would be a good way to go forward. Another example I found is people using Rust for, once again, leveraging the ability to interop with different languages. Using Wasm, for example, to create one library that you can use across web, iOS, and Android. So same here, you have a clear objective of what you want to achieve.
And then from there you can go build: what would be the minimum implementation where we can actually see the return? How are we going to measure that? How do we define success? How are we going to capture the learnings, what was tough, what was hard? So you can actually have some solid base to say, hey, do we really want to go with Rust from then on, right?
Matthias
Where can people learn more about Apollo?
Nicolas
I would say it's very likely on our website. You're going to learn a lot. We have Odyssey courses, so these are courses to help you learn about GraphQL technologies in general, but also what we offer at Apollo and how this can help you if you want to use GraphQL technologies.
Matthias
Awesome. And finally, it's become a little tradition around here to let the guest share something with the broader Rust community. So if you have anything on your mind that you want to convey, something that maybe you always wanted to share with the Rust community, something that we need to be aware of as a community, that would be your chance.
Nicolas
Yeah, so one thing I really like about the Rust community is the people in general. I found a lot of inclusiveness from Rust people, especially towards people that are neurodivergent, which is really awesome. Rust is a programming language that helps a lot when you have trouble dealing with a high level of cognitive load, like me, for example, and that really helped me personally. That is a really awesome trait of the Rust community, the fact that we are so accepting of others, welcoming, here to help each other. I always remember, when we talk about these kinds of things, Esteban Kuber talking about ballads, the importance of ballads in the Rust programming language. So that's the one thing I would say to the Rust community as a whole. Keep that aspect.
This is something that I haven't really experienced in many other communities, this level of inclusiveness and people just being there, being awesome and helping each other. So yeah, stay like this.
Matthias
00:59:45
You said it perfectly. I have nothing else to add. Nicolas, thanks so much for being on the show. I hope you enjoyed it as well. And yeah, thanks a lot.
Nicolas
00:59:55
Yeah, thank you for having me.