Matic with Eric Seppanen
Matthias Endler interviews Eric Seppanen about Rust's impact on privacy-focused home automation robots, emphasizing concurrency, security, and community collaboration.
2024-06-13 84 min
Description & Show Notes
The idea of smart robots automating away boring household chores sounds enticing, yet these devices rarely work as advertised: they get stuck, they break down, or they're security nightmares. And so it's refreshing to see a company like Matic taking a different approach by attempting to build truly smart, reliable, and privacy-respecting robots. They write 95% of their codebase in Rust and use camera vision to navigate, vacuum, and mop floors.
I sit down with Eric Seppanen, Software Engineer at Matic, to learn about vertical integration in robotics, on-device sensor processing, large Rust codebases, and why Rust is a great language for the problem space.
About Matic
Matic is on a mission to solve everyday problems with robotics. Design Milk wrote in an article about Matic: "Matic Robot Vacuum Collects Dust but Not Your Personal Data" and I really love that quote. It's a great summary of what Matic is about: privacy-respecting, truly smart robots. The San Francisco-based startup recently raised a $24M Series A round.
About Eric Seppanen
Eric is a systems engineer with a passion for reliable, well-designed software. He has a background in kernel development and high-performance computing with C++ and now works on robotics with Rust.
With his calm and insightful demeanor, Eric is the ideal person to talk about Rust's strengths for people with a C++ background.
Links From The Show
- Why Rust? It's the Safe Choice
- Folkert's episode
- egui
- SLAM
- Tokio Async Runtime
- reqwest
- Axum web framework
- The Cargo Book: "features should be additive"
- Jon Gjengset's YouTube channel
- Jon's book "Rust for Rusteaceans"
- Mara Bos' book "Rust Atomics and Locks"
- Will Wilson from antithesis about FoundationDB: "Is something bugging you?"
Official Links
About corrode
"Rust in Production" is a podcast by corrode, a company that helps teams adopt Rust. We offer training, consulting, and development services to help you succeed with Rust. If you want to learn more about how we can help you, please get in touch.
Transcript
This is Rust in Production, a podcast about companies using Rust to shape the
future of infrastructure.
My name is Matthias Endler from corrode, and today we are joined by Eric Seppanen
from Matic to talk about using Rust for advanced privacy-first home automation robots.
Eric, welcome. Can you talk a little bit about yourself?
Well, I started as an electrical engineer. I was designing circuit boards and
FPGAs, but I'd been running Linux since its early days.
So I started working with embedded Linux, and that led to doing some kernel work in C.
And then I found a job using C++ for almost 10 years at a company called Pure Storage.
I'm the kind of person who can't work on the same thing for more than a few
years, because I always want to try something new and different.
And I keep going higher and higher in the software stack, but I still like those
physical things that you can touch.
And I think projects that mix hardware and software, they really have a unique set of challenges.
And I find that really interesting and fun.
When I first reached out to you, it was because of a blog post that you wrote,
which is titled "Why Rust? It's the Safe Choice."
And I really liked it. It really resonated with me.
And then I dug deeper and looked at where you're working.
And I found out that you work at Matic. And maybe in your own words,
what would you say does Matic do?
Matic's goal is building intelligent robots that save people time.
And that means that we're basically
using computer vision to help robots navigate inside of your home.
And that means a big software stack to help the robot see where it's going and
understand the shape of the rooms it's in and be able to plan its movements
in a way that makes sense.
What we've seen so far is that a lot of people are not happy with the previous
generation of home robots.
They do dumb things and they crash and they get stuck a lot and they're not
saving people time because they need to be watched and they need to be rescued
constantly. And we'd like to do better than that.
So we'd like to save you time, and we'd like the technology to be so good
that it's magic, that you don't need to think about it.
Now, with regards to your job there, how would you describe yourself at Matic?
I work on the platform team at Matic, and that means I work on the general parts
of our Rust software stack.
I work on dependency management and build systems and memory management,
telemetry, all the things that you might find in any large software project.
Did Matic hire you because of your ROS skills or because you worked on lower
level kernel stuff or because you were generally just curious about what they did and you had a mix,
a wide variety of skills? What was it like?
Like, well, I can't be sure what things got them to hire me.
But the reason that I wanted to work there is because it's a company that feels
a little bit like living in the future because easily 95% of what we do is in Rust.
And I don't think there are many companies that can say that.
It's a really good way of working.
Most of the code is then written in Rust. That's really surprising to me because
you have so many different components. You have so many different layers of abstraction, right?
How much of it was kind of done by the vendor, by the tools and components that
you use, and how much of it was done by the company itself?
I don't think we get hardware vendors delivering code in Rust.
I think that's something that we'll see in the future, but it's not really the reality today.
So we do have to spend some of our time building Rust FFI wrappers around vendor
libraries and other system libraries.
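A common shape for such a wrapper is a safe Rust type that owns the vendor handle and releases it on `Drop`. The `vendor_open`/`vendor_read_frame`/`vendor_close` functions below are hypothetical stand-ins, implemented in Rust so the sketch compiles on its own; a real wrapper would declare them in an `extern "C"` block (often generated with bindgen) and mark the calls `unsafe`:

```rust
// Hypothetical stand-ins for a vendor C library. A real build would
// declare these via `extern "C"` / bindgen instead of defining them here.
mod vendor {
    pub type Handle = i32;
    pub fn vendor_open() -> Handle { 42 }
    pub fn vendor_read_frame(_h: Handle) -> u32 { 7 }
    pub fn vendor_close(_h: Handle) {}
}

/// Safe wrapper: owns the vendor handle and releases it on Drop,
/// so callers can't leak it or free it twice.
pub struct Camera {
    handle: vendor::Handle,
}

impl Camera {
    pub fn open() -> Self {
        Camera { handle: vendor::vendor_open() }
    }

    pub fn read_frame(&mut self) -> u32 {
        // In a real wrapper this would be an `unsafe` FFI call, with the
        // safety argument documented right here.
        vendor::vendor_read_frame(self.handle)
    }
}

impl Drop for Camera {
    fn drop(&mut self) {
        vendor::vendor_close(self.handle);
    }
}

fn main() {
    let mut cam = Camera::open();
    println!("frame id: {}", cam.read_frame()); // prints "frame id: 7"
}
```

The point of the pattern is that `unsafe` stays confined to one small module, and the rest of the codebase only ever sees the safe `Camera` type.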
Okay, but when you say 95% is in Rust, what does that entail?
What are the different layers, the different components of the system right now?
I'm not sure how to split our software stack up into layers,
honestly, because there are so many elements of it that I personally don't have a great view into.
I guess what I can say is that we're past the point where we even try to build
things in other languages because we've had so much success with Rust that we
will just reach for it anytime we need to do anything.
We build native GUIs for debugging.
We build web services.
We build code that runs in the cloud. We build clients for other third-party
services, and we keep extending our reach and doing new things in Rust,
and it works out for us every single time.
And that's... some of these environments are a lot less polished, you know?
Running WebAssembly or maybe doing stuff in the cloud.
Some of that stuff may not be as polished as other areas, but it's maybe only
a year or two behind the rest of the Rust ecosystem.
So mostly it's just amazing that with a single language, you can move around
very fluidly between all of these very different platforms.
It feels a bit like Matic is a company that values vertical integration.
You have a lot of different layers, but you see it as one product. Is that correct?
Yeah, I think it's really interesting because the company does all of the hardware design in-house.
So I work in the same building as mechanical engineers and electrical engineers.
So we can get a great deal done in very little time because you can just go
talk to the person who worked on the hardware and solve problems very quickly.
With regards to this, when you build new hardware and you do that in a close-knit
environment together with the team, is it so that you have Rust in mind from the very beginning?
Or is that more of an afterthought? Because those people might not necessarily
know Rust too well. Maybe they come from a C background or they don't really
have any programming experience at all.
And I wonder, do you go and have specific things that you want from the hardware
that is easier to use from the Rust side, like a chip or some controller that you want to use?
Or is it something that they build on by themselves and then later on you decide
how to navigate it in Rust?
I think at this point we have enough confidence in our Rust abilities that we're
willing to choose the best hardware and just trust that we will figure out a way to make it work.
Anybody who's done these mixed hardware and software designs probably recognizes
that this job is never easy.
You know, the way it often works is a hardware vendor will deliver you a thousand
pages of documentation and they will deliver a bunch of code and that code is
probably kind of buggy and missing a bunch of the features that you need.
And now it's your problem. You need to make it work. And you're basically on
your own. And that's a lot of work no matter what language you're using.
So in my opinion, you might as well choose the language that's going to make
you the most productive.
So writing those FFI layers or maybe taking some C code and rewriting it from
scratch, that's definitely a lot of work, but it's a known amount of work.
And you were probably doing quite a bit of that integration work anyway,
just to be able to make that hardware and software work in your environment.
And so I can kind of amortize that work over years and years of product development.
And I think in the long term, it's a big win, because now you can really work
effectively with that code and make changes and debug things.
So if quality is important, then I think Rust starts to look really attractive.
Very nice. Which hardware do you use specifically?
Can you talk a little bit about that? Or is it a company secret?
The Matic robot is basically a small computer on wheels with a bunch of cameras,
and then it has a vacuum and a mop roller sticking out the front so it can drive
around cleaning your floors.
Each one of those systems takes a lot of time and expertise to design and many
generations of trial and error,
and so we've built so many generations of the robot in-house that we have gotten
pretty good at designing that set of hardware and software.
In terms of the actual computer hardware, I probably shouldn't name the exact
parts, but it's really not that exotic.
There are a few platforms out there that give you a bunch of ARM cores and some
GPU hardware and some camera interfaces, and that's basically all that we need.
We have in the past switched from one hardware vendor to another,
and so we're not really that dependent on the details there,
as long as it gives us the same basic set of resources that we need.
Okay, then you have those hardware components, you somehow integrate them into
one bigger piece, you write the Rust drivers for it, I guess.
Is that the next step that you need to take? Once you have the hardware,
you need to write the drivers or are they provided already?
We don't have to do that much in terms of driver work because the camera interfaces
are pretty straightforward.
We consume the camera images, the raw camera images directly into our user space stack.
And at that point, it's all of our own code. And in terms of controlling motors,
we have a small microcontroller that we use to drive the motors.
Okay. You take the feed and then you do some commutation on it,
but does all of that happen on the device or do you transfer that to the cloud
or some other computer outside of the robot itself?
Yeah. So the goal is for the robot to be completely autonomous.
It seems like a good decision for customer privacy.
And it's really one of the things we wanted to do is to see if you could do
this task entirely on the robot.
If we can do this smart mapping and navigating with minimal resources,
that would be a great accomplishment.
We do use the local Wi-Fi network to be able to talk to a mobile phone app.
So if you want to pull out your phone, you can tell the robot to do something
and you can even play around a little bit, you can move through a 3D
map of your house and draw where you want it to clean.
We have limited memory and CPU. I guess that makes it a little different from
some other software projects.
But really, the fundamental question is, can it be done at all?
Some hardware platforms would just be too slow to be able to do the mapping
and the navigation in real time.
But once you know what's possible, it just becomes software engineering,
making sure that the code can continue to get its job done with limited CPU and memory.
Yes, and on top of it, I might imagine that it wasn't always certain that it
would be possible to even run all of this on the machine itself.
Is that correct? Or would you say from the early days, you already knew that
you could use the hardware to its fullest extent and make it work?
I've only been at the company a few years, but I think the founders had
enough of a background in computer vision that they could see that this sort
of thing was starting to become possible.
And I think they got the timing exactly right.
Now, when I think about your blog post, initially, when I read the title,
the only sort of connection I made with the term safety was safety in a sense of memory safety,
like what Rust was good at.
And then there was this other spin, which was about safety being the safe choice: the choice that is kind of obvious, the one that you should take given all of the constraints. But now there's a third angle to the title, because it is also about safety in terms of security or privacy. And this is interesting too, because I have one of the Chinese brands, and for me it always leaves a bad taste, because I don't know what they do with my data when they send it to the cloud. I could potentially use one of the homebrew versions of it and run my own distribution on it, sort of, but it is a lot of work. And at the end of the day, this also made me feel like I had to look for alternatives at some point. So there's a third angle to the title of your blog post.
Yeah, kind of. It was unintentional that there were three meanings there. The one that I mostly meant was the second one. You know, Rust has memory safety; everybody knows that by now.
It took me a while to figure out what I wanted to say, because there are a lot of articles that say "Rust is good and we like it." But what I discovered is that I can't imagine doing this job with anything less. Going to another systems programming language like C or C++ would feel like banging rocks together to make fire. These are things that people used to do, but we've learned how to be more productive than that.
And there's this popular idea that startups get innovation tokens, right?
You can only spend your tokens a certain number of times.
And so startups should use boring technology wherever possible.
And I think that's mostly right.
You wouldn't want your brand new company to take a chance on a new thing that
nobody understands because that could be the end of the company.
But the engineers that I talk to, you know, they don't have to use Rust for
very long before it just starts to be really obvious that this is a better way of working.
It's just a very pragmatic set of tools.
And these are not the only good tools out there.
I don't think it's worthwhile to argue about Go versus Rust because they have very similar goals.
You know, the languages are very different and they're maybe good at different things.
But the ways that they improve the productivity is really similar.
You know, it's all about eliminating the pain points and simplifying the tool
chain and bringing all the developers sort of into a single ecosystem.
I just think that if you reduce friction between the different projects in that
language, it's one of those things that makes you really productive.
I don't think I would want to go back to doing systems engineering and C or C++.
Well, it certainly sounds true. Do you have any specific examples where you would say Rust helped you or your team achieve something that seemed impossible with other languages?
Yeah, that's an interesting question. I don't think there are small examples where things would be impossible in another language and Rust makes them possible. I think it's more that individual things become so much easier that large-scale projects become possible that maybe would have been infeasible in another language.
And it's because of that productivity angle again.
The big ones for me, of course, memory safety is huge because there's a whole
range of bugs that you just don't waste your time on.
Use after free and buffer overruns, and you save time because you don't have
to go debug those, and you're also saving time because you don't even think about them.
This whole concept of coding fearlessly is so great because you don't have to
hold back. You can try to be really ambitious.
And when you step over the line, the compiler will stop you and say, please don't do that.
Another aspect is thread safety. Writing concurrent and parallel programs,
I think we as software engineers still really have a hard time at doing this correctly.
And it's such an incredible accomplishment when you can have your software fan
out across multiple CPUs and be well-behaved.
I think 10 years ago that would have been nearly impossible to do without a
tremendous effort on the programmer's part.
I have seen this done successfully in C++, and I've seen how much effort and discipline it takes.
And the moment you have a bad day, you break the entire system.
It can be really stressful.
So as a programmer, you sort of self-limit yourself, where if you're doing something
that doesn't need super high performance, you try to restrict yourself to only the simplest tools.
Because the good tools have sharp edges, and you get tired of getting cut by them all the time.
And so having that sort of safe concurrency is just amazing.
And, sorry, from your time as a kernel developer, if you can still remember it, and maybe you still have PTSD from it: can you remember times where you coded in fear, where you were afraid that you would break things or miss an edge case? Was that a recurring topic for you, or was it rather a safe space, a safe environment as well?
I think the last time I wrote much kernel code was probably 15 years ago.
And I think at that point, I had never really seen a really high-quality test environment. Until you've seen a test environment so good that it will catch errors that occur only once in every million executions, I don't think you, as a programmer, have a good view of how bad you really are.
And maybe it's just me, but the longer I do this, the more I get the feeling
that I'm my own worst enemy and I just need better tools to protect me from myself.
It's interesting, because I share a similar sentiment. In the past, I wrote software which I thought was pretty decent, but in hindsight it was broken in so many ways, and there was no one who told me; there was no mentor, no guide. Sometimes I wished for better tooling or a better linter. It didn't even cross my mind that there would be a better way to write software in and of itself, in a different, safer language.
Yeah. And I think the fact that there is a Rust community is an amazing advantage.
The fact that there's one build system, one code formatter, one documentation
system, and it's shared by a much larger group than you'd ever find at any one company,
that means that I can be working on something that I built and I can switch
to something that a co-worker built and then switch to some random crate that
I downloaded from crates.io,
all of that code mostly looks familiar, uses the same idioms,
uses a lot of the same libraries as I use.
That's also really great for productivity. In some other language,
you can have such a huge impedance mismatch between codebases that it can really
take forever to get up to speed on a new codebase.
Bringing in a new dependency can be a really major decision because you have
to figure out how to integrate it with your build system and your documentation system.
Not having that pain point in Rust is really a wonderful thing.
We had an episode with Folkert de Vries from Tweede Golf.
And one thing that he mentioned which really resonated with me was that it is
easier in Rust to jump into the standard library and just read what's going
on and understand how the standard library works.
That was something that I hadn't really reflected on, but it is certainly true for me.
Rust the application code feels like Rust the systems-level code; to some extent, it feels like Rust from the standard library. And whenever I jumped into C or C++ and wanted to learn how it worked under the hood, I was lost, because I didn't speak the dialect of C or C++ that they used there.
Yeah. I think that Rust as a language favors expressiveness, and it's a community value as well.
You know, the language doesn't stop you from doing bad things.
It's a community value that types express things.
You can express whether this is a valid URL or whether this number is a valid
error code or things like that.
And it's a community value to express invariants in code instead of in
comments. And I think that's wonderful.
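The "valid error code" example Eric mentions is usually done with a newtype whose constructor enforces the invariant once. The type name and the 1..=99 range below are invented for illustration, not anything from Matic's codebase:

```rust
/// An error code known to be in the valid range.
/// The range 1..=99 is an invented example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ErrorCode(u16);

impl ErrorCode {
    pub fn new(raw: u16) -> Option<Self> {
        // The invariant lives here, once, instead of in comments
        // scattered across every function that handles codes.
        if (1..=99).contains(&raw) {
            Some(ErrorCode(raw))
        } else {
            None
        }
    }

    pub fn get(self) -> u16 {
        self.0
    }
}

// Any function taking `ErrorCode` can rely on the invariant
// without re-checking it.
fn describe(code: ErrorCode) -> String {
    format!("error {}", code.get())
}

fn main() {
    assert!(ErrorCode::new(0).is_none()); // out of range, rejected
    let code = ErrorCode::new(7).expect("in range");
    println!("{}", describe(code)); // prints "error 7"
}
```

This is what "expressing invariants in code instead of in comments" buys you: a fourth-generation engineer can't construct an invalid `ErrorCode` even if they've never read the design docs.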
That means you mostly talk about types and how they relate to one another; you talk about composability rather than, say, systems-level things and interfaces and very specific implementation details. Is that correct? You talk about it on a higher level.
Yeah, I think that's all true. I think also, I just think about 20 years of
programming or 25 years of programming.
And I think about what it's like to be sort of the first generation of programmers
on a project, where you are taught how to maintain invariants by your peers and
you learn why some unusual design decisions were made.
But if you get to watch a successful project that goes on for 10 years,
you're eventually on these third and fourth generation engineers,
and all they have is the source code and some myths and rumors.
They don't know where the important invariants are, and they don't know why
these design decisions were made in the past. And I think that's the point at
which projects can start to go out of control.
And I put myself in their shoes. If I'm one of those engineers,
I would really hope the code is in a language like Rust.
Because now the code can carry this message down through the generations.
If the language is expressive enough, you can communicate those things.
And it's also because I feel really bad about some of the code I wrote 10 years
ago. I bet it's still in place and nobody knows how to get rid of it because
I was not able to do a good enough job of expressing how it fits into the larger system.
How much of it is because of the lack of rust in the past and how much of it
is because of you growing as an engineer?
That's a good question. I'm sure it's both. In a sense, I do try to make up
for all of my mistakes of the past.
But it's also just that I find the job more satisfying when I feel like I'm
leaving behind code that can be better understood and maintained by other people.
When your experience grows, do you think about your legacy a lot?
How you leave the code base for other people?
I don't think I'm old enough to think about legacy yet,
but I think that's one of the higher-level challenges of software design.
Writing code that works, in a sense, is the easy part.
Writing code that can also be understood by other human beings,
to the degree where they could take it apart and put it back together again
and make changes, that's to me a much more interesting challenge.
And as software projects grow, I think that's a challenge that a lot of companies face.
And in a sense, it's a limiting factor to how they can continue to release new
product year after year.
When you look at the current code base for the vacuum cleaner,
what are the main abstractions that you work with and some of the types that
might be specific to this robot?
For example, when you take the camera feed, do you convert it to something that is, quote-unquote, safe? Do you convert it to your internal types so that you can handle it better, or do you deal with a lot of raw data throughout the system, for performance reasons or otherwise?
I think mostly we're able to deal with safe types. We have relatively little unsafe code in our tree, and mostly it's there to do these sorts of FFI layers for interacting directly with the hardware.
Mostly the abstraction that I think of in terms of how does the robot manage
data is we have what are called map layers.
And the map is what you might think it is. It's sort of a representation of your house.
And those camera images get digested into this 3D map.
Think of it as a 3D point cloud: we can look at all of those points and determine where there are obstacles, where the room boundaries are, what type of floor we're traveling across, and where there are places we're maybe not allowed to go. For example, if you leave your charging cable on the floor, we don't want to drive over that.
And so there are many map layers to help us with the navigation and making good decisions.
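One way to picture the "map layers" idea is several per-cell grids stacked over the same floor plan. Everything below (the layer names, the boolean and enum cell types, the grid representation) is invented for illustration; it only sketches the concept from the conversation:

```rust
// A toy sketch of "map layers": multiple per-cell grids over one floor plan.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Floor {
    Unknown,
    Hardwood,
    Carpet,
}

struct Grid<T: Copy> {
    width: usize,
    cells: Vec<T>,
}

impl<T: Copy> Grid<T> {
    fn new(width: usize, height: usize, fill: T) -> Self {
        Grid { width, cells: vec![fill; width * height] }
    }
    fn set(&mut self, x: usize, y: usize, v: T) {
        self.cells[y * self.width + x] = v;
    }
    fn get(&self, x: usize, y: usize) -> T {
        self.cells[y * self.width + x]
    }
}

struct MapLayers {
    obstacles: Grid<bool>, // e.g. a charging cable spotted by the cameras
    keep_out: Grid<bool>,  // user-drawn no-go zones from the phone app
    floor: Grid<Floor>,    // surface type per cell
}

impl MapLayers {
    // Navigation consults several layers to decide where the robot may go.
    fn traversable(&self, x: usize, y: usize) -> bool {
        !self.obstacles.get(x, y) && !self.keep_out.get(x, y)
    }
}

fn main() {
    let mut map = MapLayers {
        obstacles: Grid::new(4, 4, false),
        keep_out: Grid::new(4, 4, false),
        floor: Grid::new(4, 4, Floor::Unknown),
    };
    map.obstacles.set(1, 1, true); // cable detected at (1, 1)
    map.floor.set(2, 2, Floor::Carpet);
    println!("can enter (1,1): {}", map.traversable(1, 1)); // prints "can enter (1,1): false"
}
```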
And how does it work in practice? For example, let's say you have a charging
cable that's lying across the floor.
Do you recognize that before you even cross the cable or do you have to go over
the bump once and then you remember?
How does it work visually with other sensors?
Yeah, it's visually from the camera images itself.
We do have some neural networks that have been trained to recognize cables,
and they will sort of highlight that in the image, which then gets translated
into the 3D space so that the robot knows where it's allowed to go and where it's not.
So it's a bit like Tesla with their 3D model based on cameras,
or do you also have depth sensors?
It's entirely cameras. Fortunately, our robot is fairly lightweight and doesn't move very quickly.
So we don't have the sort of extreme safety risks that an autonomous car would have.
And when you built this map view, how much of it did you have to build yourself?
And how much of it could you use from the Rust ecosystem?
Were there any existing crates that you could build on?
Or did you really have to build everything from scratch?
I think we used existing crates wherever we could, but most of it was built from scratch.
We have an internal visualization tool that we built using the egui crates and its ecosystem.
And I haven't personally worked on that code base, but the people who I talk
to who have seem to like it a lot.
Okay, then do you also update the map from time to time or is it a static version of it?
So do you scan the room once and then store that, because essentially your room
doesn't change that much?
Or would you say there are update steps as well with every single iteration? How does that work?
The map is constantly updated. So every image that the robot sees through the
camera causes a change in the map.
And so it's constantly re-evaluating what's in its way.
And so one of the interesting experiences is to walk in front of the robot while
it's driving right at you, and to see it slow and navigate carefully around
you, which is not something that people are used to seeing with other robot vacuum cleaners.
Absolutely. Okay, then when you update the map, how does that part work? You take an image from the camera feed, and then you need to know the location where you are; you need to keep track of that. And then do you update it in some sort of tree-like structure, where you have different areas of your apartment and you only need to update a specific part and know where to map it to? Or is it a flat structure? How does that part work?
So, for all of the robots out there that are able to navigate using cameras, the general set of algorithms is called SLAM: simultaneous localization and mapping. It's taking those camera images and comparing data in those images to data it has stored in the past.
So the robot first has to decide, am I in a place that I've ever seen before?
And that could be yes or no. And so if you were to drop the robot in a new room
that it had never seen before, it essentially has to start a new map.
And every new surface that it sees, it will sort of gradually,
image by image, start to stitch together a new 3D environment based on the way
that the camera images overlap.
And because these are stereo cameras, it's getting not just images;
it can also figure out the depth information, how far away everything is.
And that gets reinforced as it sort of moves around the space.
And then the robot may drive through a doorway and suddenly it discovers,
oh, you know, I've seen this part of the room before and now it actually has
to stitch two entire maps together at the doorway where they meet.
And so it's a pretty neat process to watch it happening. It's rather amazing
that it can be done on a robot this small.
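The decision Eric describes ("am I in a place that I've seen before?") can be caricatured in a few lines. Real SLAM matches visual features geometrically; the integer landmark IDs below are an invented stand-in for that matching step, and stitching two maps together at a doorway is left out:

```rust
use std::collections::HashSet;

// Each camera frame yields a set of recognizable landmark IDs (a toy
// stand-in for visual-feature matching in real SLAM).
struct Map {
    landmarks: HashSet<u32>,
}

impl Map {
    fn overlaps(&self, frame: &HashSet<u32>) -> bool {
        frame.iter().any(|id| self.landmarks.contains(id))
    }
    fn absorb(&mut self, frame: &HashSet<u32>) {
        self.landmarks.extend(frame);
    }
}

/// "Am I in a place that I've seen before?" If yes, extend that map;
/// if no, start a new one (e.g. the robot was dropped in an unknown room).
/// Returns the index of the map the frame was placed in.
fn place_frame(maps: &mut Vec<Map>, frame: HashSet<u32>) -> usize {
    if let Some(i) = maps.iter().position(|m| m.overlaps(&frame)) {
        maps[i].absorb(&frame);
        i
    } else {
        maps.push(Map { landmarks: frame });
        maps.len() - 1
    }
}

fn main() {
    let mut maps = Vec::new();
    let a = place_frame(&mut maps, [1, 2, 3].into_iter().collect());
    let b = place_frame(&mut maps, [10, 11].into_iter().collect()); // new room
    let c = place_frame(&mut maps, [3, 4].into_iter().collect());   // overlaps first map
    println!("{} {} {} ({} maps)", a, b, c, maps.len()); // prints "0 1 0 (2 maps)"
}
```

The real system also has to do the reverse of the last step: when the robot drives through a doorway and recognizes an old map, it merges two whole maps where they meet.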
Yes, and it can be done in real time, which is kind of crazy.
Where would you say is the hardware bottleneck? Is it the CPU, the computation part,
or is it the I/O part, in terms of storing and retrieving information from storage?
I think, in a sense, it's the cost part, in that if you want to spend more money
on CPU cores and GPU cores,
you can make the robot run faster and consume more detailed images and have more detailed maps.
And so it's really up to us to choose a cost point that makes sense for this particular product.
But if you're willing to spend more money, you can do things even faster.
And so I imagine you could design a whole family of different robots with different
capabilities, depending on how much you want to pay for them.
Did Rust help you save hardware costs?
I would say in a broad sense, yes, because Rust as a language is one of these
environments where you're very in control of your own memory use.
So a garbage collected language would probably put some significant memory pressure
on us. I think that would not work out very well.
As it is, we are constantly under memory pressure anyway, just because the amount
of data we're trying to ingest is fairly large.
So Rust gives us the tools to do the most with the CPU and the memory we have available to us.
Is some of that code executed in parallel or concurrently?
Do you do, and I guess you have to at some point, do multiple things concurrently
because you get a camera feed, you need to update the map, all of that has to
happen more or less at the same time.
Yeah, we have a lot of things going on in parallel. We have four ARM cores,
and then some GPU hardware that's also operating concurrently.
And so for some of it we use the tokio async executor to schedule tasks.
And then some work is really just CPU intensive.
So we have, I guess what you'd call blocking threads to execute that work on.
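The split Eric describes, lightweight coordination on the async side and CPU-heavy work on dedicated threads, can be sketched without any external crates. This standard-library version uses plain threads and a channel; in Matic's tokio-based stack, `tokio::task::spawn_blocking` plays the role the worker threads play here, and the summing loop is only a stand-in for digesting a camera frame:

```rust
use std::sync::mpsc;
use std::thread;

/// Fan CPU-bound work out across worker threads and collect the results
/// on a coordinating thread. Stand-in for the tokio + blocking-threads split.
fn fan_out(workers: usize) -> Vec<(usize, u64)> {
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                // CPU-bound stand-in for per-frame processing.
                let sum: u64 = (0..100_000u64).sum();
                tx.send((id, sum)).unwrap();
            })
        })
        .collect();
    drop(tx); // drop the original sender so `rx.iter()` ends when workers finish

    // The coordinating thread just collects results as they arrive.
    let results: Vec<(usize, u64)> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results
}

fn main() {
    let results = fan_out(4);
    println!("collected {} results", results.len()); // prints "collected 4 results"
}
```

The thread-safety guarantees Eric mentions show up concretely here: the compiler rejects any version of this code that shares mutable state across the threads without synchronization.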
Pretty impressive. Does it mean we will have a vacuum robot that runs on futures
or the async stack in the future?
Yeah.
It already does, sort of, right?
It already does. That doesn't seem terribly surprising to me.
Async code seems pretty mature in Rust, and we don't have any fear about using it in production.
That is extremely cool, because I guess it should serve as a sort of testament
to what Rust can do in different environments.
And maybe there are people that know tokio from a purely web environment,
for example, and hearing that you could use some of the same technologies in
such a constrained environment, one that is at least close to the hardware,
is kind of encouraging, right?
It is. And it's really helpful that as we move between different parts of the
code base that things don't have to change all that radically.
So if the robot needs to download a software update from the cloud,
we're using the same async HTTP client that the rest of the world does.
And we use all of that code for our own debugging tools as well.
And the fact that that can live in our code base side by side with all of this
magic stuff that's very specific to our robot and our problem is great for productivity
because you can shift seamlessly between the code and use all of the same strategies and coding techniques.
Is it true that the updater component is also written in Rust?
Yeah. Yeah. In fact, I hadn't really done much cloud development before working at Matic.
And I happened to get picked to do that particular part of our system.
And so I spun up my first ever cloud service using Axum and wrote clients for
it using, you know, tokio and reqwest and all of the standard Rust crates.
And it's worked out very well. I find it a really pleasant environment to work in.
From your background, given that you worked closer to the system before,
what was your first impression about the async web environment?
And also, maybe on a broader scale, the web ecosystem in Rust in general?
I haven't done that much in the web ecosystem. I sort of feel like I'm pretending
a little bit because I don't necessarily know what all of the expected ways
of working are, but it seems very mature to me.
I've done interactions with AWS services and Cloudflare services and built web
servers and web clients.
And for the most part, everything seems very polished to me and very easy to use.
The only place where I felt like I was reaching for one of the more extreme use cases was TLS client certificates. I think that's a little bit of an exotic use case.
But if you want to do basic sort of token-based authentication,
I think it's a bit easier.
What does the development cycle look like? Like, do you push updates every day
and do they get updated more or less?
Do they get pushed to the clients, to the robots automatically?
Or do you have weekly update cycles?
What is it like? Continuous deployment?
For the robots that are in our office, it's fairly continuous.
It sort of depends if a particular engineer has a robot of their own that they're
debugging on. They control the deployment procedure.
For beta customers right now, we might release every week or two,
and I would imagine that's going to slow down a lot once the software stabilizes
and we kill the last bugs.
Did you ever run into issues with the update process?
For example, the update got stuck or you couldn't download the entire package.
You probably have some way to verify that the update image is correct based
on some checksum, I would assume.
But for example, things during the flashing process might not work as expected.
Do you keep that in sort of a sandbox where you can always recover to a clean state, or how do you handle that part?
Yeah, we always want to be able to fall back to the previous version if something goes wrong.
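As a rough illustration of that fall-back idea, here is a toy A/B-slot updater in Rust. Everything here (the types, the toy checksum) is invented for the sketch; a real updater would write to flash partitions and verify a cryptographic hash of the image before switching boot slots.

```rust
// Toy A/B update scheme: only switch the active slot if the new
// image verifies; otherwise keep booting the previous version.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Slot { A, B }

struct Updater {
    active: Slot,
    images: [Vec<u8>; 2],
}

// Toy checksum standing in for a real cryptographic hash.
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u32))
}

impl Updater {
    fn inactive(&self) -> Slot {
        match self.active { Slot::A => Slot::B, Slot::B => Slot::A }
    }

    /// Write the new image into the inactive slot; switch only on a
    /// successful verification. On mismatch, fall back and retry later.
    fn apply(&mut self, image: Vec<u8>, expected: u32) -> bool {
        if checksum(&image) != expected {
            return false; // bad download: keep the previous version
        }
        let idx = self.inactive() as usize;
        self.images[idx] = image;
        self.active = self.inactive();
        true
    }
}

fn main() {
    let mut up = Updater { active: Slot::A, images: [vec![1], vec![]] };
    let good = vec![2, 3, 4];
    let sum = checksum(&good);
    assert!(up.apply(good, sum));   // verified: switch to slot B
    assert!(!up.apply(vec![9], 0)); // corrupt: stay on slot B
    println!("active slot: {:?}", up.active);
}
```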
I think the set of problems that we have is going to change as we start to ship
larger and larger numbers of robots.
Right now, I think most of the problems that I've seen around updates is that
home Wi-Fi networks can have a lot of strange failure modes.
And teaching our robot how to deal with those failure modes is mostly what we see so far.
And the other thing that we notice is that everybody has that one corner in
the house where the Wi-Fi reception is really bad.
And it's fairly common that that's where the robot is parked on its charger.
So figuring out how to deal with that problem is kind of tricky.
In a sense, it would be interesting if, while it's mapping the house, we taught the robot to also map the Wi-Fi signal strength, so it knows where in the house it can go if it needs good network connectivity.
Can it update while it's doing its job, or can it only update when it's at the home base?
There's no reason why it couldn't do it at the same time as it's cleaning.
It's not a big CPU load to be downloading stuff over the network.
I think mostly we want to do the update while it's idle, just because as we
start committing to writing down the new software version, we don't want the
robot to lose power if it can't find its dock.
So the dock is always the safest place to do things that you can't take back.
Right. Do you even think about all of these things when you develop new software and you ship it?
Or do you just push to a repository and it will handle itself somehow?
Mostly, I think we don't worry about it. I think when you ship a hardware product,
testing for quality and preventing regressions is the hardest part of the engineering, because if all you ship is, say, a web service,
you can build that anywhere and you can test it anywhere.
And it's really easy to say that things work the same way as the previous generation.
When you have a custom piece of hardware, if you really want to do effective
testing, you need that custom piece of hardware in the test loop.
But if you have a robot that moves and sees different things every time,
now testing has become extremely difficult.
Did Rust help you with that somehow? For example, did you have to touch the updater a lot and change its behavior, or patch it and maybe add ways to handle edge conditions? Or was it mostly a done job and you don't touch it that much anymore?
I think that no software is perfect the first time, and I think Rust makes us
more productive in all aspects.
So figuring out what the bug was, I think I can do that faster than I could in other languages.
And applying the fix and testing it and being confident that it's correct,
those things go a lot faster as well.
And that doesn't mean that I'm perfect and never introduce bugs, so sure, there have been plenty of updates since the first version of the updater came out.
Okay, then that means you still work on this component, and you probably have a way to handle errors when things like the update fail. There is probably a way to also log errors, but do you also handle panics? How's the error handling story right now, on the device specifically? It must be quite a challenge.
Yeah, I think I always like to start with the simplest possible error handling.
And in the case of a software update, that means if we try to download something
and the download doesn't complete, we just give up for now and we'll try again
later. And hopefully that will succeed.
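That "give up for now and retry later" policy fits in a few lines of Rust; `fetch` here is a made-up stand-in for the real download, which in practice would be an async HTTP request:

```rust
// Made-up download that fails transiently on the first two attempts.
fn fetch(attempt: u32) -> Result<&'static str, &'static str> {
    if attempt < 2 { Err("connection reset") } else { Ok("update-1.2.3.img") }
}

/// Try a few times, log each failure, and return None if we give up;
/// the caller just schedules another try later instead of escalating.
fn try_download(max_attempts: u32) -> Option<&'static str> {
    for attempt in 0..max_attempts {
        match fetch(attempt) {
            Ok(image) => return Some(image),
            Err(e) => eprintln!("download failed (attempt {attempt}): {e}"),
        }
    }
    None
}

fn main() {
    match try_download(3) {
        Some(img) => println!("got {img}"),
        None => println!("giving up until the next retry window"),
    }
}
```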
And we do log everything.
So the hope is that if we discover that there's a robot that's having trouble
updating that we may be able to get access to those logs to figure out what the problem is.
The nice thing about running a lot of robots ourselves is really significant
bugs will eventually happen in our office or in one of our homes,
and we can go and investigate what it is that went wrong.
But we also use fairly standard Linux tools, so the updater is a systemd service,
and if it panics and goes down, systemd will restart it, and it will try again later.
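That restart-on-crash behavior is plain systemd configuration; a minimal sketch of such a unit file might look like this (the service and binary names are made up, not Matic's actual ones):

```ini
# Hypothetical unit file for a self-restarting updater service.
[Unit]
Description=Robot software updater

[Service]
ExecStart=/usr/bin/robot-updater
# If the process panics and exits, restart it after a short delay.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```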
Did you ever have to catch panics in Rust?
We do catch panics, but not for very good reasons.
I think we just wanted the ability to print out a few extra values before the
stack trace gets emitted. So for the most part, a panic is a crash.
And we try to build our software in a way that if a process crashes, it just gets restarted.
And somebody who's looking at
the robot would really never notice that anything had gone wrong at all.
Impressive. We covered the graphical user interface to some extent.
We covered the map functionality. There's this web client which takes on the updates and the requests from the robots. We briefly touched on cloud components, because this is deployed in the cloud, I assume. There's this user-facing app, and the third-party libraries. But one thing that we haven't covered yet is the communication between hardware and software, and I did wonder about this: what are the communication protocols?
What system is used internally to communicate between the different parts of the robot?
Is it a message bus? Is it a custom protocol? Do you use something like MQTT? What does it look like?
At varying times, we've used some of everything.
We do have some MQTT protocols and we've had some custom built protocols.
And we've rewritten our messaging between the robot and the mobile phone app many times, trying to get to the right sweet spot where it tolerates bad network conditions and also tries to be kind to other users of the network. But we'd also like it to be fairly portable, because it has to run on iOS as well as on the robot.
It's one code base. So we do run Rust code on iOS.
And we also need, there's a large number of shared data structures.
So we need those to be serialized in a very portable way.
And so this is one of those areas
where I think we maybe haven't found the exact right combination yet.
So we keep trying and we keep learning and getting better at it.
Internally to the robot, we tend to put most of the functionality into a small
number of large binaries and so most of the internal communication can be direct
function calls or channels.
With regards to the libraries that you tried for external communication,
can you still remember what was missing there? Anything that,
comes to mind, why you were not extremely happy about all of them?
Most of the ones that we've been unhappy with were things we built internally.
And it wasn't that we were unhappy with the performance exactly.
I think it was just a case where the code got so complicated that we were having
a tough time maintaining it.
And so we gradually had to stop using it and replace it with a simpler protocol.
Is it correct that there's no message queue on the system or do you have one
that you use to communicate between different separate components?
I'm sure there are many message queues. I can tell you this.
We have a really large dependency tree.
We write a lot of code very quickly and we check it all into a single workspace.
So it can actually be somewhat intimidating in that there are dozens of crates
that I have no idea what they do.
At any one time, you know, we probably have at least 500 crates of our own.
And our dependency tree, the last time I looked was more than a thousand third-party crates.
And not all of that is code for the robot.
A great deal of it is just the fact that we drop in all of our developer tools
into the same workspace because, you know, there's always some connection where
they want to have a shared data structure or something.
So, you know, sometimes people talk about the monorepo strategy; we kind of have the mono-workspace strategy.
And it feels like we're really straining the limits of what can be done within
a single Rust workspace.
Sometimes it's really convenient, you know, if you see a bug in a library and
you want to update it, you can update it one place in the workspace and know
that everybody's handled.
But at the same time, I worry a little bit that we may be stressing the system
beyond what it was really designed to do.
And at some point, we may need to start breaking that apart.
I can imagine, because it must be pretty common for you to run into limitations of cargo workspaces.
Are there any missing features that come to mind?
Or is it mostly about the size of the workspace that becomes a problem at some point?
It's kind of a maintenance problem once you have a very large project,
because once you have that many third-party dependencies, you have to worry
about security concerns, maintenance concerns.
We run cargo deny a few times a week to catch new security advisories and alerts
about crates that might be unmaintained.
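For reference, cargo-deny is driven by a `deny.toml` in the repository root; a minimal sketch might look like the following (the schema has changed across cargo-deny versions, so check the cargo-deny book for the current field names):

```toml
# Minimal deny.toml sketch for security advisories and license policy.
[advisories]
# Fail the check on known security advisories.
vulnerability = "deny"
# Warn when a crate is flagged as unmaintained or yanked.
unmaintained = "warn"
yanked = "warn"

[licenses]
allow = ["MIT", "Apache-2.0"]
```

Running `cargo deny check` in CI then surfaces new advisories as they are published, even when no code has changed.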
One thing that I think is probably missing is I would love to be able to do
sort of a confidence estimate about a new dependency.
And there might be many, many different things that would need to be checked.
Is this crate maintained?
Do they have good test coverage? Is there a broader community of people using this crate?
We don't want to necessarily be the only ones using it.
Does it use a lot of unsafe code? There's so many small data points that you
could imagine a tool that might aggregate them all together and say,
you know, this looks like a pretty solid dependency,
looks good, we should allow it into our tree.
Or maybe every once in a while, somebody will just grab the first thing that
popped up in a search on crates.io, and you'd like there to be a safety check
that says, maybe we should look harder at this one.
Very nice idea.
Maybe it's not something we want to depend on. And it doesn't solve the problem
though, because even well-maintained dependencies, several years later,
sometimes the maintainer moves on and the crate becomes unmaintained.
And as things change, sometimes unsoundness is discovered years after the fact,
because I think the Rust compiler's idea
of what's allowed and what's undefined behavior has subtle shifts over time
and sometimes miri gets better at discovering things it wasn't aware of previously
so we need to be constantly aware of the impact of having that many dependencies
and it's an ongoing challenge.
You know, I think about that a lot, and I fully agree with this. I just wonder what such a service might look like.
Would that be a web service where you can have all of the dependencies as a table?
Would it be something like cargo deny that you run from your terminal?
What is the most convenient way to integrate it into your development workflow?
Would it be a Visual Studio Code plugin or all of the above?
Would it be a CI/CD check? So many questions.
I think the conventional answer would be this is something that you put in CI.
You know, not on the pull request path because you wouldn't want a pull request
that was okay five minutes ago to suddenly spontaneously break.
It's maybe something that you kind of run against the master branch once a week or something.
Now that I've said that, maybe I should change my answer a little bit in that
in the pull request, if you're adding a new dependency, maybe that should trigger a deeper inspection.
But I'd have to imagine that having that data set of information about all the
crates that have ever been published, you would have to have some kind of centralized
system for aggregating that data.
And this also sounds like a very opinionated tool, right? So my idea of a good
dependency might be very different from your idea.
And so maybe there's even sort of the idea of how you aggregate all these different
data points into a single metric might need to be customized for different users.
I think it's a really interesting problem, and I hope someday it exists.
It reminds me a bit of Lighthouse scores for websites, where they have a scale from 0 to 100. It is still somewhat subjective, granted, but they also try to make it objective, because it is more of a range of values. You have something that is reasonably good or something that is reasonably bad, and you can kind of gauge, from just a single number, what shape a crate or a dependency, or a website in this case, is in. Would that help? Would that be a useful metric for you?
Yeah.
I think that that's a place that you get to after you've done this for many
years and you've had a lot of discussions with your user base about what makes
a good metric. And after they've used it a bunch and learned to trust you, then yeah,
maybe your default metric ends up being the one that everybody trusts because
they've seen how you manage all of these competing values.
But at first, I think it would need to be done in a way where you could be respectful
of differing opinions and work to build that trust over time.
It's funny because in the Rust world, we get very used to the compiler being
very opinionated, right?
And even the fact that we've all learned to trust rustfmt. It used to be that people would argue a lot about coding styles, and I'm really, really happy that, at least in the Rust and Go worlds, and mostly Python as well, people have moved on from that kind of argument. You know, it's so much more valuable to have us all share a single format. We accept that it's opinionated, and we accept that everybody disagrees a little bit, but we can all come together on a single, I don't know, shared opinion, I guess, and say that it's good enough.
But when it comes to dependency management and things at the cargo level,
I think the tools have not been as opinionated in that you can publish things
that might really violate the expectations of the community.
You could publish crates that are unsound. You could publish crates that are
hostile, that do strange things in build.rs.
And, you know, unless you do something so egregious that you get kicked off
of crates.io, there's really not a layer that gets opinionated about,
is this a thing that people will want to use as a dependency?
And I think that would be an interesting space to explore.
This is very true. We did have quite a few interviews before with companies
that use Rust in production.
You're the first one to raise that request.
And if I had to put my product head on for a moment, I would ask,
if that was such a highly demanded feature, why has no one built it so far?
Is it because this is mostly a necessity by people that use Rust in the enterprise,
Rust for production for something more mature, and then they start to see the
lack of tooling for enterprise?
Or is it because it might actually be more of a community feature and there's no product behind that?
I think it's probably something you're more likely to see in companies and larger enterprises.
In the open source world, everything tends to be modular, fairly small crates
with a limited set of dependencies, and you really have a good familiarity with
all of your dependencies.
And once you get to these mega projects with hundreds or thousands of dependencies,
that's where it starts to feel like some additional tooling is needed.
And that's something that you're probably more likely to see in closed source
codebases at companies.
Wouldn't it be easy for a company to build such a product?
Not sure if a bigger company already built something like this for themselves,
or maybe it's available internally, but I'm really curious why that doesn't exist yet.
Well, there are shared code review systems out there already,
and companies like Google and AWS do contribute to those.
I think the system that I might be interested in is something that even if nobody
else in the world has ever reviewed this crate, I could still get some kind
of a metric based on a little more of an automated analysis.
But at the same time, I think those two different kinds of systems can interact with one another.
The fact that somebody at Google has done a positive code review on a crate
is a valuable input to an overall metric.
And if the automated system flags maybe what seems like a suspiciously high
amount of unsafe code, maybe that would also be useful data for somebody who
decides to go and and do a code review.
Certainly, if I were looking at our set of dependencies,
and I had a program that would give me a list and say, these are sort of the
top 1% of my dependencies that maybe look like they have perhaps quality issues,
then at least now that helps me know where to focus my attention.
Would that be something that a company like Matic would pay money for if that
was part of their critical infrastructure?
You bet big on Rust, so maybe it might pay off for you to have such a system.
Yeah, probably. Or maybe I'll decide to build it myself.
You could also go one step further and think not only about single dependencies,
but more about stacks of dependencies.
For example, you could say which crates are commonly used together. I don't know much about embedded, for example; maybe I could go and try to understand how other embedded Rust applications are built. Maybe I get the choice of three or four different stacks where I know these abstractions, these crates, these libraries, work well together. Would that also be helpful?
Yeah. I mean, certainly the fact that one crate is used by another crate that is high quality does seem to be a good vote of confidence.
In the end, you have a sort of graph-like structure with relationships. You have child and parent relationships: you have a crate which uses many dependencies, which might be the children, but you also have an application that uses this crate, which might be the parent. So you have this graph structure. Actually, what I describe is more like a tree, but you can also look at it as a graph, because there are relations between the crates on the same level.
Actually, it might be something like a tree of graphs, where each layer is sort of the siblings for one crate: crates that get used together in the same Cargo.toml.
Yeah. And another interesting problem that happens when you have these large
projects is because of cargo's feature unification.
You know, it says in bold text in the cargo manual that features must be additive.
And I think the more time I spend looking at many, many dependencies,
the more you realize that there are a lot of non-additive features out there
in the world. I see them all over the place.
And they're in some very common crates. But once you start including all of
your code in a single workspace, you realize that a feature that you turned
on in one project is now turned on everywhere.
And so non-additive features start to have a bigger and bigger impact the larger your workspace gets.
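A hypothetical sketch of that unification effect, with invented crate and feature names: when two workspace members depend on the same crate and are built together, the union of their requested features is what everyone gets.

```toml
# robot-a/Cargo.toml -- wants the default behavior of some-lib.
[dependencies]
some-lib = "1"

# tool-b/Cargo.toml -- opts into a feature for its own reasons.
[dependencies]
some-lib = { version = "1", features = ["no_std_mode"] }
```

When both packages are compiled in the same workspace build, Cargo unifies the feature sets, so `some-lib` is built with `no_std_mode` enabled for `robot-a` as well. That is harmless for additive features and surprising for non-additive ones.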
And I have actually had to go in and vendor and patch some very common crates
to make sure that a feature never got turned on.
Can you give me an example?
I mean, I don't want to shame any particular crates, but
there's a crate that implements spinlocks, and there are crates that will, given a particular feature flag, change over all of their internal logic from mutexes to spinlocks. So once you have a very large dependency tree, any one of your crates or any one of your dependencies may decide to throw that switch, and suddenly everything in your code that uses that crate gets spinlocks instead of mutexes.
Now that's a pretty significant change. And so that's the sort of thing that I keep an eye out for.
And I was actually aware that the problem existed because I happened to look
through the code base previously and I thought, well, this crate's been around for a long time.
Surely nobody would ever publish a crate that unilaterally throws that switch.
And about two months later, it happened. And I only noticed because when it switched from mutexes to spinlocks, it made a subtle change: a type was no longer Sync or Send, I think.
And so it actually caused a compile error in our code.
And that was how I discovered that suddenly spinlocks were enabled everywhere in our system.
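The vendor-and-patch workaround Eric describes can be done with Cargo's `[patch]` mechanism; a hypothetical sketch (the crate name is invented):

```toml
# Workspace-root Cargo.toml: force the whole dependency graph to use
# our vendored copy of `locklib`, with the spinlock feature stripped
# out, no matter which dependency tries to enable it.
[patch.crates-io]
locklib = { path = "vendor/locklib" }
```

Every crate in the tree that asks crates.io for `locklib` then resolves to the local, patched copy instead.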
Wow, that's really a drastic change. But isn't it possible to disable certain
features for certain members of the workspace only?
I'm not aware of a way to disable a feature if one of your dependencies enables that feature.
Ah, yeah, that's a little harder. I guess it's still potentially doable by introducing your own feature, which describes how you can disable this dependency's feature. I'm not 100% certain about it. But that would even mean that you would have to be aware of all the features that could potentially be enabled in the dependencies. So it doesn't really help you there.
Yeah. I think mostly it's just an awareness that as you get a very large workspace,
that dependency management is a very real task and ignoring that obligation
can get you into trouble in any number of ways.
I think the top level one is security.
And once you start paying attention to security, it's very easy to start paying
attention to all of these other issues as well.
Are there any other enterprise features that you might be missing right now?
I would assume that you use a private crate registry for your internal dependencies, or no?
We haven't done that yet; we use public crates.io dependencies and cache the downloads.
But for your own crates that you need to maintain, you probably don't have that problem, because everything is in one single workspace, right? That's one of the advantages.
Exactly. The easy path is just to keep piling everything into a single workspace.
Then we don't have to work out when is the right time to publish and how do
we manage that internal registry.
The master branch is where the code lives and everything is consistent based on that.
But are there any other enterprise features that are lacking right now?
I could think of monitoring, metrics, the development environment,
debuggers, things that mostly enterprise users need for bigger projects.
Of course, everyone should use a debugger, but there might be certain things
that you only use in an enterprise context.
Authentication, how tokens get handled, I don't know, anything that comes to mind.
I think that there are enough enterprises using and contributing to Rust that
most of the basics are already handled.
And anything that happens to be missing that we need, we try to build our own
or contribute it back to the right crate upstream.
You know, one of the things, like I mentioned earlier, TLS client certificates
are more heavily used in enterprise environments than in the open source world.
And so, you know, occasionally that might be a place where in the HTTP libraries,
you're more likely to find a bug relating to TLS client certificates than you
are in kind of the mainstream code path that everybody else uses.
And so what's the fix to that?
You know, I try to contribute a fix here and there.
And I published a couple of crates that supply canned TLS test certificates for use in unit tests.
And I could go a lot further, I should go a lot further.
But that's one of those things that that I would like to see tested more comprehensively.
Because if you have something that implements an HTTP client, you probably write a unit test against localhost port 80, right? But you probably don't want to do the excessive amount of work to figure out how to spin up a proper HTTPS service running on localhost just for the unit test. You need private keys and public keys and certificates, and you need to have a root certificate that somehow gets plumbed into your system, or at least into your HTTP client.
And all of that is a lot of work for somebody who just doesn't really use that
code as part of their development process.
So I think that's one of those areas where when I see a gap,
I like to pitch in a little bit and contribute if I can.
And speaking of edge cases, early on you mentioned that 95%, if I remember correctly, of the code base is in Rust.
It's just a guess.
What about the other 5%?
So we have a Linux kernel, and it's somewhat customized for this embedded CPU ecosystem that we get. And we have a lot of system libraries that are written in C and C++. And so we include those in our tree and build them as we're building all of the Rust code for the robot.
So it's not that different from what you would find on a Linux server appliance, perhaps.
Do you even have any of the other, say, more traditional languages like Java
or like web things, for example, TypeScript or JavaScript?
Or because you even mentioned that the iOS application is at least in parts
written in Rust, do you even have such cases where you need to cross the language boundary?
Yeah, lots. I mean, for iOS, most of the application gets written in Swift or
whatever the language of choice is for building iOS applications.
And really, we've just made a Rust library that understands how to speak our
protocol to talk to the robot, and it understands the data structures that are specific to our robot.
We don't use Java. I would say we probably have small tools here and there written in JavaScript.
Speaking for myself personally, I am apparently just a really terrible JavaScript programmer.
I am so bad at JavaScript that when I need to build code that runs in a web
page, I will just write it in Rust and compile it to WebAssembly.
It's not a perfect environment. You certainly have to experience a little bit
of pain and jump through some hoops to get it to work.
But I find that a lot easier than writing code in JavaScript, to be honest.
And I think that's mostly just me, because I find it so disconcerting
to be in a programming language with a weak type system where mysterious values
can propagate through the code.
And by the time they explode, I have no idea what went wrong.
I can share the sentiment. Probably I don't take it that far.
Most of the time, I still have some JavaScript projects, and I still use JavaScript on the front end. But there are things like Leptos, for example, which allow you to build parts of your front end in Rust as well, with WebAssembly, and you have this deep tool integration. It feels just right once you hit that sweet spot where you can talk about the same structures both on the back-end and the front-end side. It's really cool.
Yeah. And in my personal time, I've also experimented a little bit with Bevy.
And it's really amazing in bevy that you can build a native application.
And then you can also recompile for WebAssembly and see your very same game
run in a web browser. And the first time I tried it, it worked.
In your blog post, you mentioned that Rust was an easy-to-pick-up language for an intern.
And now that you describe your background and all the things that you do,
I wonder if it was more about you with all of your knowledge and maybe your
expertise and the way you can learn things pretty quickly,
or if it was also about the language.
It kind of contradicts the common belief that Rust is a difficult language to learn. Or it might also be that you just got lucky with that intern, and maybe your perception is that Rust is easier to learn, whereas in reality it's easy to learn for you, but not for others. Can you comment on that?
Sure. So to repeat the story at Matic, we have this visualizer application,
and it's used by engineers to sort of monitor and interact with the robot.
And we had originally built this program in Python, but last summer,
an intern rewrote the entire visualizer in Rust.
And everybody was really happy because it made the application a lot faster.
And because the code was a lot easier to extend, people started contributing
a lot more detailed visualizations and it helped us fix a bunch of bugs.
And it turned out that this was the intern's first project in Rust.
But that's not the only time that that has happened. We've had a lot of new
hires that didn't know Rust before they started and things seem to be going pretty well.
By comparison, I learned Rust all by myself, and it seemed fairly difficult for me. I think the difference is, if you have somebody nearby to talk to, to look at your code, to give advice, that really seems to speed things up a lot.
So really having a mentor is a big help.
It also helps that once you have a large group of developers,
they don't all need to know how to do the tricky things.
They don't all need to know how to write unsafe code.
They don't all need to know how to do FFI.
They can kind of specialize a little bit. So that helps. They can grow their
Rust knowledge at their own pace, and they don't really have to feel pressured
to know every single detail.
But I was also surprised, you know, my background is sort of systems languages, C and C++.
I was really surprised, too, that that sort of a background doesn't seem that
important to being able to learn Rust.
I would have expected that, geez, if you've never used pointers or memory management
before, you might have a hard time. But that really doesn't seem to be a big issue.
I have a 12-year-old who's a pretty decent Python programmer,
and he's never learned C.
So maybe one day soon, I'll see if he wants to learn Rust, and then I'll really
get to the bottom of this.
Does the Rust learning experience depend on the person's previous experience with other languages?
I would have thought so, and I'm sure every person's experience is a little different.
But in terms of, is it a lot easier or a lot harder if you learned these languages
previously? That doesn't seem to be the case in our experience.
I think the hardest part of learning Rust for me, I really wanted to understand
what idiomatic Rust was.
I wanted to know how do experts write Rust code?
And that's something that's really hard to learn, I think.
And particularly, you know, how do you capture complicated ideas and make them look simple?
People who are really good at Rust have a talent for, I guess,
making their code really expressive and friendly.
And I think mostly I learned a lot by watching Jon Gjengset's YouTube videos.
If you haven't seen them, he does these three and four and five hour long videos
where he tackles some big giant project.
And he's really good at Rust code. So you get to watch an expert and you get
to watch him make mistakes and figure out the problem and move on.
And having watched that, I think it's really helpful because you learn a lot
about how people mentally construct their code while they're writing it.
I can relate to that. Jon is really a great person, not only a great Rustacean,
but also someone who is very happy to share his knowledge.
And it's much appreciated to have such a person in the community.
Are there any book resources or other non-video resources that you would recommend
for learning idiomatic Rust? Or is that another gap in the ecosystem right now that should be filled?
Yeah. So Jon's book is great, Rust for Rustaceans.
I also, I have a great love for efficient data structures and lock-free data
structures and algorithms.
And so the book Rust Atomics and Locks, I think, is really great.
By Mara Bos.
By Mara Bos. And I think that's a wonderful book as well.
And what's funny is, even though the Rust atomic semantics are basically identical to those in C++,
I have never seen a C++ resource that explains that memory model as well as Mara's book.
So I think it's a great resource even for people who don't necessarily care that much about Rust.
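To make the memory-model point a bit more concrete: the classic example that book-style treatments of release/acquire ordering use is message passing between threads. Here is a minimal sketch of that pattern (the function name `message_pass` and the values are mine, not from the book): the `Release` store publishes the earlier write to `DATA`, and any thread that observes `READY == true` through an `Acquire` load is guaranteed to also see the published value.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// Release/acquire message passing: the Release store to READY
// "publishes" the earlier (Relaxed) write to DATA. A thread that
// sees READY == true via an Acquire load must also see DATA == 42.
fn message_pass() -> u32 {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release); // publish
    });

    // Spin until the flag is set; Acquire pairs with the Release above.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }

    let observed = DATA.load(Ordering::Relaxed);
    producer.join().unwrap();
    observed
}

fn main() {
    println!("observed: {}", message_pass());
}
```

The same two orderings exist under the same names in C++ (`memory_order_release` / `memory_order_acquire`), which is why a good explanation of this model transfers directly between the two languages.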
Idiomatic code is also very close to my heart. And I do believe that we need
more resources in order to convey how to write both maintainable,
ergonomic, but also reliable code.
And if I understood you correctly, we are on the same page here:
we think about systems a lot, how the small parts interact with each other,
and how to build something that is robust and reliable with a language like Rust.
And Rust lends itself to these sorts of problems. How do you learn about reliable
systems? Are there any resources out there, maybe even outside of Rust,
that you used to learn how to build such environments, such applications?
I think I mostly just learned by seeing projects go right and go wrong.
My opinions about Rust are really opinions about where software engineering
is, and that's mostly based on the projects I've seen in the last 25 years.
When I was at Pure Storage, I got to witness a startup grow through two orders of magnitude.
And that company was successful because they found a way to build really high quality code.
They sell data storage appliances with incredible uptime and reliability.
And doing that needs a lot of things to go right.
And the really cool part is that when things are right, the day-to-day experience
of being a software engineer starts to change.
You spend a lot less time chasing dumb stuff, and you get to spend a lot more
time on the really unusual failures. Have you ever heard the phrase "horses, not zebras"?
No, I have not.
This is a bit of a tangent, but it comes from medical doctors.
So a medical student is training to be a doctor.
They learn about all of this cutting edge research and exotic diseases,
and they go in to meet patients.
And there's a tendency to think that every patient has some exotic disease.
And it's so much more likely that they have something common and boring.
So the advice they give to medical students is when you hear hoofbeats,
think horses, not zebras.
Always go for the easy stuff first. But in software, it sometimes goes differently.
If you have a software project, once you've put in the effort to drive out all
of the really common bugs, then the only thing that's left are the zebras,
the really interesting, surprising bugs.
You start to see strange failures from all over the place.
You can see design mistakes in the hardware.
You can see bad CPU microcode.
You might start to see kernel bugs that nobody's ever identified before,
or compiler bugs that manifest incredibly rarely.
And that is a really interesting job to have.
You get to spend all day being creative, trying to imagine how something could
possibly have happened. And it's really a lot of fun.
And how do you get to this really high level quality, I think the discipline
is just to always be looking for better tools.
And sometimes these tools just come along and fall in your lap.
And sometimes you have to build them yourself.
Every large software project has built some of their own tools.
But I think it's important to always be thinking, what could I build that would
fix not only this bug, but every other bug in this category.
And those category solutions are really hard and expensive to build,
but they're incredibly satisfying and worthwhile once you have them.
Is there a good example of such a tool that you have in mind?
If you find yourself running out of memory a lot...
One time I had a coworker who wrote a tool to help profile all of the memory
that had ever been allocated and tell you where it went.
And once I saw that, I realized I want this in every project.
The fact that memory allocations are kind of mysterious data that nobody gets to see seems wrong.
And so given a day or two, you can build your own memory allocator and print
out some data about who the caller was and where the memory went.
And you can learn a lot of interesting things.
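The "day or two" version of that idea can be sketched in Rust with a `#[global_allocator]` that wraps the system allocator and keeps running counters. This is my own minimal illustration, not Matic's tool: a real profiler would also capture the caller (e.g. a backtrace) per allocation, which is omitted here.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Running totals, visible to the whole program.
static TOTAL_BYTES: AtomicUsize = AtomicUsize::new(0);
static NUM_ALLOCS: AtomicUsize = AtomicUsize::new(0);

// A wrapper around the system allocator that counts every allocation.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        TOTAL_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        NUM_ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = NUM_ALLOCS.load(Ordering::Relaxed);
    let v: Vec<u64> = (0..1_000).collect();
    let after = NUM_ALLOCS.load(Ordering::Relaxed);
    println!(
        "building the Vec took {} allocation(s); {} bytes allocated so far",
        after - before,
        TOTAL_BYTES.load(Ordering::Relaxed)
    );
    drop(v);
}
```

Even this toy version makes allocation behavior visible instead of mysterious, which is the point Eric is making: the data was always there, nobody was looking at it.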
And I think the ultimate is this blog post out there by Will Wilson,
who's the CEO of Antithesis. He describes how the developers of FoundationDB
had such a good test environment that their engineers were 50 times more effective.
And it sounds like it must be an exaggeration, but I really do believe that
it's true because it's a really incredible feeling when your tools start multiplying
your productivity and you can do incredible things without wasting your time.
And so that's how I come to Rust
is that search for better tools and the desire to stop wasting my time.
One traditional question that I ask every interviewee is what would be your
message to the community?
I think my message is, thank you. I'm very grateful to have better tools.
They make programming fun again.
I think things in the Rust world are going really well.
There are always things I'd like to see get done faster, but I try to be patient
because I understand that sometimes people need time to think and make good
decisions. And open source developer burnout is a real problem.
So I don't want to add to the pressure in any way.
I'm amazed that the Rust toolchain releases every six weeks and it's so reliable.
I usually upgrade our toolchain at Matic the week it comes out.
I've been doing that for more than a year and we've never had to revert to an older version.
So it's great. We don't need to stress about the quality we get to play with
all the new toys. Everybody's happy.
And for other device vendors?
I think the world has a long way to go in terms of software security.
Almost everything is terrible. There are maybe five or 10 companies in the world
that can make secure consumer devices, and even they have problems from time to time.
Everybody else is just broken constantly.
And there's a lot of reasons for that, default passwords and directory traversals and so on.
But I think it's really hard to get any traction on the problem when there's
this constant undertow of memory unsafety.
You know, you send a big packet and you can cause the server to crash.
You can send a malformed URL and get a remote code execution.
It still feels like we're kind of living in the dark ages. And I really have
to wonder: if you built these devices with Rust everywhere, would it start to get better?
I think in the next 10 years, we're going to start to find out.
And the other problem is, a lot of hardware is built by companies that
just want it to be as cheap as possible. They just want to grab whatever
open source stack is out there and ship it, and never care about the future.
And they don't ship security updates either. So that's a really hard problem to resolve.
But I think it's at least possible for that cheap open source stack to reduce the
security problems by a lot, and then maybe it's at least possible for cheap devices
to stay secure for at least a few years. But it's a really interesting problem.
A great testament to Rust and to the community. Thanks for being a guest today.
Thanks so much for having me on the show.
Rust in Production is a podcast by corrode. It is hosted by me,
Matthias Endler, and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.