Rust in Production

Matthias Endler

Scythe with Andrew Tinka

About grassroots robotics with Rust

2025-10-16 59 min

Description & Show Notes

Building autonomous robots that operate safely in the real world is one of the most challenging engineering problems today. When those robots carry sharp blades and work around people, the margin for error is razor-thin.

In this episode, we talk to Andrew Tinka from Scythe Robotics about how they use Rust to build autonomous electric mowers for commercial landscaping. We discuss the unique challenges of robotics software, why Rust is an ideal choice for cutting-edge safety-critical systems, and what it takes to keep autonomous machines running smoothly in the field.

About Scythe Robotics

Scythe Robotics is building autonomous electric mowers for commercial landscaping. Their machines combine advanced sensors, computer vision, and sophisticated path planning to autonomously trim large outdoor spaces while ensuring safety around people and obstacles. By leveraging Rust throughout their software stack, Scythe achieves the reliability and safety guarantees required for autonomous systems breaking new ground in uncontrolled environments. The company is headquartered in Colorado and is reshaping how commercial properties are maintained.

About Andrew Tinka

Andrew is the Director of Software Engineering at Scythe Robotics, where he drives the development of autonomous systems that power their robotic mowers. He specializes in planning and control for large fleets of mobile robots, with over a decade of experience in multi-agent planning technologies that helped pave the way at Amazon Robotics. Andrew has cultivated deep expertise in building safety-critical software for real-world robotics applications and is passionate about using Rust to create reliable, performant systems. His work covers everything from low-level embedded systems to high-level planning algorithms.

Links From The Episode


Official Links

Transcript

It's Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. My name is Matthias Endler from corrode, and today I'm talking to Andrew Tinka from Scythe about grassroots robotics with Rust. Andrew, thanks so much for taking the time today for the interview. Can you please say a few words about yourself and about Scythe?
Andrew
00:00:27
Absolutely. My name is Andrew Tinka. I'm the Director of Software Engineering at Scythe Robotics. Scythe designs and manufactures professional-grade autonomous lawnmowers for the commercial landscaping industry. So when I say autonomous lawnmowers, many people think of a Roomba for your front lawn. What we build is larger and more heavy-duty. It is a 1,400-pound machine, roughly 650 kilograms. It is a fully capable stand-on lawnmower. So crews can use this machine as a fully capable electric manual lawnmower, or they can put it to work to autonomously mow large areas. The value that Scythe brings is to increase the efficiency of taking care of green spaces: parks, athletic fields, corporate campuses, large areas of grass where mowing it all manually is not a great use of people's time. Ideally, a landscaping crew would get more done with the same number of people, because while the lawn mowing is happening, the rest of the crew is handling the string trimming and the weeding and all of the other more manually involved jobs around the property, while the bulk of just mowing the grass is getting done by a robot.
Matthias
00:01:53
That means it's very much a heavy-duty machine and it's for industry use.
Andrew
00:01:58
Absolutely. Our ideal customer is a commercial landscaper who is serving a lot of contracts. And so they're mowing 40 hours a week. They're loading these machines up and going from one property to another, taking care of a park in the morning, taking care of a big field in the afternoon, and doing this work every day throughout the week.
Matthias
00:02:20
Yeah. You get really good value out of it when you use that machine as often as possible, ideally throughout the workday, throughout the week. And for how long has Scythe been in existence?
Andrew
00:02:34
Scythe got started in 2018 as a classic garage startup. Just a few engineers in a small space putting together the first prototype. And so since then, we have gone through several successive generations of our hardware, our fleet has grown every year, and the number of customers we've served has been growing every year since 2018.
Matthias
00:02:56
And was the idea always to have an automated robot? Was that the idea?
Andrew
00:03:01
Absolutely. The founding principle of the company was to use robotics to help take care of the earth better. And so there's a couple of ways in which that decision played out. The first one was to find the activities that people do to take care of green spaces and see which ones are a place where a robot can bring value. And the second aspect of that decision is our commitment to build all electric machines. Gas-powered landscaping equipment is often uniquely polluting even for internal combustion engines. And so bringing a valid replacement to the market to allow these kind of operations to be done with electric machines instead of internal combustion engines is a real positive contribution.
Matthias
00:03:57
But in my mind, it also sounds very daunting to start such an endeavor, because you deal with a lot of moving parts, you deal with potentially safety-critical systems, you want to make sure not to harm anyone, nor the environment. How do you navigate that space?
Andrew
00:04:18
Yeah, the design of the machine is a big part of it. I mentioned earlier that it's a fully capable manual machine. It has manual controls. You can jump on board and start mowing the grass yourself. And that was a really clever strategy to allow us to approach the autonomy problem incrementally. We could, in our early stages, as our autonomy stack was maturing, essentially see it as a mixed autonomy problem where the robot would handle the big, bulky, easy parts of the job, just the center of the field where you just have to go back and forth a bunch of times. And the parts of the mowing job that are more difficult, the edges, the obstacles, the crinkly bits around the corners, we could say, all right, well, this is a place where humans do a better job. And so humans can take over and mow those areas. And that gives us kind of a continuous space to refine our autonomy. We start out saying, okay, we take the easy middle. And as we got better every year, we start saying, okay, actually, we can take over more and more of these tricky bits around the edges. The company is based in Colorado, which means there's a lot of skiers on the team. And so we have a skiing-based metaphor for difficulty. We can look at a field and say, oh, this is a green circle, or this is a blue square, or this is a black diamond. So we talk about our progress on autonomy by saying, okay, this year we moved from doing 80% of the blue squares to 95% of the blue squares, and that's a big step forward in our autonomy. And so we're chipping away at the gradient of autonomous difficulty one step at a time.
Matthias
00:06:01
Would you describe Scythe as a software company, a hardware company, a robotics company, or something else entirely?
Andrew
00:06:07
I think it is a robotics company, in that the hardware and the software teams both have an essential contribution and they have to pull together. The designs are closely coupled. The software we build relies on the information that we gather from the sensors that the hardware team chose to install on the machine. And so when the hardware team designs the next generation of a robot, the software team gets consulted to say, okay, well, what areas are most difficult for this generation? How can the next generation make the problem different, the software problem different, so that it is more amenable to being solved? The nature of design cycles does give it this back-and-forth character. Hardware designs on an annual cycle; if you're lucky, you can manage a one-year turnaround on your hardware designs. We work very, very hard to essentially get that one-year turnaround on our hardware designs. Whereas software is so much more protean. It moves so much faster. We can release software on timescales of weeks as opposed to years. And so that makes the software design more adaptable and more flexible and allows us to be closer to an agile kind of methodology for our work than the hardware team can be. The hardware team must follow a more solid and heavyweight design process because the nature of their design commitments is so much higher impact.
Matthias
00:07:57
Yeah. And you said it yourself, that hardware is in the field, like literally in the field for a very long time, if you allow me the pun. That means you need to select that hardware really carefully. What's inside of these machines? What keeps them running? Speaking of the CPU, the sensors, everything that makes it tick.
Andrew
00:08:20
Most of our computation happens on an embedded NVIDIA module, a Jetson module, which is an ARM CPU with a GPU. They share the same memory space, which is a very convenient architectural feature for handing data over between the GPU and the CPU. The rest of the embedded system is in-house-designed PCBs, so different PCBs to control the motors, to control the various sensors and other peripherals. And each one has a microcontroller of some sort to govern a small selection of the hardware. The battery is probably our heaviest component, and one of our most expensive and carefully designed components, because choosing the battery defines the runtime that you get in the field, and that's extremely important to professional landscapers. They want to be able to use this machine for many, many hours during the day. The motors are sourced components that we choose from carefully selected vendors. The chassis and all of the physical instantiation of the machine, that is all Scythe-designed and small-run-manufactured steel components that are produced on the production runs that we do.
Matthias
00:09:54
Wow, that's a lot of moving parts. About the Jetson component, is that a thing that even has Rust support, or would that be driven by, say, a C or C++ component, or is there some other way to drive it?
Andrew
00:10:12
The ecosystem's center of gravity is definitely C++. By default, you expect that an embedded component ships with C++: all of the vendor-supported files, all of the board support packages, all of the drivers are in C++ by default. And these are the boundaries where we have to do C++ to Rust interop. We basically expect that there's going to be a C++ driver kind of making the last connection to our peripherals or the video system or any other component on the board, and that the data will cross the border over from C++ into Rust, where we do our business logic.
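To illustrate what such a boundary can look like in practice, here is a minimal sketch of wrapping a C++ driver behind a C ABI on the Rust side. The driver function, struct layout, and names are hypothetical, and the C++ side is assumed to be compiled and linked separately; this is not Scythe's actual code:

```rust
// Hypothetical sketch: a vendor C++ driver exposed through a C ABI, wrapped
// in a safe Rust function so the business logic stays in safe Rust.

/// Raw reading as laid out by the (hypothetical) C++ driver.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct RawWheelOdometry {
    pub timestamp_us: u64,
    pub left_ticks: i32,
    pub right_ticks: i32,
}

unsafe extern "C" {
    // Assumed to be provided by the C++ side and linked in via a build script.
    // Returns 0 on success, non-zero on failure.
    fn wheel_odometry_read(out: *mut RawWheelOdometry) -> i32;
}

/// Safe wrapper: the unsafety is confined to this one call site.
pub fn read_wheel_odometry() -> Result<RawWheelOdometry, i32> {
    let mut raw = RawWheelOdometry { timestamp_us: 0, left_ticks: 0, right_ticks: 0 };
    // SAFETY: `raw` is a valid, writable struct with the layout the driver expects.
    let rc = unsafe { wheel_odometry_read(&mut raw) };
    if rc == 0 { Ok(raw) } else { Err(rc) }
}
```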
Matthias
00:11:04
Yeah, but to take a step back, just for context, did you start with a hybrid C++ and Rust codebase, or did you start with, say, an existing C++ codebase and gradually oxidize it?
Andrew
00:11:17
Yes, it was a sudden and sharp decision. So it's a fun story. I think it actually is a story that really illustrates how your architecture choices are driven by what's most important to you at the time. So in 2018, when the company came together as a small number of engineers in a garage, the priority was to get something working to show to potential investors as quickly as possible. And the robotics C++ ecosystem is very deep. And so the right choice in early 2018 was to put together a demo-worthy prototype in all C++, reusing as much open source code as possible, going quick, dirty, and scrappy, and making a machine that moved. The company got its first seed funding in late 2018, and that was the moment when their design priorities changed, and with it, their software ecosystem changed all at once. They essentially threw everything out, everything that they had built, just in the garbage, blank page, starting from scratch. And their most important decision was to build something that would last, that would serve the company throughout its entire lifetime. And that was the point where they said that Rust was the best choice to base as much of the robot logic on as possible. That Rust provided the kind of code quality that they were looking for, along with the kind of performance and efficiency that they needed. And even at that point, so late 2018, early 2019, I'm saying they, of course, because I wasn't with the company at the time. These are the founders and the first engineers who were building something from scratch. They saw that the state of Rust to C++ interoperability was sufficiently mature and reliable that they were confident they could use whatever C++ component they needed on the periphery of the system, literally for peripherals, or to engage with the rest of the robot operating system, the ROS ecosystem. And they were confident that they could make any Rust code they needed work inside that ecosystem.
Matthias
00:13:40
I find that so cool and I find it so inspiring, because as soon as the developers got funding, they moved over, or they looked for greener pastures on the Rust side, maybe, to build the thing right. But immediately someone might say, well, couldn't you do the same thing with C++? Because there are also long-term projects written in C++ and we use them every day. Couldn't you just keep on building on top of C++ and not throw the code away? That would have saved you time, at least initially.
Andrew
00:14:15
Absolutely, there are many successful robotics companies out there that are almost exclusively C++. I think that many of those first engineers had spent a lot of time writing C++ and had spent a lot of time getting burned by the same problems over and over. The same relatively simple mistakes that turn into an incredibly difficult-to-diagnose bug that takes a long, long time to root-cause and find the original cause. And everyone who was in those decisions, all of the early engineers, all of the founders, talk about a desire for code quality, a desire to be able to write confidently, knowing that what they were writing would not burn them in one of these almost stereotypical ways, one of these classic mistakes which leads to trouble down the road.
Matthias
00:15:24
Okay, that means they were veteran C++ developers who had a background in the language. They knew what they were doing. They were able to cobble together a prototype to convince venture capitalists to invest, so that's a really positive sign. And yet those people, even with their C++ experience, preferred to write newer components in Rust.
Andrew
00:15:49
Absolutely, and each of them would defend that decision passionately. By staying with the company and by continuing to develop in this mixed style of Rust inside a world of C++, they basically voted with their feet for all the years they stayed with the company and continued to develop in that way. From my perspective, I joined the company two and a half years ago. And I joined not knowing Rust. I joined thinking that one day I would like to learn Rust. And so I was open to the idea. And when I was hired by Scythe, they essentially said, welcome to Scythe. We understand you don't know Rust. Very few people who join our company do. Here's your desk. Here's a handful of web links for resources where you can learn. We look forward to your first merge request. And that's the pattern that most software engineers who join Scythe follow. Almost none of our software engineers know Rust when they join. They almost all are robotics domain experts with experience in the robotics industry, typically in C++, although sometimes in other languages. And they join us with a willingness to learn. And there's an almost stereotypical pattern of joining the company, banging your head against the borrow checker, learning how to get past that, writing your first MR, your first merge request, and then going through the same sort of Rust idiom conversations that everyone needs to go through when they're getting started. It's almost a predictable pattern of what your first MR will look like and the first style conversations we'll have. And then after a few iterations of feedback, people have made the transition. They're Rust developers now and they don't look back.
Matthias
00:17:50
I can relate to that. What were your main programming languages before you joined Scythe?
Andrew
00:17:56
I had done C and C++, particularly in my distant past in grad school. But a slightly idiosyncratic part of my journey is that my most recent job had been 10 years at a company that used Java for robotics. And so I was already coming from a slightly non-standard language. And that maybe contributed to my willingness to pull up stakes and head over to Rust instead.
Matthias
00:18:22
And what were some of those stylistic, eureka moments that you had, given your background?
Andrew
00:18:28
One of the design patterns that happens in Java a lot, and it happens in other languages too, but Java particularly, is using an immutable data structure in order to signal that there is some need to keep data coherent. So a structure has a collection of fields, and it's important for those fields to stay consistent with each other. Messing with just one of them is likely to cause some sort of inconsistency or bug. And so in Java, you declare everything final, you don't give setters, and that's a clear signal that this data should be treated as a package. Unfortunately, when you do need to make changes, you often are making a lot of copies of this same data just so that you're allowed to make the legitimate changes you need to make through construction of a new copy. But I came to Scythe with this pattern well ingrained in me, saying to myself, when in doubt, a data structure should be immutable. You should have a good reason to be able to mess with it. And that was one of the early features of Rust that drew me in, because clearly these ideas of when you're allowed to change data versus when you have a shared reference and you basically can't mutate it, that's the first thing that people notice when they start writing Rust. And then I noticed, oh, well, this language has this expressivity of saying, oh, I have a shared reference or an exclusive reference to this structure. Now I'm allowed to mutate it, now I'm not. I'm allowed to take an exclusive reference and mutate this data without making a copy of it. And when my exclusive reference goes out of scope, I don't have mutable access to this structure anymore. Suddenly, it was like the next level of this mutability-immutability concept that I'd been working with. You could efficiently make changes when you need to, in a way that very clearly signaled that you were the only one allowed to make these changes, and then give that access back and be confident that the data wouldn't change when you weren't looking at it.
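As a minimal sketch of the pattern Andrew describes, shared references give read-only access to the whole "package of facts", while a short-lived exclusive reference allows in-place mutation without copying. The types and fields here are invented for illustration, not Scythe code:

```rust
// A "package of facts" that is only mutated through a short-lived exclusive
// borrow, and otherwise handed out as shared, read-only references.

#[derive(Debug)]
struct WorldState {
    heading_rad: f64,
    speed_mps: f64,
    blades_engaged: bool,
}

/// Read-only consumers take `&WorldState`; the compiler guarantees they
/// cannot change any field, so the facts stay consistent.
fn log_state(state: &WorldState) {
    println!("heading={:.2} speed={:.2} blades={}",
             state.heading_rad, state.speed_mps, state.blades_engaged);
}

/// The one place allowed to change things takes `&mut WorldState`.
/// No copy is needed, and while this borrow is alive nothing else can
/// read or write the struct.
fn apply_speed_command(state: &mut WorldState, new_speed: f64) {
    state.speed_mps = new_speed;
}

fn main() {
    let mut state = WorldState { heading_rad: 0.0, speed_mps: 0.0, blades_engaged: false };
    log_state(&state);                    // shared borrow: read-only access
    apply_speed_command(&mut state, 1.5); // exclusive borrow: in-place mutation
    log_state(&state);                    // exclusive borrow has ended; reading is fine again
}
```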
Matthias
00:20:45
Okay. What I hear from you is that you value immutability now and explicitness. The notion that if you return a thing that is immutable, it will stay immutable. And also you do that explicitly. So you pass it back and then someone else can work with the data, but not really mutate it.
Andrew
00:21:04
I guess this might be unique to robotics, or not unique to robotics, but might be particular to robotics: quite often you are working with some fact about the world which is multidimensional. All of these things are true right now, at this instant. And it's not really valid to talk about keeping 80% of these facts, but not 100% of the facts. I've made a package describing the world. You need to make your decision based on this package of information, this struct. Please do not take this struct apart and change any of the pieces, because it's only true when it's all together. That kind of multi-dimensional truth is, I think, a robotics-specific feature.
Matthias
00:21:50
It reminds me of a talk that I saw the other day by Jon Gjengset about a type-safe spatial math library in Rust called Sguaba. We will link to it in the show notes. It is for locations of objects in space. And it feels like a lot of people use Rust for that specific purpose because of its safety guarantees, because of its expressive type system. It feels like you're kind of agreeing with this and you're also working towards that.
Andrew
00:22:22
Absolutely. And I think when you say the expressive type system, that is the door that opened up to the Rust way of seeing the world that I didn't know about when I joined Scythe. As a complete Rust novice, I did not see the value of an expressive type system.
Matthias
00:22:45
Now, I'm not a Java developer, but to me it feels like Java also has a very expressive type system. Couldn't you encode the same invariants with Java?
Andrew
00:22:55
It's absolutely true that Java has an expressive type system that can be used in a lot of different ways. I think Rust's decision to essentially not do inheritance, or not make inheritance easy, leads to the Rust type system being used in different ways to much greater effect. The Java capability of classes inheriting from each other leads to a lot of elaboration and customization of a class. Oh, I want this, but I want it to act a little bit differently, so I'm going to inherit from it and make a few changes. For me, the first thing that I think about when I think about the Rust expressive type system is the expressive enums and data-carrying variants of enums. And so Rust encourages you not to build deep, deep hierarchies of classes inheriting from each other and making everything more complicated as you go, until sometimes you don't quite understand what's happening at the bottom of this long chain of inheritance. Instead, Rust encourages you to make a single layer of hierarchy, a single enum with all of the possibilities enumerated alongside each other in parallel. And so you have a lot of expressivity and you can use it for a lot of things. But it doesn't get so deep as to be obscure. You can usually trust that you only need to look in one place in your Rust code base to understand the range of options which is available to you.
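A small, hypothetical example of that single-layer style: one enum with data-carrying variants, consumed by an exhaustive match. The command names are illustrative, not from Scythe's code base:

```rust
// A single-layer enum with data-carrying variants, instead of a class
// hierarchy. All the options live in one place, and `match` forces every
// consumer to consider each of them.

enum MowerCommand {
    Stop,
    Drive { speed_mps: f64, heading_rad: f64 },
    EngageBlades { rpm: u32 },
    ReturnToDock,
}

fn describe(cmd: &MowerCommand) -> String {
    // Exhaustive match: adding a new variant later makes this a compile error
    // until every call site handles it.
    match cmd {
        MowerCommand::Stop => "stop".to_string(),
        MowerCommand::Drive { speed_mps, heading_rad } =>
            format!("drive at {speed_mps} m/s, heading {heading_rad} rad"),
        MowerCommand::EngageBlades { rpm } => format!("engage blades at {rpm} rpm"),
        MowerCommand::ReturnToDock => "return to dock".to_string(),
    }
}

fn main() {
    println!("{}", describe(&MowerCommand::Drive { speed_mps: 1.2, heading_rad: 0.3 }));
}
```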
Matthias
00:24:49
Yeah. It almost feels like the Java ecosystem encourages inheritance.
Andrew
00:24:56
Oh, absolutely. And I think it's fair to say that Java developed to take maximum advantage of inheritance before there was a language design pushback to say maybe inheritance is making things more complicated than it's worth. And I think you can see those design ideas coming out a little bit after Java's maximum flourishing and growth. When Java settled down and became a stable sort of industry background choice, after that there kind of was a counter-reaction to inheritance being used as the design feature to solve all design problems.
Matthias
00:25:49
You came to Scythe with all of that background, knowing that there might be some work on the Rust side, maybe with hardware that is not familiar to you, sensors and so on. It sounds like a very daunting challenge, and you still signed with them. Knowing all of that, what was your thought process back then? Were you curious about what it was all about to work on that level? Were you curious about the project? What was the main driver for you?
Andrew
00:26:23
I was excited to learn. It was absolutely a daunting challenge for all the reasons you're describing. I knew that I was signing up to learn some new skills. I had been working with Java for 10 years. That's a point in your career where you say, okay, maybe this is where I've settled. Maybe this is who I am and this is my core competency. It felt adventurous. It felt a bit like jumping off a cliff, but I'm glad to say that I was jumping off the cliff into a wonderful, warm swimming pool, and the water was fine. And I greatly enjoyed the transition. It was a chance to learn new things and to pick up old problems with new tools. The robotic algorithmic challenges that Scythe is facing are the same as the algorithmic challenges faced by many mobile robotics applications. You need to plan paths. You need to verify that your trajectories have no collisions. You need to sequence a collection of work in a schedule that makes sense. These are all problems that many robotics companies need to solve. And they always have to solve them using custom approaches. It's rare to get a clean solution that solves the entire class of path planning, for example. Domain specificity is almost always a feature of the problems faced in robotics. You really need to grapple with the very specific features of your application. Oh, I have a lawnmower. That means I'm trying to cover all the grass. I'm not interested in finding the shortest path from point A to point B. I'm interested in finding the path from point A to point B that covers the lawn, that gets all of the grass mowed. These kinds of domain-specific features mean that a lot of implementations are always going to be in-house, and whether you've chosen the right tool for the job or not changes your quality of life as a developer. I'll bring it back and I'll say that Rust has allowed us to solve many of these classic problems in our own way, using code that we trust, that we got right the first time.
Matthias
00:28:52
That means you write it, you build it, and it runs more or less indefinitely without any problems?
Andrew
00:29:00
The joke is, if it compiles, it works. And I don't believe that. You can always make mistakes. But your mistakes will be logic errors, not safety errors. Testing is a huge investment for us. Every feature we write, every piece of code we write, needs to be tested thoroughly in order to be trusted to work in the autonomous space. To move a 1,400-pound machine with spinning blades through space is a daunting challenge. We take it very seriously to verify that the code we write does the job. Every test is an investment. Every test has a cost. It is incredibly powerful to say that an entire class of bugs has been excluded through what the compiler does for you. You don't have to worry about running it for hours and hours and hours just to make sure you don't leak memory. The compiler did that. We need to worry about whether the robot turns right instead of turning left. That's the kind of mistake that the compiler won't catch for us. But the fundamentals of the stability of the embedded compute, those are largely handled, and we don't need to focus our testing attention there.
Matthias
00:30:22
That's incredible, to see tests as an investment. This is what it should be, because an investment has a potential payoff in the future.
Andrew
00:30:31
Absolutely. And there's a maturity process as your technology evolves. In the early days, testing is easy and a little bit discouraging, because your robot runs successfully for five minutes and then encounters a fatal error. And so your cycles can be very, very fast, and relatively speaking, your testing investment is relatively light. Five minutes of work gets you a new bug, and off you go, and you've got something to work on that day. As your technology matures, as you dial your system in, suddenly you have to test for long, long periods of time before you find something actionable. And so you're putting in hours and hours and hours in the field to find one piece of information which you can take back. And maybe it's no longer a fatal error. Maybe we're not even talking about the robot stopping. Maybe we're just talking about the robot doing something that we would prefer it not do. And so the amount of time you need to put in to find, oh, statistically, we're not quite hitting all of these cases exactly the way we would like. We would rather lift this percentage from 40% up to 70%, please. But it took us 200 hours of testing to gather that information. You know you're doing your job right when your testing budget starts getting very, very large, because you need so much time to gain statistical power on the features you're trying to investigate.
Matthias
00:32:00
Yeah. There's always this point in every project where you make a change and it breaks and then you take a step back and you realize, no, in fact, the system prevented a bug here and the system is more robust than I thought it was already. It's always very enlightening.
Andrew
00:32:22
And that is somewhere where Rust shines. The general Rust pattern of non-exhaustive coverage being a compile-time error saves so much time and effort. I think this is another one of those patterns which are particular to robotics: highly coupled state across components. We try to take our complete problem, make this robot drive autonomously. When we try to break it down, we try to decompose it into components in a stack. Each component is responsible for an aspect of the decision making. And ideally, we would hope that those components were very cleanly separated and the information they share between them is extremely limited. The reality is that you're always fighting against the tendency to couple your data severely across your components, because that data is useful. It's very important information to know what kind of mission you're doing, so a task-planning-level concern. It's very useful to have that information down when you're deciding how the robot should move through space, when you're making a trajectory decision. And so a large robotics code base tends to have long-range coupling across components, even though architecturally we're fighting against that as hard as we can. But those couplings still exist. It's incredibly valuable that when you make a change in a data structure in one part of the system, the compiler catches the 10 errors you didn't think of, because you had those long-range couplings and now your trajectory planner needs to be rewritten to handle the change that you made to the task planner's data structure. If your data structure has changed and you haven't covered all the cases, it's an error. That catches so many of these long-range dependencies that would otherwise be runtime errors instead of compile-time errors.
Matthias
00:34:25
Now, wouldn't that be a property of a static type system, though?
Andrew
00:34:29
Honestly, I think it's more about idiom and design philosophy. So, for example, if you define a struct in Rust and you add a field, if somebody tries to construct that struct and they haven't provided that last new field because they didn't know about it, because this is one of those long-range dependencies we were talking about, that is a compile-time error, unless you used the default annotation and said that it's possible for fields to be filled in by default. So in Rust, it's possible to kind of remove this safety feature that if you haven't specified every field of your struct, you can't build it. But if, as the idiom, you say to yourself, I would rather not use Default when I don't have to, because I kind of want to keep this feature of, if I've forgotten one of my fields, I want to know about it, then you have that property. This example of all fields being necessary is just one of these kind of non-exhaustive-coverage-means-error patterns. Another one is match statements, or pattern matching in general. If your pattern matching isn't exhaustive, Rust will tell you about it. That isn't true in other languages.
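A minimal sketch of that property, with a hypothetical config struct: constructing it without a newly added field is a compile error, unless the code opts into the Default escape hatch:

```rust
// Illustrative sketch of the "all fields are required" property described
// above (hypothetical types, not Scythe code).

#[derive(Debug, Default)]
struct MissionConfig {
    max_speed_mps: f64,
    blade_height_mm: u32,
    // Suppose this field was added later, far away from most call sites:
    slope_limit_deg: f64,
}

fn main() {
    // Without the new field this literal no longer compiles, so every distant
    // construction site is forced to acknowledge the change:
    let strict = MissionConfig {
        max_speed_mps: 1.5,
        blade_height_mm: 64,
        slope_limit_deg: 15.0,
    };

    // The escape hatch: struct update syntax with Default fills in anything
    // you leave out, which silently swallows future additions.
    let lenient = MissionConfig {
        max_speed_mps: 1.5,
        ..Default::default()
    };

    println!("{strict:?}\n{lenient:?}");
}
```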
Matthias
00:36:03
Yeah, you could still use an underscore for a match case. But if I understand you correctly, it's sort of an anti-pattern in a larger application that favors correctness.
Andrew
00:36:18
Absolutely. And so it is fair to say that Rust doesn't do all of the work here for you. Like you said, there are ways to have defaults or to have the underscore match, and it almost feels like it's context whether it's appropriate or inappropriate. There's plenty of times when it's fine to use underscore or to catch all the remaining cases in the match. That's fine. That becomes kind of company style or company culture, almost, where you are encouraged or discouraged from using these kinds of patterns.
Matthias
00:36:59
Yeah. Do you have a coding guideline at Scythe? And what does the review process look like? Do you look at such patterns, tell people about them, and tell them why it's a bad thing?
Andrew
00:37:11
That would be an aspect of our growth that I'd love to see. We tend to hold these kinds of styles as more of a culture than an explicit style guide. Clippy gets us halfway there, by the way. Clippy has so many requirements, so many good requirements, so many solid requirements that are enforced in our tool chain. So it's not true that it's completely a free-for-all. It is more about our code review practices, that we say to ourselves, okay, when we review code, we all know that we prefer to see things written this way. We never use any of the functions which can plausibly panic: don't unwrap, use unwrap_or_else, those sorts of choices. There are code bases, there are circumstances, where panicking under unforeseen conditions is entirely the appropriate thing to do. We prefer not to do it, because we would rather not panic our robot process as it is trying to decide whether to go left or right.
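One possible way to express that preference in code, shown here as an assumption rather than Scythe's actual setup: turn on Clippy's allow-by-default panic lints and reach for non-panicking alternatives such as unwrap_or:

```rust
// Illustrative "no plausible panics" style. The Clippy lints `unwrap_used`
// and `expect_used` are allow-by-default and can be enabled per crate.
#![warn(clippy::unwrap_used, clippy::expect_used)]

/// Hypothetical parameter lookup that may fail.
fn lookup_speed_limit(zone: &str) -> Option<f64> {
    match zone {
        "flat" => Some(2.0),
        "slope" => Some(1.0),
        _ => None,
    }
}

fn plan_speed(zone: &str) -> f64 {
    // Instead of `lookup_speed_limit(zone).unwrap()`, fall back to a safe
    // default so an unexpected zone never panics the control process.
    lookup_speed_limit(zone).unwrap_or(0.5)
}

fn main() {
    println!("speed on slope: {}", plan_speed("slope"));
    println!("speed on unknown terrain: {}", plan_speed("gravel"));
}
```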
Matthias
00:38:22
Earlier, you said that the times to find a bug tend to become longer and longer, the more robust the system becomes. What are some of the problems that you find, if you allow me to put it that way, out in the field with that robot? Is it more of a business logic problem, a systems problem, or a domain problem? What is it that you find?
Andrew
00:38:48
We are always striving to improve the reliability of the machine, expressed as the number of interventions that you need to make when the robot is in the field mowing. And that's a function of the difficulty of the job that you're doing. I mentioned before the progression up the ski slope of difficulty: as we get better, our customers trust us and put our robot in more and more difficult situations. So that's the heart of our defect discovery through testing. We say to ourselves, we know that our customers this year are putting our robot through more difficult scenarios. Let's say, for example, with respect to slopes. Sloped fields are more difficult to do robot control in than nice, flat, green fields. And so we'll say to ourselves, okay, we are going to ensure that our selection of testing fields, the grass we mow in order to test, includes a lot of slopes. And so we're going to make our testing problem more difficult for ourselves. Okay, now we measure the number of times when our robot trajectory goes too far away from nominal as we're trying to turn on a slope, and we say to ourselves, well, we want to bring this number of defects down. They may not even be defects that require any intervention, and they certainly aren't defects that lead to any kind of safety incident. It's just, we want the robot to track more solidly on hills than it does already, and we call it a defect when it exceeds these bounds. And okay, so we're now having one of these kind of trajectory defects every five hours of operation. Let's get that to one every 10 hours of operation. That's the kind of defect that we're chasing when we're on a mature autonomous robotic system. And it doesn't quite even fit into the rubric of code correctness anymore. Now this is more like capabilities and tuning. So we say, okay, we are going to increase this parameter in order to emphasize tracking the desired trajectory more solidly. Well, that has consequences. That leads to a probabilistic defect in other parts of the system, because the robot is now more solidly tracking its desired trajectory. That means in a different scenario than the robot on a slope, when the robot is navigating around an unexpected obstacle, something different happens with its performance. And now we're in a trade-off space. Do we want to change our defects on slopes, or do we want to change our defects when navigating around obstacles? And now it's a whole conversation about what's more important to us and what the customer will value more in terms of a reliable machine.
Matthias
00:41:46
I watched a talk from the Oxidize conference from 2024 recently, and in there, there was a company which described a navigation or planning system for autonomous robots in a warehouse scenario. They had multiple small robots, and they would find a common trajectory for all of them. They used a system for deterministic testing: they had an event log on each machine, and then they could replay the exact scenario that happened, which caused a deadlock. Is that a thing that you can apply to your domain as well?
Andrew
00:42:26
I love determinism so much. It is such a precious jewel to gain it when you have it. And it is so hard to achieve. That is an incredible investment which yields incredible rewards in robotics. I'm sorry to say that most of our systems do not have that fundamental determinism. First of all, the real world doesn't bring that kind of determinism. The variations in timing of events in the real world, the variations of how a wheeled robot on a slope behaves, mean that every real-world scenario is a unique and unreproducible event. Even in purely simulation-based testing, determinism and reproducibility are very, very hard to achieve. We haven't achieved that in general on all of our simulators. There are many smaller components, unit testing and integration testing of isolated components, where we require and where we achieve that kind of determinism, because it is an enormous asset in root-causing your problems when you have a fully reproducible example that you are confident you can get back anytime you need it. Eventually, you can take that scenario and make it one of your integration tests and come back to it to confirm that you've fixed the bug and it's stayed fixed for as long as your system exists. So I respect it. It's the kind of system feature which is very valuable. It takes a huge investment to get there.
Matthias
00:44:10
Yeah, and even then you might not be 100% sure if it's worth it to pay that price, because you might as well work on features at the same time, or maybe find another way to test it in the real world.
Andrew
00:44:29
The reality of a startup with a short funding timeframe and a lean team of engineers is that you really have to make careful trade-off choices about which features you're going to invest in. I've sung the praises of deterministic simulation, and I really believe that we may get there one day. It's not the best thing for us to build right now.
Matthias
00:45:03
Yeah. And even before you do that, you probably want to invest in a good failover system, which you already have in place. You briefly mentioned the disengagement scenario, where, for example, you might see that there's some fundamental condition which is incorrect in the system, like an invariant, and then you want someone to take over. So there's probably a way for you to say, okay, we can't handle the situation right now; stop the engine and wait for manual intervention.
Andrew
00:45:38
This is one of the application-specific advantages of the lawn-mowing application in particular. You don't have that freedom if you are building an autonomous car that's moving down the freeway at 100 kilometers an hour. You don't have the freedom to say, oh, this situation is off nominal, I will simply disengage. Your path to safety is complicated. When you are mowing a lawn, your path to safety is short and assured. You just have to stop. If your conditions are sufficiently off nominal, bring the drive motors to a stop, bring the blades to a stop, and ask for help. Everything will be okay. Nobody is depending on you to get to the side of the road or to bring other hazardous components to a safe state. It's an application where safety is still critical, but there is a simple story for how to make the system safe. So, yeah, just to summarize, a deterministic system in simulation is incredibly valuable, but it's not the most valuable thing for us to build right away. The startup world means making hard implementation trade-offs as to what will be the most valuable thing to build next for your customers. And so our investments have gone to other places.
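To make that "short path to safety" concrete, here is a minimal sketch of an invariant check with a disengage branch. The types, thresholds, and messages are hypothetical illustrations, not Scythe's actual safety logic:

```rust
// Per-cycle invariant check: if anything is off nominal, stop the drive,
// stop the blades, and ask for help.

enum HealthCheck {
    Nominal,
    OffNominal { reason: String },
}

/// Hypothetical check of a couple of system invariants.
fn check_invariants(slope_deg: f64, perception_age_ms: u64) -> HealthCheck {
    if slope_deg > 20.0 {
        HealthCheck::OffNominal { reason: format!("slope {slope_deg} deg exceeds limit") }
    } else if perception_age_ms > 500 {
        HealthCheck::OffNominal { reason: format!("perception data is {perception_age_ms} ms old") }
    } else {
        HealthCheck::Nominal
    }
}

fn control_cycle(slope_deg: f64, perception_age_ms: u64) {
    match check_invariants(slope_deg, perception_age_ms) {
        HealthCheck::Nominal => {
            // ... normal planning and motor commands would go here ...
        }
        HealthCheck::OffNominal { reason } => {
            // The whole safety story for a mower: stop and wait for a human.
            println!("disengaging: {reason}");
            println!("drive motors -> 0, blades -> 0, operator notified");
        }
    }
}

fn main() {
    control_cycle(5.0, 50);  // nominal
    control_cycle(25.0, 50); // disengage due to slope
}
```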
Matthias
00:47:02
Which brings us to today. What is the current state of the code base? How much of it is in Rust? How much of it is in C++? Do you still use ROS and to what extent?
Andrew
00:47:14
Yes. So ROS is the middleware that is the backbone of the system. ROS stands for Robot Operating System, but it's best understood as inter-process communication, along with an ecosystem of tools for simulation, for observability, for injection, and other capabilities. So ROS moves the data around our system. Most of our perception stack is in C++. The ability to work with GPUs and CUDA in Rust is maturing, and there are exciting projects right now. But over the history of Scythe's development, that has been similar, where C++ has stayed very strong. So most of our perception stack is in C++. That information gets moved from the perception processes over to the autonomy processes using ROS as the middleware. Almost all of our autonomy code is in Rust. So that is every decision we make once we have assembled our perception of the world: our task planning, where we decide what we're doing next at the high level; our navigation, deciding where we're going to move and how we're going to get there; and our trajectory and motor control, all the way down to deciding what torques should be applied to the motors, is a decision being made in Rust. We favor an actor-based framework inside our Rust system to decompose the problem down into components that exchange messages as a way of prompting each other to make decisions or sharing information with each other. And so I've mentioned before that most of our software engineers join our company not knowing Rust. And they develop into application-level developers who write Rust to solve robot decision-making problems. There are also some engineers at our company who are deep Rust experts who showed up with very deep knowledge. And one of the parts of the system that they build is the framework that supports these robot logic components, so the actor-based framework that I described. And so the actor-based system, its overall framework, is written by our Rust systems experts, and the actors themselves, the components that make the decisions, are built by our robotics domain experts who are solid application-level Rust developers.
Matthias
00:50:01
Then that actor framework is not open source. That's probably in-house.
Andrew
00:50:06
It's an in-house actor framework. There are several open source actor frameworks in the Rust ecosystem right now. I think the choices that were available to us in 2019, when we were starting down our Rust journey, were such that an in-house actor framework was the right choice for us. If Scythe were being founded in a garage today, in 2025, it's quite possible that many of these exclusively in-house components would instead be open source Rust crates, because the ecosystem has expanded and matured during the last six years.
Matthias
00:50:41
Yeah, and if someone's listening and wondering what a modern actor framework might look like in Rust, the one that I like a lot is Ractor. We will link to it in the show notes. I guess it leans into the Erlang model a bit, and it has a lot of components in there that are really helpful for a production actor system, for example factories, so being able to have multiple actors in a queuing system, and supervision, and things that you need once you run that thing at a larger scale. But your actor framework specifically, is that synchronous? Does it mean you mostly write synchronous code where possible, or do you also use Tokio in combination with it?
Andrew
00:51:33
It's an asynchronous actor system, but most of the time we can write synchronous code. So this is maybe one of those distinguishing characteristics between the deep Rust expert and the robotics expert who's writing in Rust today: the comfort with asynchronous Rust code. At the top level, our actor framework uses Tokio to marshal all the actors together as asynchronous actors. But any asynchronous code can call synchronous code inside it. So what the framework exposes to our robotics developers is a handle-message function call, which looks like a synchronous function. It is a synchronous function. It is being called from asynchronous code, but you don't need to worry about that. This is most often the right division of responsibilities. Asynchronous Rust code has some pitfalls. You need to be aware of cancel safety and many related issues in order to get the job done right. Many of the decision-making components in our collection of actors don't need to leverage the features that asynchronous Rust provides. Most of them can be thought of as a message handling system that gives quick answers to each incoming message. And you might as well write that as a piece of synchronous code and not concern yourself with the possible pitfalls of asynchronous Rust. I'm very glad that the folks on our team who built the actor framework are well-versed in all the pitfalls of asynchronous code. But it's a convenient decoupling of responsibilities to say that the asynchronous code exists only at the actor framework level, and most individual actors are just synchronous.
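A minimal sketch of that division of responsibilities, assuming Tokio and an mpsc channel; the actor, message names, and framework shape are invented for illustration and are not Scythe's in-house framework:

```rust
// The async layer owns the Tokio task and the channel, while the per-actor
// logic is an ordinary synchronous method. Requires the `tokio` crate.

use tokio::sync::mpsc;

#[derive(Debug)]
enum NavMessage {
    GoalReached,
    ObstacleDetected { distance_m: f64 },
}

/// The robotics-domain side: plain synchronous message handling.
struct NavigationActor {
    obstacles_seen: u32,
}

impl NavigationActor {
    fn handle_message(&mut self, msg: NavMessage) {
        match msg {
            NavMessage::GoalReached => println!("goal reached, picking next row"),
            NavMessage::ObstacleDetected { distance_m } => {
                self.obstacles_seen += 1;
                println!("obstacle at {distance_m} m (seen {} so far)", self.obstacles_seen);
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<NavMessage>(16);

    // The "framework" side: an async task that pulls messages off the channel
    // and calls into the synchronous handler.
    let actor_task = tokio::spawn(async move {
        let mut actor = NavigationActor { obstacles_seen: 0 };
        while let Some(msg) = rx.recv().await {
            actor.handle_message(msg);
        }
    });

    tx.send(NavMessage::ObstacleDetected { distance_m: 2.5 }).await.expect("actor is running");
    tx.send(NavMessage::GoalReached).await.expect("actor is running");
    drop(tx); // close the channel so the actor task finishes
    actor_task.await.expect("actor task completed");
}
```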
Matthias
00:53:42
Mm-hmm. Gotta say, that was great foresight from the people that worked on the framework, because you don't have to deal with the complexities that you mentioned in the async Rust ecosystem. You mentioned cancellation specifically. Do you have an example for when that becomes relevant in your domain?
Andrew
00:54:04
For this, I think I really have to defer to an excellent talk at RustConf in August on cancel safety, given by Rain, where they ran through so many examples of how cancel safety is important and how it can go wrong. I guess the easiest pitfall that I can cite off the top is, in an actor-based system where control and essential information arrive in messages, probably the easiest way to mess up cancel safety is to cancel and have the message you were handling disappear forever. If that message was a critical piece of information which will only arrive once, well, then you have probably caused a serious problem by never looking at that message again. So if your actors are receiving unique, irreplaceable, must-be-handled messages, then you had better get your cancel safety right when you dequeue those messages and hand them off to your message handler. That's the easiest example I've got.
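A small, hypothetical illustration of that pitfall, not taken from the talk or from Scythe's code: the handler dequeues a message and then hits another await point, so cancelling the future at that point silently drops the message:

```rust
// `handle_one_lossy` dequeues a message and then awaits; if the whole future
// is dropped between those two points, the dequeued message is lost forever.

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::sleep;

async fn handle_one_lossy(rx: &mut mpsc::Receiver<String>) {
    if let Some(msg) = rx.recv().await {
        // Await point after the message has already left the queue: if this
        // future is dropped here (e.g. by a select! picking another branch),
        // `msg` is silently discarded.
        sleep(Duration::from_millis(50)).await;
        println!("handled: {msg}");
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(8);
    tx.send("critical: operator pressed stop".to_string()).await.expect("receiver alive");

    // A shutdown signal races the handler; if shutdown wins, the message that
    // was already dequeued inside `handle_one_lossy` disappears with it.
    tokio::select! {
        _ = handle_one_lossy(&mut rx) => println!("message path won"),
        _ = sleep(Duration::from_millis(10)) => println!("shutdown won; dequeued message was dropped"),
    }
}
```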
Matthias
00:55:12
Yeah. And that helps with determinism, too.
Andrew
00:55:17
Absolutely. Yes. For sure. So yes, to expand on that a little bit, one of our most common testing strategies is to take an actor and feed it a test-bench set of messages in a particular order at particular times, and then make the test dependent on whether the actor gives the right answers back. So an actor can be fully deterministic. Given the identical sequence of incoming messages, the actor will always give you the correct output, or you strive for that. It's only in the glorious, complex composition of all of your actors against a real-world system that is triggering events at uncertain times that you gain the kind of indeterminism that makes it a big, complicated problem.
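A minimal sketch of that testing style with a made-up actor: feed a fixed sequence of messages and assert on the outputs, which makes the test fully deterministic:

```rust
// Hypothetical actor and messages, not Scythe's test suite.

#[derive(Debug, PartialEq)]
enum BladeCommand {
    Spin { rpm: u32 },
    Stop,
}

enum BladeMessage {
    GrassDetected,
    PersonDetected,
}

/// A tiny "actor": given one message, produce one command.
struct BladeController {
    nominal_rpm: u32,
}

impl BladeController {
    fn handle_message(&mut self, msg: BladeMessage) -> BladeCommand {
        match msg {
            BladeMessage::GrassDetected => BladeCommand::Spin { rpm: self.nominal_rpm },
            BladeMessage::PersonDetected => BladeCommand::Stop,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn person_detection_always_stops_blades() {
        let mut actor = BladeController { nominal_rpm: 3000 };
        // The same fixed input sequence always yields the same outputs,
        // so the test is fully deterministic.
        let outputs: Vec<BladeCommand> = vec![
            actor.handle_message(BladeMessage::GrassDetected),
            actor.handle_message(BladeMessage::PersonDetected),
        ];
        assert_eq!(outputs, vec![BladeCommand::Spin { rpm: 3000 }, BladeCommand::Stop]);
    }
}
```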
Matthias
00:56:10
We're getting close to the end, and the final question in this podcast is always: your message to the Rust community. The stage is yours.
Andrew
00:56:20
Absolutely. When I think about Rust, and when I think about the open source community that built it and maintains it and moves it forward today, my feeling is one of gratitude. This is an incredible thing that a large, diverse community of people have built. And Rust as a language sits at a very valuable point in the language design space. It really is something special in terms of the guarantees it gives, the confidence you can have when building complicated systems with it, knowing that the language has struck such a valuable trade-off between competing concerns of safety and efficiency and expressiveness. It's a technology I'm glad to use every day that I use it, and I hope that its development toward these goals continues as long as it can.
Matthias
00:57:31
Couldn't have said it better. Andrew, thank you so much for taking the time today to do the interview.
Andrew
00:57:38
Thank you very much, Matthias. It's been a pleasure.
Matthias
00:57:39
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.