Scythe with Andrew Tinka
About grassroots robotics with Rust
2025-10-16 59 min
Description & Show Notes
Building autonomous robots that operate safely in the real world is one of the most challenging engineering problems today. When those robots carry sharp blades and work around people, the margin for error is razor-thin.
In this episode, we talk to Andrew Tinka from Scythe Robotics about how they use Rust to build autonomous electric mowers for commercial landscaping. We discuss the unique challenges of robotics software, why Rust is an ideal choice for cutting-edge safety-critical systems, and what it takes to keep autonomous machines running smoothly in the field.
About Scythe Robotics
Scythe Robotics is building autonomous electric mowers for commercial landscaping. Their machines combine advanced sensors, computer vision, and sophisticated path planning to autonomously trim large outdoor spaces while ensuring safety around people and obstacles. By leveraging Rust throughout their software stack, Scythe achieves the reliability and safety guarantees required for autonomous systems breaking new ground in uncontrolled environments. The company is headquartered in Colorado and is reshaping how commercial properties are maintained.
About Andrew Tinka
Andrew is the Director of Software Engineering at Scythe Robotics, where he drives the development of autonomous systems that power their robotic mowers. He specializes in planning and control for large fleets of mobile robots, with over a decade of experience in multi-agent planning technologies that helped pave the way at Amazon Robotics. Andrew has cultivated deep expertise in building safety-critical software for real-world robotics applications and is passionate about using Rust to create reliable, performant systems. His work covers everything from low-level embedded systems to high-level planning algorithms.
Links From The Episode
- Ski trails rating - A difficulty rating system common in Colorado
- NVIDIA Jetson - Combined ARM CPU with a GPU for AI workloads at the heart of every Scythe robot
- The Rust Book: Variables and Mutability - Immutability is the default in Rust
- Jon Gjengset: Sguaba - A type safe spatial maths library
- The Rust Book: Inheritance as a Type System and as Code Sharing - Unlike Java, Rust doesn't have inheritance
- Using ..Default::default() when creating structs - The alternative is to initialize each field explicitly
- The Rust Book: Refutability - Rust tells you when you forgot something
- Clippy - Rust's official linter
- Deterministic fleet management for autonomous mobile robots using Rust - Andy Brinkmeyer from Arculus - 2024 Oxidize warehouse robot talk with deterministic testing
- ROS - The Robot Operating System
- Ractor - A good modern actor framework
- Rain: Cancelling Async Rust - RustConf 2025 talk with many examples of pitfalls
Transcript
It's Rust in Production, a podcast about companies who use Rust to shape the
future of infrastructure.
My name is Matthias Endler from Corode, and today I'm talking to Andrew Tinka
from Scythe about grassroots robotics with Rust.
Andrew, thanks so much for taking the time today for the interview.
Can you please say a few words about yourself and about Scythe?
Absolutely. My name is Andrew Tinka. I'm the Director of Software Engineering at Scythe Robotics.
Scythe designs and manufactures professional-grade autonomous lawnmowers for
the commercial landscaping industry.
So when I say autonomous lawnmowers, many people think of a Roomba for your front lawn.
What we build is larger and more heavy-duty.
It is a 1,400-pound machine, roughly 650 kilograms. It is a fully capable stand-on lawnmower.
So you can watch crews use this machine as a fully capable electric manual lawnmower,
or they can put it to work to autonomously mow large areas.
The value that Scythe brings is to...
Increase the efficiency of taking care of green spaces, parks,
athletic fields, corporate campuses, large areas of grass where mowing it all
manually is not a great use of people's time.
Ideally, a landscaping crew would get more done with the same number of people
because while the lawn mowing is happening,
the rest of the crew is handling the string trimming and the weeding and all
of the other more manually involved jobs around the property,
while the bulk of the just mowing the grass is getting done by a robot.
That means it's very much a heavy-duty machine and it's for industry use.
Absolutely. Our ideal customer is a commercial landscaper who is serving a lot of contracts.
And so they're mowing 40 hours a week. They're loading these machines up and
going from one property to another, taking care of a park in the morning,
taking care of a big field in the afternoon, and doing this work every day throughout the week.
Yeah. You get really good value out of it when you use that machine as often
as possible, ideally throughout the workday and throughout the week.
And for how long has Scythe been in existence?
Scythe got started in 2018 as a classic garage startup.
Just a few engineers in a small space putting together the first prototype.
And so since then, we have gone through several successive generations of our
hardware, and our fleet has grown every year, and the number of customers we've
served has been growing every year since 2018.
And was the idea always to have an automated robot? Was that the idea?
Absolutely. The founding principle of the company was to use robotics to help
take care of the earth better.
And so there's a couple of ways in which that decision played out.
The first one was to find the activities that people do to take care of green
spaces and see which ones are a place where a robot can bring value.
And the second aspect of that decision is our commitment to build all electric machines.
Gas-powered landscaping equipment
is often uniquely polluting even for internal combustion engines.
And so bringing a valid replacement to the market to allow these kind of operations
to be done with electric machines instead of internal combustion engines is
a real positive contribution.
But in my mind, it also sounds very daunting to start such an endeavor,
because you deal with a lot of moving parts,
you deal with potentially safety-critical systems, you want to make sure not
to harm anyone, nor the environment. How do you navigate that space?
Yeah, the design of the machine is a big part of it.
I mentioned earlier that it's a fully capable manual machine.
It has manual controls. You can jump on board and start mowing the grass yourself.
And that was a really clever strategy to allow us to approach the autonomy problem incrementally.
We could, in our early stages, as our autonomy stack was maturing,
essentially see it as a mixed autonomy problem where the robot would handle
the big, bulky, easy parts of the job, just the center of the field where you
just have to go back and forth a bunch of times.
And the parts of the mowing job that are more difficult, the edges,
the obstacles, crinkly bits around the corners, we could say,
all right, well, this is a place where humans do a better job.
And so humans can take over and mow those areas.
And that gives us kind of a continuous space to refine our autonomy.
We start out saying, okay, we take the easy middle. And as we got better every
year, we start saying, okay, actually, we can take over more and more of these
tricky bits around the edges.
The company is based in Colorado, which means there's a lot of skiers on the team.
And so we have a skiing-based metaphor for difficulty.
We can look at a field and say, oh, this is a green circle, or this is a blue
square, or this is a black diamond.
So we talk about our progress on autonomy by saying, okay, this year we moved
from doing 80% of the blue squares to 95% of the blue squares. That's a big
step forward in our autonomy. And so we're chipping away at the gradient of
autonomous difficulty one step at a time.
Would you describe Scythe as a software company, a hardware company, a robotics
company, or something else entirely?
I think it is a robotics company, in that the hardware and the software teams
both have an essential contribution and they have to pull together.
The designs are closely coupled. The software we build relies on the
information that we gather from the sensors that the hardware team chose to
install on the machine. And so when the hardware team designs the next
generation of a robot, the software team gets consulted to say, okay, what
areas are most difficult for this generation? How can the next generation make
the software problem different, so that it is more amenable to being solved?
The nature of design cycles does give it this back and forth character.
Hardware designs on an annual cycle. If you're lucky,
you can manage a one-year turnaround on your hardware designs.
We work very, very hard to essentially get that one-year turnaround on our hardware designs.
Whereas software is so much more protean.
It moves so much faster. We can release software on timescales of weeks as opposed to years.
And so that makes the software design more adaptable and more flexible and allows
us to be closer to an agile kind of methodology for our work than the hardware team has to.
The hardware team must follow a more solid and heavyweight design process because
the nature of their design commitments are so much higher impact.
Yeah. And you said it yourself, that hardware is in the field,
like literally in the field for a very long time, if you allow me the pun.
That means you need to select that hardware really carefully.
What's inside of these machines? What keeps them running?
Speaking of the CPU, the sensors, everything that makes it tick.
Most of our computation happens on an embedded NVIDIA module, a Jetson module,
which is an ARM CPU combined with a GPU.
They share the same memory space, which is a very convenient architectural feature
for handing data over between the GPU and the CPU.
The rest of the embedded system are in-house designed PCBs, so different PCBs
to control the motors, to control the various sensors and other peripherals.
And each one has a microcontroller of some sort to govern a small selection of the hardware.
The battery is definitely our heaviest component, and one of our most
expensive and carefully designed components, because choosing the battery
defines the runtime that you get in the field. And that's extremely important
to professional landscapers: they want to be able to use this machine for
many, many hours during the day.
The motors are sourced components that we choose from carefully selected
vendors. The chassis and all of the physical instantiation of the machine,
that is all Scythe-designed, small-run manufactured steel components that are
produced on the production runs that we do.
Wow, that's a lot of moving parts. About the Jetson component,
is that a thing that even has Rust support, or would that be driven by,
say, a C or C++ component, or is there some other way to drive it?
The ecosystem's center of gravity is definitely C++.
By default, you expect that an embedded component comes with vendor support
in C++: all of the vendor-supported files, all of the board support packages,
all of the drivers are in C++ by default.
And these are the boundaries where we have to do C++ to Rust interop.
We basically expect that there's going to be a C++ driver kind of making the
last connection to our peripherals or the video system or any other component on the board.
And that the data will cross the border over from C++ into Rust where we do our business logic.
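That kind of boundary can be sketched in a few lines. This is a minimal, self-contained illustration, not Scythe's actual code: the struct and function names are invented, and a Rust function stands in for the C++ driver that the linker would normally supply.

```rust
// Hypothetical sketch of a C++-to-Rust sensor boundary.
// A C++ driver fills a plain C-compatible struct; the Rust business
// logic only ever sees a safe, idiomatic type.

/// C-compatible layout shared with the C++ side (e.g. via cbindgen).
#[repr(C)]
pub struct RawWheelOdometry {
    pub left_ticks: i64,
    pub right_ticks: i64,
    pub valid: u8, // the C++ side uses 0/1 instead of bool
}

/// In a real build this would be an `extern "C"` declaration resolved by
/// the linker against the C++ driver. Here a Rust stand-in plays that
/// role so the sketch is self-contained and runnable.
unsafe extern "C" fn read_odometry(out: *mut RawWheelOdometry) -> i32 {
    unsafe {
        (*out).left_ticks = 1200;
        (*out).right_ticks = 1180;
        (*out).valid = 1;
    }
    0 // 0 = success, C style
}

/// Safe, owned type used by the rest of the Rust code.
#[derive(Debug, Clone, PartialEq)]
pub struct WheelOdometry {
    pub left_ticks: i64,
    pub right_ticks: i64,
}

/// The safe wrapper is the only place `unsafe` appears: it checks the
/// return code and the validity flag, then hands out an owned value.
pub fn poll_odometry() -> Option<WheelOdometry> {
    let mut raw = RawWheelOdometry { left_ticks: 0, right_ticks: 0, valid: 0 };
    let rc = unsafe { read_odometry(&mut raw) };
    if rc != 0 || raw.valid == 0 {
        return None;
    }
    Some(WheelOdometry { left_ticks: raw.left_ticks, right_ticks: raw.right_ticks })
}

fn main() {
    println!("{:?}", poll_odometry());
}
```

The point of the pattern is that `unsafe` stays confined to one thin wrapper at the border; everything downstream of `poll_odometry` is ordinary safe Rust.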
Yeah, but to take a step back, just for context, did you start with a hybrid
C++ and Rust codebase or did you start with, say, an existing C++ codebase and gradually oxidized it?
Yes, it was a sudden and sharp decision.
So it's a fun story. I think it actually is a story that really illustrates
how your architecture choices are driven by what's most important to you at the time.
So in 2018, when the company came together as a small number of engineers in
a garage, the priority was to get
something working to show to potential investors as quickly as possible.
And the robotics C++ ecosystem is very deep.
And so the right choice in early 2018 was to put together a demo-worthy prototype
in all C++, reusing as much open source code as possible,
going quick, dirty, and scrappy, and making a machine that moved.
The company got its first seed funding in late 2018, and that was the moment
when their design priorities changed, and with it, their software ecosystem changed all at once.
They essentially threw everything out, everything that they had built,
just in the garbage, blank page, starting from scratch.
And their most important decision was to build something that would last,
that would serve the company throughout its entire lifetime.
And that was the point where they said that Rust was the best choice to base
as much of the robot logic as possible.
That Rust provided the kind of code quality that they were looking for,
and along with the kind of performance and efficiency that they needed.
And even at that point, so late 2018, early 2019, I'm saying they,
of course, because I wasn't with the company at the time. These are the founders
and the first engineers who were building something from scratch.
They saw that the state of Rust to C++ interoperability was sufficiently mature and reliable,
that they were confident that they could use whatever C++ component they needed
on the periphery of the system,
literally for peripherals, or to engage with the rest of the Robot Operating
System, the ROS ecosystem.
And that they were confident that they could make any Rust code they needed
work inside that ecosystem.
I find that so cool and I find it so inspiring because as soon as the developers got funding,
they moved over or they looked for greener pastures on the Rust side,
maybe, to build the thing right.
But immediately someone might say, well, couldn't you do the same thing with C++?
Because there are also long-term projects written in C++ and we use them every
day. Couldn't you just keep on building on top of C++ and not throw the code
away? That would have saved you time, at least initially.
Absolutely. There are many successful robotics companies out there that are
almost exclusively C++. I think that many of those first engineers had spent
a lot of time writing C++, and had spent a lot of time getting burned by the
same problems over and over.
The same relatively simple mistakes that turn into an incredibly
difficult-to-diagnose bug that takes a long, long time to root-cause.
And everyone who was in those decisions, all of the early engineers,
all of the founders talk about a desire for code quality,
a desire to be able to write confidently, knowing that what they were writing
would not burn them in one of these almost stereotypical ways,
one of these classic mistakes which leads to trouble down the road.
Okay, that means they were veteran C++ developers who had a background in the
language. They knew what they were doing. They were able to cobble together a
prototype to convince venture capitalists to invest, so that's a really
positive sign. And yet those people, even with their C++ experience,
preferred to write newer components in Rust.
Absolutely, and each of them would defend that decision passionately. By
staying with the company and by continuing to develop in this mixed style of
Rust inside a world of C++, they basically voted with their feet, for the
next year and for all the years they stayed with the company, and continued
to develop in that way.
From my perspective, I joined the company two and a half years ago.
And I joined not knowing Rust.
I joined thinking that one day I would like to learn Rust.
And so I was open to the idea. And when I was hired by Scythe,
they essentially said, welcome to Scythe.
We understand you don't know Rust. Very few people who join our company do.
Here's your desk. Here's a handful of web links for resources where you can learn.
We look forward to your first merge request. And that's the pattern that most
software engineers who join Scythe follow.
Almost none of our software engineers know Rust when they join.
They almost all are robotics domain experts with experience in the robotics
industry, typically in C++, although sometimes in other languages.
And they join us with a willingness to learn. And there's an almost stereotypical
pattern of joining the company, banging your head against the borrow checker,
learning how to get past that,
writing your first MR, your first merge request, and then going through the
same sort of rust idiom conversations that everyone needs to go through when
they're getting started.
It's almost a predictable pattern of what your first MR will look like and the
first style conversations we'll have.
And then after a few iterations of feedback, people have made the transition.
They're Rust developers now and they don't look back.
I can relate to that. What were your main programming languages before you joined Scythe?
I had done C and C++, particularly in my distant past in grad school.
But a slightly idiosyncratic part of my journey is that my most recent job had
been 10 years at a company that used Java for robotics.
And so I was already coming from a slightly non-standard language.
And that maybe contributed to my willingness to pull up stakes and head over to Rust instead.
And what were some of those stylistic, eureka moments that you had, given your background?
One of the design patterns that happens in Java a lot, and it happens in other
languages too, but Java particularly,
is using an immutable data structure
in order to signal that there is some need to keep data coherent.
So a structure has a collection of fields, and it's important for those fields
to stay consistent with each other.
Messing with just one of them is likely to cause some sort of inconsistency or bug.
And so in Java, you declare everything final, you don't give setters,
and that's a clear signal that this data should be treated as a package.
Unfortunately, when you do need to make changes, you often are making a lot
of copies of this same data just so that you're allowed to make the legitimate
changes you need to make through construction of a new copy.
But I came to Scythe sort of with this pattern well ingrained in me saying to
myself, when in doubt, a data structure should be immutable.
You should have a good reason to be able to mess with it.
And that was one of the early features of Rust that drew me in,
because clearly these ideas of when you're allowed to change data versus when
you have a shared reference and you basically can't mutate it.
That's the first thing that people notice when they start writing Rust.
And then I noticed, oh, well, this language, this expressivity of saying,
oh, I have a shared reference or an exclusive reference to this structure.
I'm allowed to mutate it. Now I'm not.
I'm allowed to take an exclusive reference and mutate this data without making a copy of it.
And when I lose scope, when my exclusive reference goes out of scope,
I don't have mutable access to this structure anymore.
Suddenly, it was like the next level of this mutability-immutability concept
that I've been working with.
You could efficiently make changes when you need to in a way that very clearly
signaled that you were the only one allowed to make these changes.
And then give that access back and be confident that the data wouldn't change
when you weren't looking at it.
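That borrowing model can be shown in a few lines. This is a generic sketch; RobotPose is a made-up example type, not anything from Scythe's codebase.

```rust
// Sketch of shared vs. exclusive references: mutate in place through
// `&mut` without copying, read safely through `&`.

#[derive(Debug, PartialEq)]
struct RobotPose {
    x_m: f64,
    y_m: f64,
    heading_rad: f64,
}

/// Takes an exclusive reference: the caller lends out the only handle,
/// we mutate in place, and no copy of the data is made.
fn advance(pose: &mut RobotPose, distance_m: f64) {
    pose.x_m += distance_m * pose.heading_rad.cos();
    pose.y_m += distance_m * pose.heading_rad.sin();
}

/// Takes a shared reference: we can read, but the compiler guarantees
/// we cannot mutate the pose behind the caller's back.
fn distance_from_origin(pose: &RobotPose) -> f64 {
    (pose.x_m * pose.x_m + pose.y_m * pose.y_m).sqrt()
}

fn main() {
    let mut pose = RobotPose { x_m: 0.0, y_m: 0.0, heading_rad: 0.0 };
    advance(&mut pose, 3.0); // exclusive borrow ends when the call returns
    let d = distance_from_origin(&pose); // shared borrow: read-only access
    println!("moved {d} m");
}
```

While `advance` holds the exclusive reference, the compiler rejects any other access to `pose`; once the call returns, shared read-only borrows are allowed again, and the data cannot change while they are held.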
Okay. What I hear from you is that you value immutability now and explicitness.
The notion that if you return a thing that is immutable, it will stay immutable.
And also you do that explicitly. So you pass it back and then someone else can
work with the data, but not really mutate it.
I guess...
This might be particular to robotics, that quite often you are working with
some fact about the world which is multidimensional.
All of these things are true right now, at this instant. And it's not really
valid to talk about keeping 80% of these facts but not 100% of the facts.
I've made a package describing the world. You need to make your decision
based on this package of information, this struct. Please do not take this
struct apart and change any of the pieces, because it's only true when it's
all together. That kind of multidimensional truth is, I think, a
robotics-specific feature.
It reminds me of a talk that I saw the other day by Jon Gjengset about a type-safe
spatial math library in Rust called Sguaba. We will link to it in the show notes.
It is for locations of objects in space.
And it feels like a lot of people use Rust for that specific purpose because
of its safety guarantees, because of its expressive type system.
It feels like you're kind of agreeing with this and you're also working towards that.
Absolutely. And I think when you say the expressive type system,
that is the door that opened up to the Rust way of seeing the world that I didn't
know about when I joined Scythe.
As a complete Rust novice, I did not see the value of an expressive type system.
Now, I'm not a Java developer, but to me it feels like Java also has a very
expressive type system.
Couldn't you encode the same invariants with Java?
It's absolutely true that Java has an expressive type system that can be used
in a lot of different ways.
I think Rust's decision to essentially not do inheritance, or not make
inheritance easy, leads to the Rust type system being used in different ways,
to much greater effect.
The Java capability of classes inheriting from each other leads to a lot of
elaboration and customization of a class.
Oh, I want this, but I want it to act a little bit differently,
So I'm going to inherit from it and make a few changes.
For me, the first thing that I think about when I think about the Rust expressive
type system is the expressive enums and data carrying variants of enums.
And so Rust encourages you not to build deep,
deep hierarchies of classes inheriting from each other and making everything
more complicated as you go, until sometimes you don't quite understand what's
happening at the bottom of this long chain of inheritance.
Instead, Rust encourages you to make a single layer of hierarchy,
a single enum with all of the possibilities enumerated alongside each other in parallel.
And so you have a lot of expressivity and you can use it for a lot of things.
But it doesn't get so deep so as to be obscure.
You can usually trust that you only need to look one place in your Rust code
base to understand the range of options which is available to you.
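A small sketch of that single-layer style, with invented types for illustration: one enum enumerates every command a planner could emit, side by side, and one match visibly handles them all.

```rust
// Where Java might grow a class hierarchy, Rust idiom is one enum with
// data-carrying variants, all alternatives visible in one place.
// `MowerCommand` and its variants are hypothetical examples.

enum MowerCommand {
    /// Drive along a straight segment at a given speed.
    Drive { distance_m: f64, speed_mps: f64 },
    /// Rotate in place.
    Turn { angle_rad: f64 },
    /// Stop the blades and hold position.
    Halt,
}

/// A single `match` handles every variant; adding a new variant later
/// makes this match a compile-time error until it is handled too.
fn duration_s(cmd: &MowerCommand) -> f64 {
    match cmd {
        MowerCommand::Drive { distance_m, speed_mps } => distance_m / speed_mps,
        MowerCommand::Turn { angle_rad } => angle_rad.abs() / 0.5, // assume 0.5 rad/s
        MowerCommand::Halt => 0.0,
    }
}

fn main() {
    let plan = [
        MowerCommand::Drive { distance_m: 10.0, speed_mps: 2.0 },
        MowerCommand::Turn { angle_rad: 1.0 },
        MowerCommand::Halt,
    ];
    let total: f64 = plan.iter().map(duration_s).sum();
    println!("plan takes {total} s"); // 5.0 + 2.0 + 0.0 = 7.0
}
```

Everything a `MowerCommand` can be is readable in one screen, which is the "only need to look one place" property described above.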
Yeah. It almost feels like the Java ecosystem encourages inheritance.
Oh, absolutely.
And I think it's fair to
say that Java developed to take maximum advantage of inheritance before there
was a language design pushback to say maybe inheritance is making things more
complicated than it's worth.
And I think you can see those design ideas coming out a little bit after
Java's maximum flourishing and growth.
When Java settled down and became a stable sort of industry background choice,
after that there kind of was a counterreaction to inheritance being used as
the design feature to solve all design problems.
You came to Scythe with all of that background, knowing that there might be
some work on the Rust side, maybe with hardware that was not familiar to you,
sensors and so on. It sounds like a very daunting challenge, and you still
signed with them.
Knowing all of that, what was your thought process back then?
Were you curious about what it was all about to work on that level?
Were you curious about the project? What was the main driver for you?
I was excited to learn. It was absolutely a daunting challenge for all the reasons you're describing.
I knew that I was signing up to learn some new skills.
I had been working with Java for 10 years. That's a point in your career where
you say, okay, maybe this is where I've settled. Maybe this is who I am and
this is my core competency.
It felt adventurous. It felt a bit like jumping off a cliff, but I'm glad to
say that I was jumping off the cliff into a wonderful, warm swimming pool.
The water was fine, and I greatly enjoyed the transition.
It was a chance to learn new things and to pick up old problems with new tools.
The robotic algorithmic challenges that Scythe is facing are the same as the
algorithmic challenges faced by many mobile robotics applications. You need to plan paths.
You need to verify that your trajectories have no collisions.
You need to sequence a collection of work in a schedule that makes sense.
These are all problems that many robotics companies need to solve.
And they always have to solve them using custom approaches. It's rare to get
a clean solution that solves the entire class of path planning, for example.
Domain specificity is almost always a feature of the problems faced in robotics.
You really need to grapple with the very specific features of your application.
Oh, I have a lawnmower. That means I'm trying to cover all the grass.
I'm not interested in finding the shortest path from point A to point B.
I'm interested in finding the path from point A to point B that covers the lawn,
that gets all of the grass mowed.
These kind of domain-specific features mean that a lot of implementations are
always going to be in-house and whether you've chosen the right tool for the
job or not changes your quality of life as a developer.
I'll bring it back and I'll say that Rust has allowed us to solve many of these
classic problems in our own way, using code that we trust, that we got right the first time.
That means you write it, you build it, and it runs more or less indefinitely without any problems?
The joke is, if it compiles, it works. And I don't believe that.
You can always make mistakes. But your mistakes will be like logic errors, not safety errors.
Testing is a huge investment for us.
Every feature we write, every piece of code we write, needs to be tested thoroughly
in order to be trusted to work in the autonomous space.
To move a 1,400-pound machine with spinning blades through space is a daunting challenge.
We take it very seriously to verify that the code we write does the job.
Every test is an investment. Every test has a cost.
It is incredibly powerful to say that an entire class of bugs has been excluded
through what the compiler does for you.
You don't have to worry about running it for hours and hours and hours just
to make sure you don't leak memory. The compiler did that.
We need to worry about whether the robot turns right instead of turning left.
That's the kind of mistake that the compiler won't catch for us.
But the fundamentals of the stability of the embedded compute, those are
largely handled, and we don't need to focus our testing attention there.
That's incredible, to see tests as an investment. This is what it should be,
because an investment has a potential payoff in the future.
Absolutely. And there's a maturity process as your technology evolves.
In the early days, testing is easy and a little bit discouraging, because
your robot runs successfully for five minutes and then encounters a fatal error.
And so your cycles can be very, very fast.
And relatively speaking, your testing investment is relatively light.
Five minutes of work gets you a new bug and off you go and you've got something to work on that day.
As your technology matures, as you dial your system in, suddenly you have to
test for long, long periods of time before you find something actionable.
And so you're putting in hours and hours and hours in the field to find one
piece of information which you can take back. And maybe it's no longer a fatal error.
Maybe we're not even talking about the robot stopping.
Maybe we're just talking about the robot doing something that we would prefer it not do.
And so the amount of time you need to put in to find, oh, statistically,
we're not quite hitting all of these cases exactly the way we would like.
We would rather lift this percentage from 40% up to 70%, please.
But it took us 200 hours of testing to gather that information.
You know you're doing your job right when your testing budget starts getting
very, very large. because you need so much time to gain statistical power on
the features you're trying to investigate.
Yeah. There's always this point in every project where you make a change and
it breaks and then you take a step back and you realize, no, in fact,
the system prevented a bug here and the system is more robust than I thought it was already.
It's always very enlightening.
And that is somewhere where Rust shines.
The general Rust pattern of non-exhaustive coverage being a compile-time error
saves so much time and effort.
I think this is another one of those patterns which are particular to
robotics: highly coupled state across components.
We try to take our complete problem, make this robot drive autonomously.
When we try to break it down, we try to decompose it into components in a stack.
Each component is responsible for an aspect of the decision making.
And ideally, we would hope that those components were very cleanly separated,
and the information they share between them is extremely limited. The reality
is that you're always fighting against the tendency to severely couple your
data across your components. That data is useful. It's very important
information to know what kind of mission you're doing, so that's a
task-planning-level concern.
It's very useful to have that information down when you're deciding how the
robot should move through space when you're making a trajectory decision.
And so a large robotics code base tends to have long range coupling across components,
even though architecturally we're fighting against that as hard as we can.
But those couplings still exist.
It's incredibly valuable that when you make a change in a data structure in one part of the system,
the compiler catches the 10 errors you didn't think of because you had those
long range couplings and now your trajectory planner needs to be rewritten to
handle the change that you made to the task planner's data structure.
If your data structure has changed and you haven't covered all the cases,
it's an error. That catches so many of these long-range dependencies that
otherwise would be runtime errors instead of compile-time errors.
Now, wouldn't that be a property of a static type system, though?
Honestly, I think it's more about idiom and design philosophy.
So, for example, if you define a struct in Rust and you add a field, and
somebody tries to construct that struct without providing that new field,
because they didn't know about it, because this is one of those long-range
dependencies we were talking about, that is a compile-time error, unless you
used the Default mechanism and said that it's possible for fields to be
filled in by default.
So in Rust, it's possible to kind of remove this safety feature that if you
haven't specified every field of your struct, you can't build it.
But if, as an idiom, you say to yourself, I would rather not use Default when I don't have to, because I want to keep this feature that if I've forgotten one of my fields, I want to know about it, then you have that property.
This example of all fields being necessary is just one of these "non-exhaustive coverage means an error" patterns. Another one is match statements, or pattern matching in general. If your pattern matching isn't exhaustive, Rust will tell you about it. That isn't true in other languages.
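The same idea for pattern matching, again with invented names: leaving out the wildcard arm means the compiler flags every match site the moment a new variant appears.

```rust
// Illustrative enum; a real system would have many more states.
#[derive(Debug)]
enum DriveState {
    Mowing,
    Transiting,
    Paused,
}

// No `_` arm: if a new variant (say, `EStopped`) is added later,
// this match stops compiling (error E0004) until it is handled.
fn blades_allowed(state: &DriveState) -> bool {
    match state {
        DriveState::Mowing => true,
        DriveState::Transiting => false,
        DriveState::Paused => false,
    }
}

fn main() {
    assert!(blades_allowed(&DriveState::Mowing));
    assert!(!blades_allowed(&DriveState::Transiting));
    assert!(!blades_allowed(&DriveState::Paused));
    println!("exhaustiveness checked at compile time");
}
```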
Yeah, you could still use an underscore for a match case. But if I understand you correctly, it's sort of an anti-pattern in a larger application that favors correctness.
Absolutely. And it is fair to say that Rust doesn't do all of the work here for you. Like you said, there are ways to have defaults or to have the underscore match, and it almost feels like it's a matter of context whether that's appropriate or inappropriate. There's plenty of times when it's fine to use underscore or to catch all the remaining cases in a match. That's fine.
That becomes kind of company style, or company culture almost, where you are encouraged or discouraged from using these kinds of patterns.
Yeah. Do you have a coding guideline at Scythe?
And what does the review process look like? Do you look for such patterns, point them out to people, and tell them why it's a bad thing?
That would be an aspect of our growth that I'd love to see. We tend to hold
these kinds of styles as more of a culture than an explicit style guide.
Clippy gets us halfway there, by the way. Clippy has so many good, solid requirements that are enforced in our toolchain.
So it's not true that it's completely a free-for-all.
It is more about our code review practices that we say to ourselves,
okay, when we review code, we all know that we prefer to see things written this way.
We never use any of the functions which can plausibly panic: no unwrap, use unwrap_or_else instead, those sorts of choices.
There are code bases, there are circumstances where panicking under unforeseen
conditions is entirely the appropriate thing to do.
We prefer not to do it because we would rather not panic our robot process as
it is trying to decide whether to go left or right.
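A small sketch of that preference; `steering_command` is a hypothetical helper, not Scythe's actual code:

```rust
// Hypothetical helper: turn an optional sensor reading into a
// steering command without ever calling `unwrap()`.
fn steering_command(sensor_reading: Option<f32>) -> f32 {
    // On a missing reading, fall back to "go straight" (0.0 degrees)
    // instead of panicking the process that is deciding left or right.
    sensor_reading
        .map(|angle| angle.clamp(-30.0, 30.0))
        .unwrap_or(0.0)
}

fn main() {
    assert_eq!(steering_command(Some(45.0)), 30.0); // clamped to range
    assert_eq!(steering_command(None), 0.0);        // safe fallback
    println!("no panics on missing data");
}
```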
Earlier, you said that the time to find a bug tends to become longer and longer the more robust the system becomes.
What are some of the problems that you find in the field with that robot, if you allow me to put it that way?
Is it more of a business logic problem, a systems problem, or a domain problem? What is it that you find?
We are always striving to improve the reliability of the machine, expressed as the number of interventions that you need to make when the robot is in the field mowing. And that's a function of the difficulty of the job that you're doing. I mentioned before the progression up the ski slope of difficulty: as we get better, our customers trust us and put our robot in more and more difficult situations.
So that's the heart of our defect discovery through testing.
We say to ourselves, we know that our customers this year are putting our robot
through more difficult scenarios.
Let's say, for example, with respect to slopes. Sloped fields are more difficult
to do robot control in than nice flat green fields.
And so we'll say to ourselves, okay, we are going to ensure that our selection
of testing fields, the grass we mow in order to test, we're going to make sure
it includes a lot of slopes.
And so we're going to make our testing problem more difficult for ourselves.
Okay, now we measure the number of times when our robot's trajectory goes too far away from nominal as it's trying to turn on a slope, and we say to ourselves, well, we want to bring the number of these defects down. They may not even be defects that require any intervention, and they certainly aren't defects that lead to any kind of safety incident. It's just that we want the robot to track more solidly on hills than it already does, and we call it a defect when it exceeds these bounds.
And okay, so we're now having one of these kind of trajectory defects every
five hours of operation.
Let's get that to one every 10 hours of operation.
That's the kind of defect that we're chasing when we're on a mature autonomous robotic system.
And it doesn't quite even fit into the rubric of code correctness anymore.
Now this is more like capabilities and tuning. So we say, okay, we are going to increase this parameter in order to emphasize tracking the desired trajectory more solidly.
Well, that has consequences. That leads to a probabilistic defect in other parts of the system, because the robot is now more solidly tracking its desired trajectory.
That means that in a different scenario, not the robot on a slope but the robot navigating around an unexpected obstacle, something different happens with its performance. And now we're in a trade-off space.
Do we want to change our defects on slopes or do we want to change our defects
when navigating around obstacles?
And now it's a whole conversation about what's more important to us and what
the customer will value more in terms of a reliable machine.
I watched a talk from the Oxidize conference from 2024 recently, and in there, there was a company which described a navigation, or planning, system for autonomous robots in a warehouse scenario. They had multiple small robots, and they would find a common trajectory for all of them. They used a system for deterministic testing: they had an event log on each machine, and then they could replay the exact scenario that happened, which caused a deadlock.
Is that a thing that you can apply to your domain as well?
I love determinism so much. It is such a precious jewel to gain it when you have it.
And it is so hard to achieve.
That is an incredible investment which yields incredible rewards in robotics.
I'm sorry to say that most of our systems do not have that fundamental determinism.
First of all, the real world doesn't bring that kind of determinism.
The variations in timing of events in the real world,
the variations of how a wheeled robot on a slope behaves means that every real
world scenario is a unique and unreproducible event.
Even in purely simulation-based testing, determinism and reproducibility is
very, very hard to achieve.
We haven't achieved that in general on all of our simulators.
There are many smaller components, unit testing and integration testing of isolated
components, where we require and where we achieve that kind of determinism.
Because it is an enormous asset in root-causing your problems when you have a fully reproducible example that you are confident you can get back anytime you need it.
Eventually, you can take that scenario and make it one of your integration tests
and come back to it to confirm that you've fixed the bug and it's stayed fixed
for as long as your system exists. So I respect it.
It's the kind of system feature which is very valuable. It takes a huge investment to get there.
Yeah, and even then you might not be 100% sure if it's worth it to pay that price, because you might as well work on features at the same time, or maybe find another way to test it in the real world.
The reality of a startup with a short funding timeframe and a lean team of engineers is that you really have to make careful trade-off choices about which features you're going to invest in.
I've sung the praises of deterministic simulation, and I really believe that we may get there one day.
It's not the best thing for us to build right now.
Yeah. And even before you do that, you probably want to invest in a good failover
system, which you already have in place.
You briefly mentioned the disengagement scenario where, for example, you might see that there's some fundamental condition which is incorrect in the system, like a violated invariant, and then you want someone to take over. So there's probably a way for you to say, okay, we can't handle this situation right now, stop the engine and wait for manual intervention.
This is one of the application-specific advantages of the lawn mowing application in particular.
You don't have that freedom if you are building an autonomous car that's moving
down the freeway at 100 kilometers an hour.
You don't have the freedom to say, oh, this situation is off nominal.
I will simply disengage.
Your path to safety is complicated.
When you are mowing a lawn, your path to safety is short and assured. You just have to stop.
If your conditions are sufficiently off nominal, bring the drive motors to a stop, bring the blades to a stop, and ask for help.
Everything will be okay.
Nobody is depending on you to get to the side of the road or to bring other
hazardous components to a safe state.
It's an application where safety is still critical, but there is a simple story
to how to make the system safe.
So, yeah, just to summarize, a deterministic system in simulation is incredibly
valuable, but it's not the most valuable thing for us to build right away.
The startup world means making hard implementation trade-offs as to what will be the most valuable thing to build next for your customers.
And so our investments have gone to other places.
Which brings us to today. What is the current state of the code base?
How much of it is in Rust? How much of it is in C++?
Do you still use ROS and to what extent?
Yes. So ROS is the middleware that is the backbone of the system. ROS stands for Robot Operating System, but it's best understood as inter-process communication, along with an ecosystem of tools for simulation, for observability, for injection, and other capabilities.
So ROS moves the data around our system. Most of our perception stack is in C++.
The ability to work with GPUs and CUDA in Rust is maturing.
And there are exciting projects right now. But over the history of Scythe's development, it has been an area where C++ has stayed very strong.
So most of our perception stack is in C++.
That information gets moved from the perception processes over to the autonomy
processes using ROS as the middleware.
Almost all of our autonomy code is in Rust. That is every decision we make once we have assembled our perception of the world: our task planning, where we decide what we're doing next at the high level; our navigation, deciding where we're going to move and how we're going to get there; and our trajectory and motor control. So everything, all the way down to deciding what torques should be applied to the motors, is a decision being made in Rust. We favor an actor-based framework inside our Rust system to decompose the problem into components that exchange messages as a way of prompting each other to make decisions or sharing information with each other.
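As a rough illustration of that decomposition: component and message names below are invented, and a std thread plus a channel stand in for the real framework.

```rust
use std::sync::mpsc;
use std::thread;

// Message type exchanged between two hypothetical components.
enum NavMsg {
    GoalReached { waypoint: u32 },
    Shutdown,
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // The "task planner" actor: sends messages that prompt the
    // navigator to make its next decision.
    let planner = thread::spawn(move || {
        for waypoint in 0..3 {
            tx.send(NavMsg::GoalReached { waypoint })
                .expect("navigator hung up");
        }
        tx.send(NavMsg::Shutdown).expect("navigator hung up");
    });

    // The "navigator" actor: reacts to each incoming message in turn.
    for msg in &rx {
        match msg {
            NavMsg::GoalReached { waypoint } => {
                println!("planning the next leg after waypoint {waypoint}");
            }
            NavMsg::Shutdown => break,
        }
    }
    planner.join().unwrap();
}
```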
And so I've mentioned before that most of our software engineers join our company not knowing Rust.
And they develop into application-level developers who write Rust to solve robot
decision-making problems.
There are also some engineers at our company who are deep Rust experts who showed
up with very deep knowledge.
And one of the parts of the system that they built is the framework that supports these robot logic components: the actor-based framework that I described.
So the actor-based system's overall framework is written by our Rust systems experts, and the actors themselves, the components that make the decisions, are built by our robotics domain experts, who are solid application-level Rust developers.
And that actor framework is not open source? That's probably in-house.
It's an in-house actor framework. There are several open source actor frameworks
in the Rust ecosystem right now.
I think the choices that were available to us in 2019 when we were starting
down our Rust journey were such that an in-house actor framework was the right choice for us.
If Scythe were being founded in a garage today, in 2025, it's quite possible that many of these exclusively in-house components would instead be open source Rust crates, because the ecosystem has expanded and matured during the last six years.
Yeah, and if someone's listening and wondering what a modern actor framework might look like in Rust, the one that I like a lot is Ractor; we will link to it in the show notes. I guess it leans into the Erlang model a bit, and it has a lot of components that are really helpful for a production actor system: for example, a factory, so being able to have multiple actors in a queuing system, and supervision, and things that you need once you run that thing at a larger scale. But your actor framework specifically, is that synchronous? Does it mean you mostly write synchronous code where possible, or do you also use Tokio in combination with it?
It's an asynchronous actor system, but can we write synchronous code most of the time? This is maybe one of those distinguishing characteristics between the deep Rust expert and the robotics expert who's writing in Rust today: the comfort with asynchronous Rust code.
At the top level, our actor framework uses Tokio to marshal all the actors together as asynchronous actors.
But any asynchronous code can call synchronous code inside it.
So the interface exposed to our robotics developers is a handle-message function call, which looks like a synchronous function. It is a synchronous function.
It is being called from asynchronous code, but you don't need to worry about that.
This is most often the right division of responsibilities.
Asynchronous Rust code has some pitfalls. You need to be aware of cancel safety
and many related issues in order to get the job done right.
Many of the decision-making components in our collection of actors don't need
to leverage the features that asynchronous Rust provides.
Most of them can be thought of as a message handling system that gives quick
answers to each incoming message.
And you might as well write that as a piece of synchronous code and not concern
yourself with the possible pitfalls of asynchronous Rust.
I'm very glad that the folks on our team who built the actor framework are well-versed
in all the pitfalls of asynchronous code.
But it's a convenient decoupling of responsibilities to say that the asynchronous
code exists only at the actor framework.
And most individual actors are just synchronous.
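A sketch of that division of responsibilities, under the assumption that a plain std thread can stand in for the Tokio task the real framework would use; the `Actor` trait and `SpeedLimiter` are invented for illustration:

```rust
use std::sync::mpsc;
use std::thread;

// Invented names throughout; a std thread stands in for the Tokio
// task that the real framework would run.
trait Actor {
    type Msg: Send + 'static;
    // The only thing a robotics developer writes: a synchronous
    // handler with no await points, hence no cancel-safety concerns.
    fn handle_message(&mut self, msg: Self::Msg);
}

struct SpeedLimiter {
    limit_mps: f32,
    granted: Vec<f32>,
}

impl Actor for SpeedLimiter {
    type Msg = f32;
    fn handle_message(&mut self, requested: f32) {
        // Quick, synchronous answer to each incoming message.
        let granted = requested.min(self.limit_mps);
        self.granted.push(granted);
        println!("requested {requested} m/s, granted {granted} m/s");
    }
}

// The "framework" side: owns the receive loop and hides all
// concurrency from the actor implementations.
fn run_actor<A>(mut actor: A, rx: mpsc::Receiver<A::Msg>) -> thread::JoinHandle<A>
where
    A: Actor + Send + 'static,
{
    thread::spawn(move || {
        for msg in rx {
            actor.handle_message(msg);
        }
        actor // hand the actor back when the channel closes
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let handle = run_actor(
        SpeedLimiter { limit_mps: 2.0, granted: Vec::new() },
        rx,
    );
    for request in [1.5, 3.0] {
        tx.send(request).expect("actor stopped");
    }
    drop(tx); // close the channel so the actor loop ends
    let actor = handle.join().unwrap();
    assert_eq!(actor.granted, vec![1.5, 2.0]);
}
```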
Mm-hmm.
Gotta say, that was great foresight from the people that worked on the framework,
because you don't have to deal with the complexities that you mentioned in the async Rust ecosystem.
You mentioned cancellation specifically. Do you have an example for when that
becomes relevant in your domain?
For this, I think I really have to just defer to an excellent talk on cancel safety at RustConf in August, given by Rain, where they ran through so many examples of how cancel safety is important and how it can go wrong. I guess the easiest pitfall that I can cite off the top of my head is in an actor-based system, where control and essential information arrive in messages. Probably the easiest way to create a cancel safety issue is to cancel and have the message you were handling disappear forever. If that message was a critical piece of information which will only arrive once, well, then you have probably caused a serious problem by never looking at that message again.
If your actors are receiving unique, irreplaceable, must-be-handled messages,
then you better get your cancel safety right when you dequeue those messages and
hand them off to your message handler. That's the easiest example I've got.
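A synchronous analogy of that pitfall. In async code the interruption would come from a future being cancelled after it has already dequeued a message; this sketch, with invented names, just simulates the interruption with a flag.

```rust
use std::collections::VecDeque;

// Cancel-unsafe shape: the message leaves the queue before we are
// committed to handling it. An interruption drops it forever.
fn cancel_unsafe_step(queue: &mut VecDeque<String>, cancelled: bool) -> Option<String> {
    let msg = queue.pop_front()?; // message has left the queue here
    if cancelled {
        return None; // interrupted: `msg` is dropped and lost forever
    }
    Some(msg)
}

// Cancel-safe shape: only remove the message once we will definitely
// handle it, so an interruption leaves it queued for the next attempt.
fn cancel_safe_step(queue: &mut VecDeque<String>, cancelled: bool) -> Option<String> {
    if cancelled {
        return None; // message is still queued, nothing lost
    }
    queue.pop_front()
}

fn main() {
    let mut q1 = VecDeque::from([String::from("e-stop released")]);
    assert_eq!(cancel_unsafe_step(&mut q1, true), None);
    assert!(q1.is_empty()); // the one-shot message is gone

    let mut q2 = VecDeque::from([String::from("e-stop released")]);
    assert_eq!(cancel_safe_step(&mut q2, true), None);
    assert_eq!(q2.len(), 1); // still there for the next attempt
    println!("cancel-safe variant preserved the message");
}
```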
Yeah. And that helps with determinism, too.
Absolutely. Yes. For sure. To expand on that a little bit, one of our most common testing strategies is to take an actor and feed it a test-bench set of messages, in a particular order, at particular times, and then make the test depend on whether the actor gives the right answers back.
So an actor can be fully deterministic.
Given the identical sequence of incoming messages, the actor will always give
you the correct output, or you strive for that.
It's only in the glorious, complex composition of all of your actors against a real-world system that is triggering events at uncertain times that you get the kind of nondeterminism that makes it a big, complicated problem.
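A toy version of that testing strategy, with an invented `Odometer` actor whose output depends only on the message sequence it is fed:

```rust
// A pure, deterministic actor: output depends only on message order,
// so a fixed input sequence always yields the same outputs.
struct Odometer {
    total_m: f64,
}

impl Odometer {
    fn handle(&mut self, delta_m: f64) -> f64 {
        self.total_m += delta_m;
        self.total_m
    }
}

fn main() {
    // Test bench: feed a fixed sequence of messages and check the
    // answers. This test is fully reproducible on every run.
    let mut actor = Odometer { total_m: 0.0 };
    let inputs = [1.0, 2.5, 0.5];
    let outputs: Vec<f64> = inputs.iter().map(|&d| actor.handle(d)).collect();
    assert_eq!(outputs, vec![1.0, 3.5, 4.0]);
    println!("deterministic replay ok");
}
```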
We're getting close to the end, and the final question in this podcast is always your message to the Rust community.
Absolutely, the stage is yours.
When I think about Rust, and when I think about the open source community that built it, maintains it, and moves it forward today, my feeling is one of gratitude. This is an incredible thing that a large, diverse community of people have built. Rust as a language sits at a very valuable point in the language design space. It really is something special in terms of the guarantees it gives and the confidence you can have when building complicated systems with it, knowing that the language has struck such a valuable trade-off between the competing concerns of safety, efficiency, and expressiveness.
It's a technology I'm glad to use every day that I use it, and I hope that its development toward these goals continues for as long as it can.
Couldn't have said it better. Andrew, thank you so much for taking the time today to do the interview.
Thank you very much, Matthias. It's been a pleasure.
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.