Rust in Production

Matthias Endler

Volvo with Julius Gustavsson

Putting the good kind of Rust into cars

2025-01-23 68 min

Description & Show Notes

The car industry is not known for its rapid adoption of new technologies. Therefore, it's even more exciting to see a company like Volvo Cars embracing Rust for core components of their software stack.

We talked to Julius Gustavsson, System Architect at Volvo Cars, about the use of Rust for their Electronic Control Units (ECUs) in Volvo's EX90 and Polestar 3 models and how they are building a Rust ecosystem within the company.

About Volvo

Volvo Cars is a Swedish premium car manufacturer founded in 1927. The company is known for its focus on robustness, safety, and sustainability. Its headquarters are in Gothenburg, Sweden, and it has manufacturing plants in Sweden, Belgium, China, and the United States.

About Julius Gustavsson

Prior to Volvo Cars, Julius worked at Ericsson, among other companies. His background is in embedded systems and software development. His master's thesis was about System-on-Chip (SoC) design.


Transcript

Matthias
00:00:01
It's Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. I'm your host, Matthias Endler from corrode, and today we talk to Julius Gustavsson from Volvo about putting the good kind of Rust into cars. Julius, thanks for being a guest. Thanks for taking the time. Can you introduce yourself and Volvo, the company you work for?
Julius
00:00:28
I'm Julius Gustavsson, and I work for Volvo Cars, and I am the main architect and team lead for the LPA project at Volvo Cars. And the LPA, which stands for the low-power processor, is the first ECU in the automotive industry, at least as far as we know, that is fully written in Rust, and it's rolling off the production line as we speak.
Matthias
00:00:53
The first time I heard about this project was a few years ago at a conference. I can't even remember which one, but someone mentioned that Volvo was working with Rust, and by Rust we mean the programming language, of course, not the oxidation process. That was way back in the day, and I didn't even know what parts of it were public or if that was a separate vendor. Now take us back to maybe 2018 or so, when this project started. What was the situation with Rust back then, especially with embedded Rust, and how did you even think about using Rust at Volvo?
Julius
00:01:45
Yeah, so I joined Volvo in 2017, and that was about the same time as this SPA2 project started. When I was joining, I had already discovered Rust a couple of years back, but I was getting more and more certain that this was something that we needed to look into. And so we started out with a few products when we were in the very initial stages of that project. We were doing all kinds of proofs of concept. Since I came from an Android background, I knew the ins and outs of the Android system, so the first thing I did to test out Rust was to build an Android HAL, an automotive HAL. I was able to get that to build and link in Android, and I did all kinds of shenanigans to get that to work, because the build system didn't really support it, but I was able to work around that. The point of this Android HAL was to use the Android UI to control the AC, or the fans, in the car. We had already built a small system running on a separate board, like a Raspberry Pi, that would actually send the CAN signals to the fan, so we needed to communicate with that system to send commands. So I created a HAL that communicated with it over gRPC, if I remember correctly, using futures.
Back then there was no async/await, so this was futures 0.1. And it was such an amazing experience, because you had to do all these things to get everything to work. There were so many moving parts: getting things to build in the build system, and getting the whole chain working from the button to the actual methods being called in the HAL, and then having that send out something over gRPC over Wi-Fi to that other chip. And when I had it all wired up and it started to build, which took me a while, the fans just turned on on the first try. It was just super amazing. I had never experienced that before. So that was a first proof point that this is definitely something.
Matthias
00:04:30
How much of that was because of Rust, and how much of it was because of your experience with embedded systems?
Julius
00:04:36
I would say, of course, my experience didn't hurt. But I've been doing C and C++, mostly C but a fair bit of C++ as well, for the better part of 20 years. And every single time, something creeps up, especially when you're least expecting it. There is something that you didn't think about, or someone else didn't think about, or something you thought of that wasn't written down, and then someone else broke that assumption. There's always something, especially with so many moving parts as in this little proof of concept. Getting it to work on the first try just wouldn't have happened there, or at least not for me, that's for sure. So yeah, Rust definitely did the heavy lifting, because Rust tends to make you think through the whole design up front, or at least a much larger part of it. There are these fine loose ends that you also need to tie up before it actually builds and runs, and I think those are the things that make the difference.
Matthias
00:06:07
Why did you decide to build that with async Rust, given that it was still in an alpha version? Couldn't you have done something with sync Rust?
Julius
00:06:17
Probably, but since we were doing gRPC communication over Wi-Fi, it was very asynchronous in its nature, so I wanted to try it out and see how it would work. That was the idea.
Matthias
00:06:36
A lot of back-end developers hearing that might be a bit scared by the way async Rust worked back in the day, when I think you needed to build your own futures. Was that still the case back then?
Julius
00:06:51
Yeah, that was definitely the case.
Matthias
00:06:55
And you still saw the potential there, even though it was a bit, let's say, half-baked, or not completely fleshed out yet?
Julius
00:07:04
Yeah, absolutely. Although, needless to say, and maybe we'll come to that later when we get to the LPA project, we're not using it there, at least not yet. But a lot of embedded systems are callback-driven in their nature, because you are constantly getting all kinds of events from hardware peripherals and from things other than your main thread. And when you're achieving something, you usually need to do it in steps. So you do something, register a callback, wait for that callback to be called, then you continue. And even at that stage of futures, it was already removing a lot of boilerplate. So that's why I thought it was intriguing.
Matthias
00:07:59
Given your background as a C developer, and probably also as someone who has a fair share of experience with embedded systems, how would you have done that with the technologies that you used before? In C, what would that look like?
Julius
00:08:13
It's a good question. I guess I would most likely have used some sort of async framework in C++, since we had gRPC, and gRPC, Protobuf, all that stuff was already fairly well supported in C++. So I guess I would go for that. I don't remember how far those async frameworks in C++ had come at that time. They probably had similar sorts of features. The problem there is that when the closure that you register actually gets called, you're basically on your own, because there are all sorts of assumptions that need to be upheld for that to actually work properly.
Matthias
00:09:17
For instance, which assumptions?
Julius
00:09:20
For example, memory or variables or pointers and things that might not exist when it comes back. So yeah, dangling pointers is a common issue.
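To illustrate the dangling-pointer problem described here, the following is a minimal Rust sketch of callback registration. It is not code from the Volvo project; `EventSource` and `record_events` are hypothetical names. The `'static` bound on the callback means it cannot borrow a local variable that might be gone by the time the callback runs, so the shared state is moved into the closure behind an `Arc` instead.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical event source that stores callbacks and runs them later.
struct EventSource {
    callbacks: Vec<Box<dyn FnMut(u8) + Send + 'static>>,
}

impl EventSource {
    fn new() -> Self {
        Self { callbacks: Vec::new() }
    }

    /// The `'static` bound forbids callbacks that borrow short-lived locals.
    fn register(&mut self, cb: impl FnMut(u8) + Send + 'static) {
        self.callbacks.push(Box::new(cb));
    }

    /// Delivers one event to every registered callback.
    fn fire(&mut self, event: u8) {
        for cb in self.callbacks.iter_mut() {
            cb(event);
        }
    }
}

/// Registers a callback that records events into shared, reference-counted state.
fn record_events(source: &mut EventSource) -> Arc<Mutex<Vec<u8>>> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let log_for_cb = Arc::clone(&log);
    // `move` transfers ownership of the cloned Arc into the closure, so the
    // log stays alive for as long as the callback itself is alive.
    source.register(move |speed| {
        log_for_cb.lock().unwrap().push(speed);
    });
    log
}
```

Replacing the `Arc` with a plain borrow of a local `Vec` would be rejected at compile time, which is the class of mistake that has to be caught by review or extra tooling in C and C++.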
Matthias
00:09:33
Even for someone like you who has quite a ton of experience in that area?
Julius
00:09:37
I would probably trip on those a couple of times before I get it working, because it's super easy to misstep when it comes to those things.
Matthias
00:09:50
So for some context, we are in 2018-ish, and you built your first HAL, your first hardware abstraction layer, for controlling the fans from Android, and you saw it working on the first try. That must have made you pretty excited about Rust and its potential to go from idea to first prototype in a reliable manner. Did you show that prototype to other colleagues? Did you talk about Rust back in the day, or was it more of a hobby and side project?
Julius
00:10:21
No, I was probably quite obnoxious. I'm pretty sure there were some colleagues that were getting quite fed up, because I was talking about it a lot. I remember I did a talk back in 2017, or no, it must have been early 2018, where I tried to do a basic introduction, but also to point out all the upsides. And then when this SPA2 project started to crystallize more and we started to productify everything that we had in our plans, I started looking for places where this was actually viable as a language to use. Unfortunately, none of the platforms supported it. So if we go back a little: the SPA2 project is the electrical system in the Volvo cars that are coming to market now. And the big feature of that system is the core computer, which you can probably Google and read about. This core computer is essentially a centralized box with multiple different processors, all running different functional domains, doing different things. You have security, you have high integrity for braking and steering and these safety-critical things. And then you have high-performance compute for the big number-crunching things like ADAS functionality, autonomous driving, those kinds of use cases. All of these were different platforms running different operating systems or hardware, and none of them actually had good Rust support, except for the security gateway. It's a long story why we didn't start off there. But yeah, we had a QNX system, and there was no QNX support. We had TriCore-based units, and TriCore is a common automotive microcontroller, specifically designed for safety-critical use cases. And none of these had Rust support at the time.
Matthias
00:13:02
I'm trying to put myself into the position of a team member or a colleague of yours in 2018. So you're telling me about Rust. You're enthusiastic about it. I see the prototype. I see that it's working. But I don't really have any Rust experience, and I don't know how long it took to build that prototype. So I look at all of the tooling that we built in C and C++ and the ecosystem that it requires, and then I ask you about QNX support or TriCore support for Infineon CPUs, and you say, well, it doesn't exist or it's a work in progress. Suffice it to say, I'd be a little skeptical.
Julius
00:13:44
True, and Volvo is also a car maker, and automotive people tend to be quite conservative when it comes to new things. We like to use well-proven technologies. So yeah, definitely. But I always saw that this was actually something different, something that could actually provide real value.
Matthias
00:14:09
Thinking about the situation in 2018 and 2019, when things were not completely fleshed out, was everyone on board already, or was there some political infighting as well about whether to go forward with Rust or maybe be a bit more conservative here? Because you also mentioned that the car industry is conservative.
Julius
00:14:30
So I wouldn't say political infighting. But the one question that always came up was: is it safety qualified? Is it qualified and certified to use in a safety-critical setting? And that was basically a no. Nothing had been done on that front at that time. So that was kind of a non-starter, because for most of the functional domains that I mentioned earlier, the CPUs they are running on have a safety rating, so to say, these ASIL levels that the automotive standards define.
Matthias
00:15:12
Why are they necessary for a toolchain to fulfill?
Julius
00:15:17
The safety standards that the automotive industry uses basically talk about how you provide the evidence that the product you've developed actually does what you think it's doing, and that there are no hidden errors or dangers that you could have foreseen and mitigated. That's basically the reason why everything needs to be vetted, so to say: the tooling, but also the code and everything else that you use.
Matthias
00:15:49
Did you already know back in the day about the efforts to build a safety-critical Rust compiler that would be allowed to be used in such environments, like the things that Ferrous Systems built, for example?
Julius
00:16:07
No. Ferrous Systems actually started maybe six months later or something. We met them at a conference, I think it was OxidizeConf, and soon after they announced Ferrocene, or it was called Sealed Rust at that time. So that started soon after. But I always felt that that wouldn't be an issue over time, because obviously we are doing safety-critical systems in C and C++, and those languages are riddled with things where you can shoot yourself in the foot, and you need all kinds of tooling to ensure that you're not doing the things that you're not allowed to do, while Rust is preventing most of those things up front. Just by design, you can't do most of those things. So I felt right from the get-go that it was quite ironic: all of a sudden you have a language that basically does all that stuff for you, but it doesn't have all the paperwork or all the necessary evidence in place. So you can't use it, even if it's, at least according to me, much better. At the same time, I realized that it was probably a non-starter for anything where safety certification was needed. And then we come to basically the fifth processor on this chip, and that is the small low-power microcontroller or processor, which is called the LPA. That one was a bare-metal Cortex-M4-based chip that no one was essentially working on at that time. They had some basic software running on it just to power up the thing, but otherwise it was being neglected because everyone was focusing on the other SoCs. And then it turned out that Cortex-M4 was actually the best-supported embedded target for Rust at that time.
And the 2018 edition of Rust was actually the first edition that made building things for embedded possible and stable. So the stars aligned a bit there. A colleague who was, and is, as enthusiastic about Rust and I teamed up, and we got help from some of our managers who had seen my talk, for example, and were also convinced that we really needed to look into this for the future, because we need better tools to get our things done more effectively and with better quality. So we convinced them: hey, we have this chip here that no one is working on yet, and it's a perfect testbed for a Rust project. It doesn't have a safety level on it, because of the way it's constructed. We power on the board, or this box, when certain events happen, but we can't power the CPUs off ourselves. Each CPU has to power itself off. And a car in an off state is considered safe, so going from off to on is not a safety-critical thing. But of course, going from on to off when you're driving, that would be completely safety-critical. So the hardware is constructed in a way that makes that impossible. The shutdown is done in a safety-certified way where we are not involved. So those were the three things that added up: it was already pretty well supported by Rust, it wasn't safety-critical, so we could use it as a testbed, and no one else was actually paying attention to it, so we were able to form a team and get to work.
Matthias
00:20:49
It sounds like a good strategy for testing out new things in general, and maybe specifically for testing out Rust in embedded environments or in larger organizations.
Julius
00:21:00
Yeah, absolutely. If you have these conditions, then it's definitely something I would recommend. One thing we also did from the start was to get external help. We got a few people from Grepit here in Sweden. They're based up in Luleå in northern Sweden, and many of them are deeply involved in the RTIC scheduling framework. It's not really a real-time OS, but needless to say, we also use RTIC in the LPA. So we brought on these experts, who were the foremost experts in Sweden at the time, and they helped us get started and helped us with a lot of the driver development and things like that. So I would say that is also a big piece of the puzzle: getting good people to help you get started, either from within the company or externally, depending on what options you have.
Matthias
00:22:16
Who made the decision to hire externals? Was it you, because you maybe wanted to get an extra pair of hands? Was it the management? Was it a requirement from Volvo itself?
Julius
00:22:28
It was the team together with management. We needed to start fairly quickly, and we needed to size up the team fairly fast with some knowledgeable people. Also, at the time, we weren't really sure how much effort the driver situation would be. So we felt that we needed to have someone to hold our hand there.
Matthias
00:22:59
Did you have a deadline on, for example, you decided that you would be done by that time or you would have something to show by that time? And if not, then the project would be canceled?
Julius
00:23:12
Well, I don't think there was ever a risk of it being cancelled as such, but we set ourselves a deadline. Like I said earlier, we already had some C code running there that was produced by, not the SoC vendor, but the company that made the actual hardware. And that was doing most of the basic things. We felt that it would be fairly straightforward to revert back to that and build that out, just in case. So we always had that as a backup. And in fact, we started off working from that base and trying to build Rust on top of it. But we quite quickly figured that having a 100% Rust-based solution would actually be a much better choice. Why? Because you don't have to worry about all these FFI boundaries between C and Rust. And also, when you're building on bare-metal embedded, it depends on what compiler you're using and what linkers you're using, because the C libraries expect certain initialization to be done in a certain way, and so there may be incompatibilities there. So even though Rust and C can interoperate seamlessly, there can be some issues when you're doing it on a bare-metal target like this. And since we were taking an already working C project and adding Rust on top, that was bringing us more headache than just rewriting those few lines of code.
Matthias
00:25:12
How long did it take you to rewrite these pieces of code?
Julius
00:25:16
It was mostly Grepit that did the initial bring-up, and it took them a couple of months. They did it over the summer, basically. So when we came back after vacation, they already had a demo of an application that did basically exactly the same as the one that we had. And when we had that working, we saw that there was no point in carrying on with the other one.
Matthias
00:25:45
But even then, going forward, did you compare your output with the output of the C-based version in order to make sure that it was compatible? Or was it mostly, you know, it's up now, we can compare the gRPC outputs, essentially?
Julius
00:26:02
Yeah, so now we're not doing gRPC anymore, so it's a different kind of communication. But we had already started working on a CI pipeline with tests and things like that. So that was the first goal: to ensure that the same test suite would pass with the new Rust version.
Matthias
00:26:26
Interesting. So you don't use gRPC anymore. What were some of the issues that you faced?
Julius
00:26:31
Well, basically, so that prototype that I did back then, that was basically the Android infotainment system, communicating with a prototype of this core system that we are now building. So there was that communication. But now we are inside the core system and communicating between the CPUs. That's on a much lower level. So yeah, we're not doing gRPC on this node.
Matthias
00:27:05
I can understand. Is that a public protocol, an open protocol, or is that a Volvo-specific protocol?
Julius
00:27:12
Yeah, so the one we're using now is actually Volvo-specific. The LPA is connected via UARTs to all the CPUs. That's the only way to communicate with it. We searched high and low for a suitable protocol, because in a car you need to deal with a lot of EMC issues, electromagnetic interference. So it's quite common that you get corrupt messages and things like that. We need some sort of protocol that is similar to TCP, but much more lightweight, of course. It's just point-to-point. But something where you can detect that messages are missing or corrupt or malformed in various ways.
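The Volvo protocol itself is not described in detail here, so the following is only a sketch of the general idea under assumed details: a hypothetical frame layout of sequence number, payload length, payload, and a CRC-16 (the CCITT-FALSE variant, chosen purely for illustration). A checksum mismatch reveals corruption, and a gap in sequence numbers reveals lost frames.

```rust
/// CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection.
fn crc16(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort,
    LengthMismatch,
    BadChecksum,
}

/// Hypothetical layout: [seq][len][payload ...][crc16, big-endian].
/// Returns `None` if the payload does not fit the one-byte length field.
fn encode_frame(seq: u8, payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > usize::from(u8::MAX) {
        return None;
    }
    let mut frame = vec![seq, payload.len() as u8];
    frame.extend_from_slice(payload);
    let crc = crc16(&frame);
    frame.extend_from_slice(&crc.to_be_bytes());
    Some(frame)
}

/// Validates length and checksum, then returns the sequence number and payload.
fn decode_frame(bytes: &[u8]) -> Result<(u8, &[u8]), FrameError> {
    if bytes.len() < 4 {
        return Err(FrameError::TooShort);
    }
    let (body, crc_bytes) = bytes.split_at(bytes.len() - 2);
    if usize::from(body[1]) != body.len() - 2 {
        return Err(FrameError::LengthMismatch);
    }
    let received = u16::from_be_bytes([crc_bytes[0], crc_bytes[1]]);
    if crc16(body) != received {
        return Err(FrameError::BadChecksum);
    }
    Ok((body[0], &body[2..]))
}

/// Number of frames lost between the last accepted sequence number and a
/// newly received one, with wrap-around at 255.
fn missing_between(last_seq: u8, new_seq: u8) -> u8 {
    new_seq.wrapping_sub(last_seq).wrapping_sub(1)
}
```

A receiver built on this would drop any frame that fails `decode_frame` and use `missing_between` to decide whether to request a retransmission; how a real protocol handles acknowledgements and retries is outside this sketch.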
Matthias
00:28:07
Right. And my assumption would be that you had to build out a Rust library for reading and writing that format.
Julius
00:28:15
Yes. Yes.
Matthias
00:28:16
And was it hard for you? Did you use any existing libraries to build that parser or to serialize the messages over the bus?
Julius
00:28:25
So this protocol allows us to send different types of payloads. We'll probably get to that later, but we also need to comply with the automotive diagnostic standard called UDS, Unified Diagnostic Services. So if the message is a UDS payload, then it has a certain format that is defined by that standard. If it's in-band messages between the processors, then we're actually using CBOR as the messaging format, and at that time there wasn't any CBOR library for no_std, so we wrote that. And then, of course, we used Serde, which is super awesome.
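CBOR (RFC 8949) is compact enough to encode without heap allocation, which is what makes it a fit for `no_std` targets. As a small illustration, and not the library mentioned above, this hypothetical helper encodes an unsigned integer (CBOR major type 0) into a caller-provided buffer using only fixed-size arrays:

```rust
/// Encodes `value` as a CBOR unsigned integer (major type 0) into `out`.
/// Returns the number of bytes written, or `None` if `out` is too small.
/// Uses no heap allocation, so the same logic would also work without `std`.
fn encode_cbor_uint(value: u64, out: &mut [u8]) -> Option<usize> {
    let mut tmp = [0u8; 9];
    let len = if value < 24 {
        // Values 0..=23 are stored directly in the initial byte.
        tmp[0] = value as u8;
        1
    } else if value <= u64::from(u8::MAX) {
        tmp[0] = 0x18; // one-byte argument follows
        tmp[1] = value as u8;
        2
    } else if value <= u64::from(u16::MAX) {
        tmp[0] = 0x19; // two-byte big-endian argument follows
        tmp[1..3].copy_from_slice(&(value as u16).to_be_bytes());
        3
    } else if value <= u64::from(u32::MAX) {
        tmp[0] = 0x1a; // four-byte big-endian argument follows
        tmp[1..5].copy_from_slice(&(value as u32).to_be_bytes());
        5
    } else {
        tmp[0] = 0x1b; // eight-byte big-endian argument follows
        tmp[1..9].copy_from_slice(&value.to_be_bytes());
        9
    };
    out.get_mut(..len)?.copy_from_slice(&tmp[..len]);
    Some(len)
}
```

For example, the value 1000 encodes to the three bytes `0x19 0x03 0xe8`, matching the examples in the RFC. A full library would add the other major types and, as in the project described here, typically plug into Serde for struct serialization.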
Matthias
00:29:28
And what about the automotive diagnostic standard, as I guess you called it?
Julius
00:29:33
Yeah.
Matthias
00:29:33
What about that protocol? Was there an existing implementation, or was it easy to write?
Julius
00:29:40
It's fairly straightforward. Up until now, the ECUs in cars, these electronic control units, have been the basic building blocks of the electrical system in a car. They have existed for a long time and are always increasing in number. Now we're up to almost 200 or something in the current generations. These are traditionally very small microcontrollers with very little RAM, flash, things like that, so that protocol is fairly compact and simple. But yeah, we needed to write a parser from scratch for that.
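A UDS request starts with a one-byte service identifier followed by service-specific parameters, which is part of why the protocol suits small microcontrollers. The sketch below is not the parser discussed here; it is a hypothetical, heavily reduced example that handles only two services from ISO 14229, DiagnosticSessionControl (0x10) and ReadDataByIdentifier (0x22), and builds the standard three-byte negative response.

```rust
#[derive(Debug, PartialEq)]
enum UdsRequest {
    /// Service 0x10: switch diagnostic session.
    DiagnosticSessionControl { session: u8 },
    /// Service 0x22: read the data identified by a 16-bit DID.
    /// (Real requests may carry several DIDs; this sketch reads only the first.)
    ReadDataByIdentifier { did: u16 },
}

#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    Truncated,
    UnsupportedService(u8),
}

/// Negative response codes defined by ISO 14229.
const NRC_SERVICE_NOT_SUPPORTED: u8 = 0x11;
const NRC_INCORRECT_LENGTH_OR_FORMAT: u8 = 0x13;

/// Parses the service identifier and the parameters this sketch understands.
fn parse_request(bytes: &[u8]) -> Result<UdsRequest, ParseError> {
    let (&sid, rest) = bytes.split_first().ok_or(ParseError::Empty)?;
    match sid {
        0x10 => match rest {
            [session, ..] => Ok(UdsRequest::DiagnosticSessionControl { session: *session }),
            _ => Err(ParseError::Truncated),
        },
        0x22 => match rest {
            [hi, lo, ..] => Ok(UdsRequest::ReadDataByIdentifier {
                did: u16::from_be_bytes([*hi, *lo]),
            }),
            _ => Err(ParseError::Truncated),
        },
        other => Err(ParseError::UnsupportedService(other)),
    }
}

/// A positive response echoes the request service identifier plus 0x40.
fn positive_response_sid(request_sid: u8) -> u8 {
    request_sid.wrapping_add(0x40)
}

/// A negative response is always 0x7F, the rejected service identifier, and a code.
fn negative_response(request_sid: u8, nrc: u8) -> [u8; 3] {
    [0x7F, request_sid, nrc]
}
```

For instance, the request bytes `22 F1 90` parse to a read of DID `0xF190`, the identifier commonly used for the VIN, and a truncated request would be answered with `negative_response(0x22, NRC_INCORRECT_LENGTH_OR_FORMAT)`.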
Matthias
00:30:30
That's super amazing, because there are a couple of other vendors that use Rust in production now, for example Renault, which is also a prominent car manufacturer. I wonder if they use similar protocols or if they have a different setup. I do know that everyone uses CAN bus.
Julius
00:30:49
But then the wire protocols might be different, right?
Matthias
00:30:53
You have different requirements.
Julius
00:30:54
Yeah.
Matthias
00:30:56
Is there any collaboration going on between those car vendors on a message bus level, on a protocol level?
Julius
00:31:07
Not directly at the moment, but I guess things are improving now with all these different SDV, or software-defined vehicle, organizations that are sprouting up all over the place. You have Eclipse SDV, you have COVESA, and you have others. Through those, I hope we will be able to collaborate more. But no, during this project we didn't have a chance to do too much of that. We were mostly focused on getting our product out the door.
Matthias
00:31:48
Are you planning to open source some of that work? Are people ever going to be able to take a look at this and maybe use it in some way?
Julius
00:31:59
I hope so. All the drivers that we developed have been open sourced, or have been upstreamed to the atsamd project. And then we commissioned Grepit to write this MCAN library, which is a hardware abstraction for the MCAN peripheral that is quite common. So if you have an SoC that has that peripheral, then you can use that crate out of the box. Then we have a bunch of other things that we've developed that might make sense to open source, but we need to look at that more going forward. For example, the UDS stack, this diagnostic stack, might be something. We have this CBOR library, for example, although now there are a few in the community already. We have some compression and error correction, a bunch of things. Another example is traceability: being able to tag the tests with requirements so that you can produce reports, because that's a big part of providing evidence that your code actually does what it's supposed to do. You have these higher-level requirements, and they are broken down into individual component requirements, and then there are tests that ensure that those requirements are actually being met. Then you need to show in a traceability matrix that all the requirements are actually being tested and functioning accordingly. So we've developed a lot of tooling around that to show our coverage of requirements and things like that. Some of those might be good to open source. Others are fairly Volvo-specific, so it doesn't make as much sense, but we'll see.
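The traceability idea can be sketched in a few lines. The types and requirement IDs below are made up for illustration and are not the tooling described here: each test result lists the requirements it covers, and the matrix reports every requirement as verified, failing, or untested.

```rust
use std::collections::BTreeMap;

/// Outcome of one test, tagged with the requirement IDs it covers.
struct TestResult<'a> {
    name: &'a str,
    requirements: &'a [&'a str],
    passed: bool,
}

#[derive(Debug, PartialEq)]
enum Status {
    /// At least one covering test exists and all of them passed.
    Verified,
    /// At least one covering test failed.
    Failing,
    /// No test claims to cover this requirement.
    Untested,
}

/// One row of the matrix: the verdict and the tests that back it.
#[derive(Debug, PartialEq)]
struct Row<'a> {
    status: Status,
    tests: Vec<&'a str>,
}

/// Builds a requirement-to-tests matrix, sorted by requirement ID.
fn traceability_matrix<'a>(
    requirements: &[&'a str],
    results: &[TestResult<'a>],
) -> BTreeMap<&'a str, Row<'a>> {
    let mut matrix = BTreeMap::new();
    for &req in requirements {
        let covering: Vec<_> = results
            .iter()
            .filter(|t| t.requirements.contains(&req))
            .collect();
        let status = if covering.is_empty() {
            Status::Untested
        } else if covering.iter().all(|t| t.passed) {
            Status::Verified
        } else {
            Status::Failing
        };
        let tests = covering.iter().map(|t| t.name).collect();
        matrix.insert(req, Row { status, tests });
    }
    matrix
}
```

A report generator could then fail the build whenever any row is `Untested` or `Failing`, which turns the evidence requirement into a CI check.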
Matthias
00:34:29
It feels like to be able to pull this off, you first had to invent the universe, because there were so many small bits and pieces that someone had to write, and you needed to build this entire toolchain. So moving forward, a next project will probably be way easier, because you built out a lot of the tooling. But I wonder if the Rust ecosystem also helped you. What are some of the things that you used from day one, or maybe even use today? A couple of nice projects that you want to mention here?
Julius
00:35:08
So I mentioned Serde already. probe-rs is definitely something.
Matthias
00:35:15
Shout out to Noah, who is one of the maintainers of probe-rs.
Julius
00:35:19
Yeah, they've done an amazing job. What we do with probe-rs is we created our own wrapper around it. It's basically a command line tool that is tailored for this particular chip. It does flashing and debugging and setting up all the peripherals in a correct way and disabling write locks on certain things, so that you can flash or erase different things in different orders and stuff like that. So we have our own tool called LPA probe, which is basically a wrapper around probe-rs. But we also have a Python binding package around probe-rs so that we can use it in our system tests, which are Python-based. Then you can essentially use it like any other Python library within our system tests.
Matthias
00:36:11
I do wonder, are there any things in probe-rs that are missing right now, and things in the embedded or automotive ecosystem that you would like to have reflected in the Rust ecosystem?
Julius
00:36:25
I don't know about probe-rs. For the use cases that we have had, I can't recall anything specific that we haven't been able to fix together with them. But otherwise, sure. We want to make Rust a fully first-class alternative to C and C++ when it comes to safety-critical software, and in order to get there, the toolchain needs to be certified, which is now actually available. But there are still libraries that need certification, and we need tooling around it, for example MC/DC coverage, and other tooling for better traceability, to trace the requirements in an even better way and to get better coverage metrics for the code. Clippy, for example: it would be nice if Clippy would produce reports. Even if you can set it to treat warnings as errors, it would still be nice to get an overview of what warnings you actually have, or what remarks it has in the project.
Matthias
00:37:55
Do you have a requirement to create a software bill of materials?
Julius
00:37:58
Yes, I was going to get to that as well. We do that as part of the build infrastructure that we've built out. Cargo by default helps you to get the total list of everything that you're using in the project, but we want to know what we are actually shipping. How much of that is tooling that is never leaving the company, and what is actually in the binary that is getting flashed in the car? So we had to add some filtering on top of that to produce an SBOM that is more accurate in terms of what is actually being deployed.
Matthias
00:38:49
And where do these files end up? You certainly use them for internal documentation, but do you also have to hand them out to some other authority which approves certain software?
Julius
00:39:05
I'm not sure, but they need to be available on request. There's an open source portal at Volvo Cars, and if you own a car, you're always able to request the license notices, basically, for everything. So those files are used for that, but they're also used for internal monitoring, for example for cargo audit or other forms of auditing and monitoring, to ensure that if there's a vulnerability somewhere, we know about it and are able to react to it.
Matthias
00:39:52
I know from my car that there is a list of software that got used, especially, I would say, larger applications or larger libraries. For example, curl is a very popular thing, and we had Daniel Stenberg on the podcast as well. But I wonder how far down it goes, because you have dependencies and sub-dependencies and dependencies of these dependencies. Is that all listed somewhere? Can I access that as a user? Can I maybe get the entire Cargo.toml of the project, or would that be too much?
Julius
00:40:33
So the entire Cargo.toml would contain much more than we're actually shipping, so we're not providing that as part of the car. There, it's only the software that is actually running on the car.
Matthias
00:40:49
For example, Serde, would it show up somewhere?
Julius
00:40:52
It would show up, definitely. So yeah, all dependencies that are getting compiled and linked into the binary would show up.
Matthias
00:41:05
Let's say I started at Volvo and I'm in your team and I want to start working on this project in Rust. What would my day-to-day look like? I plug myself into an embedded device? Certainly not. There's probably some other device somewhere I need authentication for. And then we have some CI/CD process. We have probe-rs somewhere in the toolchain. We have an IDE. And then we have a build process in the end, and eventually there will be a binary and a software bill of materials, and probably some sort of integration of all the components later on. So walk us through all the steps.
Julius
00:41:46
Absolutely. But first we must talk about the hardware. So this core computer that I described in the beginning, that is what we're actually running on inside the car. When we're running there, it's quite hard to test the LPA on its own, because it's essentially a small island surrounded by a bunch of other processors, and there are not that many connections from the outside world to it. If you want to test the LPA individually, you would essentially have to deploy some sort of distributed test program on the other CPUs and have those test programs probing and prodding the LPA and sending back results, and that would be quite cumbersome. So fairly quickly we realized that for this to be effective, we need to have our own development hardware. We sketched out a block diagram of how it would look: basically, we take the LPA circuitry as it looks in the core computer, move it over to a separate PCB, and then hook up all kinds of components, like a debug probe, a USB-to-CAN device, USB-to-GPIO devices, current measurement, and other things. What we end up with is a small circuit board with one USB cable, and it costs a few hundred euros apiece. And we make them ourselves, or at least Grepit do, who are also hardware specialists. They basically did it in just a few weeks, and we had the first samples. So we have these in big enough quantity that every developer can have one at his or her desk, and then we have a bunch of those in CI, which is based on Zuul, Zuul CI, where for every patch that we do, it runs the whole test suite on the hardware.
Matthias
00:44:11
What's so special about Zuul? Because not many people might know it.
Julius
00:44:15
Zuul CI is quite interesting. I would say its main selling point is that it can do speculative merging. What that means is this: normally, especially if you have multiple projects working in a common code base, you can have one team or one developer doing a change, and it's working fine and all the tests are passing. At the same time, someone else is doing a different patch for something else, and that is also fine and passing. So both of them get merged, but it turns out that those two together don't actually work, so when both of them get merged, it breaks. Zuul is able to speculatively check all the patches that are in flight and test them together before they actually get merged, so you have a much higher likelihood of the master branch actually working.
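The gating idea can be illustrated with a toy model. This is a simplified sketch and not Zuul's implementation: it tests changes one after another, whereas Zuul runs the speculative combinations in parallel, and the change names and the `example_suite` function are made up for illustration. Each change is tested on top of the changes queued ahead of it, so a change that only breaks in combination with an earlier one is rejected before it reaches the branch.

```rust
/// Runs a simplified dependent merge gate over `queue`.
///
/// `passes` stands in for the test suite: it receives the list of changes
/// that would be on the branch and reports whether that combination works.
/// Returns `(merged, rejected)`, where `merged` includes the base.
pub fn gate<'a>(
    base: &[&'a str],
    queue: &[&'a str],
    passes: impl Fn(&[&'a str]) -> bool,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut branch = base.to_vec();
    let mut rejected = Vec::new();
    for &change in queue {
        // Speculate that everything ahead of this change has merged.
        branch.push(change);
        if !passes(&branch) {
            // The combination is broken: drop this change and keep going.
            branch.pop();
            rejected.push(change);
        }
    }
    (branch, rejected)
}

/// Hypothetical test suite: each change works alone, but "rename-api"
/// and "use-old-api" conflict when both are on the branch.
pub fn example_suite(branch: &[&str]) -> bool {
    !(branch.contains(&"rename-api") && branch.contains(&"use-old-api"))
}
```

With the queue `["rename-api", "use-old-api", "fix-typo"]`, the second change passes when tested alone against the base, but the gate rejects it because it is tested together with the change ahead of it.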
Matthias
00:45:19
Isn't it also true that we're dealing with embedded devices here? I know roughly how that would work for a backend service, say, but for embedded devices I would be scared that the device would be in a weird limbo state or wasn't completely flashed when the test ran. So how do you ensure that the tests are always correct and start from a deterministic position?
Julius
00:45:47
Good question. So the actual test framework is Robot Framework, which is a Python-based, behavior-driven-development type of thing, similar to Cucumber or systems like that, where the tests are basically on a system level: you essentially make statements about how the system should behave, and then you connect those statements to actual code that is being run on the target. And of course, every test suite then has quite an extensive setup step, basically, where it takes the device, flashes the correct version, resets it into the correct state, and sets up whatever preconditions need to be in place before the test starts. And the same when it's done: it will do some cleanup. But Zuul doesn't care about that. It just starts the test framework, which does its magic and then reports back.
Matthias
00:46:57
How long does a typical build take?
Julius
00:47:00
So on the local machine, it takes, depending on the machine, maybe between five and ten minutes. And then the tests themselves, I think another 15 or 20. But then in the CI, we're doing more extensive things, so that is around an hour or so.
Matthias
00:47:22
And that includes setting up all the devices and running the integration tests?
Julius
00:47:28
Yeah, around an hour. If it starts to take more than an hour, we usually try to sit down and figure out if we can parallelize it more, if we need to add more units to the CI so that we can run more in parallel, or if we can optimize the tests so that they run faster.
Matthias
00:47:50
Does that include all of the components, or does it include just the Rust components?
Julius
00:47:55
So it includes all of the LPA components, yeah.
Matthias
00:47:59
And how often is Rust the culprit of failing a build?
Julius
00:48:03
Almost never. I mean, that is mostly taken care of at build time. If it doesn't build, you already know that in the build step, and that's usually something you catch locally.
Matthias
00:48:21
Does it mean you don't even flash to the device that often, because you have a very iterative way of building the software with your IDE and with the type system?
Julius
00:48:30
Yeah, I definitely would think so. Compared to another system where fewer of the things were taken care of up front, you would absolutely need to flash it more often just to see that, no, that didn't work. So yeah, I would definitely think that that is the case.
Matthias
00:48:52
How big is the codebase, the Rust codebase?
Julius
00:48:56
So the Rust codebase is around 75,000 lines. And then we have an additional 50,000 lines of system tests and CI and other tooling. So yeah, around 125,000 to 130,000 lines.
Matthias
00:49:12
And that was developed by how many people? What's the team size?
Julius
00:49:16
So we've been ranging between five and ten developers overall during these three and a half years.
Matthias
00:49:27
Yeah, very nice. A very substantial code base, but also, I would say, still manageable. So I would assume a lot of work also went into speccing out things, creating architecture diagrams, making sure that everything is wired up correctly, communication and so on. There's a lot of complexity in introducing a completely new toolchain, a new language, everything.
Julius
00:49:51
Absolutely, absolutely. And like I said earlier, I didn't even go through the whole list of things that we have had to build out as well to get this working. So yeah, my team has done an amazing job there; I couldn't give them enough credit. But I've also found that using Rust, or this setup that we have, is super empowering for everyone. Everyone feels quite confident when they're working in the code base, especially because you're not as afraid of breaking things, since the compiler will actually let you know up front. Especially when we're onboarding new developers to the team, they are usually quite quick to get up to speed, because they can hack around without fear: as soon as the builds pass and the tests pass and everything, you're fairly certain that nothing weird has been done.
Matthias
00:51:01
How long until they are fully ramped up?
Julius
00:51:04
Everything from two weeks, which was the extreme case, to maybe three to six months, something like that.
Matthias
00:51:16
Yeah. So it's a bit of a myth that the Rust learning curve is a huge problem for people that want to work with Rust professionally.
Julius
00:51:27
I would say so. It's absolutely something to respect, and like we did, we sought help and got some people on board who were already really, really good at this. I wasn't super proficient in Rust when we started. I had done this project and some other smaller hobby things, and actually none of our regular developers had worked with Rust in any big capacity before, so we all picked it up on the job. But of course, they are experienced C and C++ programmers, though not everyone even in embedded, so they've had to learn that as well.
Matthias
00:52:15
How did you learn how to write idiomatic Rust? Do you have any coding guidelines? Did you talk to people about how to write idiomatic Rust? Are there any resources that you maybe recommended to your team?
Julius
00:52:31
We read a lot of code from the community. We also use Clippy a lot to avoid the bigger issues. rustfmt and Clippy, of course, are mandatory, and we use Clippy pedantic, with no warnings allowed, so to say. So that gives us a lot of pointers. But then, I mean, our code has evolved over time. Going back to some things that we did in the beginning, you can absolutely see that there are some C-isms here and there.
Matthias
00:53:21
If I were tasked with building code for cars, I would be extremely defensive, because I know that I wouldn't be able to just connect to the car, or maybe have a shell connection, and just troubleshoot on the live machine. These cars, they run for years, and sometimes they have to go without maintenance, or people don't do those updates as regularly as they should. So how defensive is the code? How, let's say, painstaking is it to work with inputs and outputs? What does the error handling story look like? Would it be normal Rust code that people could understand, or is it very specific Rust code?
Julius
00:54:08
There are definitely exceptions, but I would say in general, it's very common Rust code. The robustness of Rust in its default form is really high. So yeah, you can't just unwrap all over the place. You have to ensure that errors are actually getting handled and logged properly. And we talked about this diagnostic stack: we also have a framework for creating diagnostic monitors that are monitoring different conditions and reporting that.
Matthias
00:54:53
Do you have a custom panic handler?
Julius
00:54:54
Yes, yes.
Matthias
00:54:58
That means failing is not an option. You need to handle every condition, because if you're the thing that boots things, then it boots you if you fail.
Julius
00:55:09
Yeah, and of course, if it panics, we need to reset back to a working condition. But the chip also has a watchdog that we need to activate, so that in case it would lock up for some reason, the watchdog would then hard reset it.
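The watchdog behavior can be modeled on a host machine. The sketch below is an illustration only: the `Watchdog` type, its tick-based timing, and the `first_reset` helper are assumptions, not the actual chip's interface. On real firmware the countdown runs in hardware, and a panic would be routed through a function marked `#[panic_handler]`, which needs a `no_std` target and is not shown here.

```rust
/// Host-side model of a hardware watchdog timer (hypothetical interface).
pub struct Watchdog {
    timeout_ticks: u32,
    remaining: u32,
}

impl Watchdog {
    /// Creates a watchdog that fires after `timeout_ticks` ticks without
    /// being fed. In this model the timeout should be at least 2.
    pub fn new(timeout_ticks: u32) -> Self {
        Self { timeout_ticks, remaining: timeout_ticks }
    }

    /// Called by healthy firmware to restart the countdown.
    pub fn feed(&mut self) {
        self.remaining = self.timeout_ticks;
    }

    /// Advances time by one tick. Returns `true` when the countdown has
    /// expired, which on real hardware would hard reset the chip.
    pub fn tick(&mut self) -> bool {
        self.remaining = self.remaining.saturating_sub(1);
        if self.remaining == 0 {
            // After a reset the countdown starts over.
            self.remaining = self.timeout_ticks;
            true
        } else {
            false
        }
    }
}

/// Simulates a main loop for `total_ticks` iterations. The loop feeds the
/// watchdog every iteration until `hang_at`, where it locks up and stops
/// feeding. Returns the tick at which the first reset happens, if any.
pub fn first_reset(timeout_ticks: u32, total_ticks: u32, hang_at: Option<u32>) -> Option<u32> {
    let mut watchdog = Watchdog::new(timeout_ticks);
    for now in 0..total_ticks {
        let hung = hang_at.map_or(false, |t| now >= t);
        if !hung {
            watchdog.feed();
        }
        if watchdog.tick() {
            return Some(now);
        }
    }
    None
}
```

A healthy loop never triggers a reset, while a loop that hangs stops feeding the timer and gets reset once the timeout has run down.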
Matthias
00:55:31
Are there any rules around allocation?
Julius
00:55:34
Yes and no. Of course, there are rules about everything when it comes to automotive software. In our case, we don't use any allocation. We don't have an allocator, so everything is static.
Matthias
00:55:49
That means you disabled the allocator?
Julius
00:55:50
Yeah, or rather, we don't use that dependency at all, and the heap is zero-sized, basically. But that means that we use the stack a lot, and we use heapless for a lot of these things. But also, we put our buffers and things in statics; they are statically allocated.
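To make the no-allocator style concrete, here is a minimal fixed-capacity buffer written with only the standard library. It is a stand-in in the spirit of the heapless crate's `Vec`, not that crate's implementation, and `FixedVec` is a name chosen for this sketch. The capacity is a compile-time constant, the storage lives inline, and a full buffer is reported to the caller instead of growing.

```rust
/// Fixed-capacity vector whose storage lives inline, on the stack or in a
/// static, so no heap allocator is needed.
pub struct FixedVec<T: Copy + Default, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    /// Creates an empty buffer with capacity `N`.
    pub fn new() -> Self {
        Self { items: [T::default(); N], len: 0 }
    }

    /// Appends `value`, or hands it back if the buffer is already full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Returns the filled part of the buffer.
    pub fn as_slice(&self) -> &[T] {
        &self.items[..self.len]
    }

    /// Returns the compile-time capacity.
    pub fn capacity(&self) -> usize {
        N
    }
}
```

Because `push` returns a `Result`, the worst case has to be handled at the call site, for example `FixedVec::<u8, 4>` accepts four bytes and returns `Err(5)` when asked to store a fifth.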
Matthias
00:56:16
Because you know the sizes of the payloads that you have to handle at compile time.
Julius
00:56:21
Yeah, exactly.
Matthias
00:56:22
It's similar to how Oxide does it. They have this predictable scheduler, I would say, or part of the operating system, Hubris. And they do something similar where they only have certain message types that they can handle.
Julius
00:56:37
Yeah, exactly the same way. So everything is statically known beforehand, or the worst case is known. So yeah, we never had to deal with any unknown buffer sizes. And then when it comes to input, you asked about input validation. That is, of course, super important. And that's why it's also important that all messages that we are sending have a well-defined format, so that you can easily, at the parsing stage, see if a message is actually valid or not and reject it already there.
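Rejecting bad input at the parsing stage might look like the sketch below. The frame layout is invented for illustration (one ID byte, one length byte, up to eight payload bytes, and an XOR checksum) and is not the format used on the LPA. The parser checks every field before handing out a typed message, and it borrows the payload instead of copying it, so it needs no allocation.

```rust
/// Message types the receiver understands (hypothetical IDs).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MessageId {
    Wake,
    Sleep,
    Status,
}

/// Reasons a frame is rejected at the parsing stage.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    TooShort,
    UnknownId(u8),
    PayloadTooLong(usize),
    BadLength { declared: usize, actual: usize },
    BadChecksum,
}

/// Worst-case payload size, known at compile time.
pub const MAX_PAYLOAD: usize = 8;

/// A validated message that borrows its payload from the input frame.
#[derive(Debug, PartialEq, Eq)]
pub struct Message<'a> {
    pub id: MessageId,
    pub payload: &'a [u8],
}

/// XOR of all bytes; a deliberately simple stand-in for a real checksum.
pub fn checksum(bytes: &[u8]) -> u8 {
    bytes.iter().fold(0u8, |acc, &b| acc ^ b)
}

/// Parses `[id][len][payload; len][checksum]`, rejecting anything malformed.
pub fn parse(frame: &[u8]) -> Result<Message<'_>, ParseError> {
    // Two header bytes plus one checksum byte is the minimum frame.
    if frame.len() < 3 {
        return Err(ParseError::TooShort);
    }
    let id = match frame[0] {
        0x01 => MessageId::Wake,
        0x02 => MessageId::Sleep,
        0x03 => MessageId::Status,
        other => return Err(ParseError::UnknownId(other)),
    };
    let declared = usize::from(frame[1]);
    if declared > MAX_PAYLOAD {
        return Err(ParseError::PayloadTooLong(declared));
    }
    let actual = frame.len() - 3;
    if declared != actual {
        return Err(ParseError::BadLength { declared, actual });
    }
    // The checksum covers everything before the final byte.
    let (body, tail) = frame.split_at(frame.len() - 1);
    if checksum(body) != tail[0] {
        return Err(ParseError::BadChecksum);
    }
    Ok(Message { id, payload: &body[2..] })
}
```

Every failure mode has its own `ParseError` variant, so the caller can log or report exactly why a frame was dropped rather than unwrapping and hoping.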
Matthias
00:57:12
If someone wants to buy a car that has Rust components now, which cars would they need to buy? You said there are production cars. There's probably one Volvo and one Polestar, as far as I'm aware.
Julius
00:57:24
Yeah, that's true. It's the Polestar 3 and the Volvo EX90. And then just keep a lookout for SPA2-based cars, so more will be coming soon.
Matthias
00:57:40
Did you have any issues in production?
Julius
00:57:42
With the hardware, or with...?
Matthias
00:57:46
Well, let's say both, but I'm curious if you sold a car and then it came back because of a Rust bug, because of an issue in the Rust code.
Julius
00:57:55
That has not happened yet.
Matthias
00:57:58
Is that a common thing when you ship certain components and they are C-based, for example, that usually you have covered most of the edge cases when things hit production? Or are there any rollbacks or changes?
Julius
00:58:11
Yeah, I can't really say. I would guess it's fairly common in automotive in general. I mean, if you look at what's called warranty in general, and things that are being replaced due to that, it is quite often software that is at fault. And of course, it's still early days, so I'm not going to say that it is not going to happen for us. But so far, we have seen some manufacturing issues, but those have been hardware-related, and hopefully something that we have been, or will be, able to catch in the early stages, so to say.
Matthias
00:59:01
Can you give us an outlook into the future, maybe one, two, three, or five years down the road? What is Rust usage at Volvo going to look like?
Julius
00:59:13
I'm just going to hop into my time machine and get back. Now, I would of course love to see it in more places, so that is what I'm actively working on now: to see where it would fit. Because it doesn't make sense to rewrite or replace everything with Rust. When you already have the thing working, and it's tested and according to specification, it seldom makes sense to rip it out. But there are cases where that is justified, especially when it comes to security-critical, as in cybersecurity-critical, code, like user-facing code, things that need to do validation of data that is coming from the outside world, things like that. So I definitely see a possibility for those kinds of use cases, where we're interfacing with the internet, for example, or things of that kind. And the hurdles I mentioned earlier, the hurdles that we had in the beginning, when we had support for neither the hardware nor the OSs that are commonly used: all of those hurdles are falling one by one.
Matthias
01:00:40
Now, I find it extremely surprising that a lot of the dependencies that I use for writing Rust in production are the same ones that you use for an embedded project at Volvo. And this is really cool, because you can reuse the code. You can share ideas. You can share different bits and pieces. And the entire ecosystem becomes much better. So you kind of cross different boundaries with Rust. And how does it look in comparison between dev and prod?
Julius
01:01:17
Yeah, it's funny you say that, because there is actually no difference. We build one binary, and that is the binary that we test on our test hardware, and that is the binary that goes unmodified to the core computer. And so far, there's nothing that we can't actually test on our dev board, or at least very few things. There are, of course, a few things that change when you're in the context of a whole car. But when it comes to the LPA functionality, we can test essentially everything before we deploy it. So it's almost unheard of that we get some sort of issue due to the fact that something was working differently on our test hardware. So yeah, one binary, no difference between dev and prod in that sense. But what you said about the possibility to reuse, that is definitely one of Rust's huge strong points: the fact that you can take different components from the community and reuse them with confidence, almost always hassle-free. It just builds and works, and you can just use it. Compare that with all the things you otherwise always have to take care of. First: can I get it to build in my build system? The first time, probably not. You probably need to spend a lot of time to get that to work. But then when it actually builds and you start to use it, does it use memory in the same way as you do? Does it make the same assumptions? Like, who is allocating this buffer? Who is freeing it? Who has the responsibility to do what?
Those are not standardized in C and C++, meaning that even if you can build it, there's no guarantee that you can actually use it in any productive way. In Rust, on the other hand, maybe it's not true to say everything, but you're pretty much guaranteed that it will work. And also, like you said, I don't think there's any language that is as scalable, from the smallest microcontrollers to backend systems or the biggest server farms that you can imagine, and everything in between.
Matthias
01:04:03
That sounds really empowering and probably very encouraging for a lot of people who might listen. Would that be your message to the Rust community?
Julius
01:04:14
Yeah, empowering is definitely the key word. We have been super, super happy with this project, and it really shows that Rust has a bright future. The hurdles that we mentioned before, when we started the project, and which made it basically a non-starter for most of the ECUs that we wanted to use it on, those are all coming down or have come down already. So now you have QNX support, for example. You have Infineon TriCore support. There was recently an article on how to run Rust together with AUTOSAR Classic. AUTOSAR Classic is the common automotive software framework that most automotive software runs in, and now that is becoming available for Rust as well. There is work being done on various different automotive platforms and components. So for automotive, it's definitely a bright future. And we'll see where we'll take it at Volvo.
Matthias
01:05:31
And where can people learn more about this project, about Rust at Volvo?
Julius
01:05:38
Yeah, so there are two interviews out, actually. One from a couple of years back, where we were kind of in a starting position. You can find that on the Volvo Tech blog. It's called "Why Volvo Thinks You Should Have Rust in Your Car" or something similar, silly pun. And then recently there was an article on Tweede golf's blog, where we talked about this project and how it has been going.
Matthias
01:06:14
That was a really nice one. I read this one, and it also prompted me to reach out. And where can people learn more about yourself?
Julius
01:06:25
You can find me on LinkedIn and on X, Mastodon, all these different platforms. I was also fortunate enough to join the Rust Safety Critical Consortium that was started last month at RustConf in Montreal. And together we are aiming to close the final hurdle, which is to make Rust a fully viable alternative for safety-critical software. So that is what we're doing now, and you can find more about that on the Rust Foundation webpage. And there's also a GitHub repo where you can join if you like. Admission is free.
Matthias
01:07:13
That means people will run into you at some point or another and we hope to see you at a conference or at an event speaking about Rust and Volvo or safety critical components.
Julius
01:07:25
Yeah, let's hope so. So we are at least super stoked about the current project and how well it has worked out. Definitely going to be pushing that forward within the company and hopefully we can find some other exciting avenues for it.
Matthias
01:07:48
Julius, thanks for taking the time and thanks for being an ambassador of Rust in the car manufacturing space.
Julius
01:07:57
Yeah, thanks for having me. It was a great chat.