Rust in Production

Matthias Endler

Helsing with Jon Gjengset

About keeping critical infrastructure secure and resilient with Rust

2026-04-23 93 min

Description & Show Notes

Jon Gjengset is one of the most recognizable names in the Rust community, the author of Rust for Rustaceans, a prolific live-streamer, and a long-time contributor to the Rust ecosystem. Today he works as a Principal Engineer at Helsing, a European defense company that has made Rust a foundational part of its engineering stack. Helsing builds safety-critical software for real-world defense applications, where correctness, performance, and reliability are non-negotiable. In this episode, Jon talks about what it means to build mission-critical systems in Rust, why Helsing bet on Rust from the start, and what lessons from his years of Rust education have shaped the way he writes and thinks about production code.

About Helsing

Founded in 2021, Helsing is a European defence company building AI-enabled software for some of the most demanding environments imaginable. Helsing's software runs where correctness is non-negotiable. That philosophy led them to Rust early on and they've leaned into it fully. From coordinate transforms to CRDT document stores to Protobuf package management, almost everything they build ends up being written in Rust.

About Jon Gjengset

Jon holds a PhD from MIT's PDOS group, where he built Noria, a high-performance streaming dataflow database, and later co-founded ReadySet to continue that work commercially. He then spent time building infrastructure at AWS, before joining Helsing as a Principal Engineer. Outside of his day job, he's been teaching Rust to the world through his livestreams and writing for years, which makes him a rare combination: someone who thinks deeply about both how to use Rust and how to explain it.

Transcript

It's Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. My name is Matthias Endler from corrode and today I talk to Jon Gjengset from Helsing about keeping critical infrastructure secure and resilient with Rust. Unless you've been living under a rock, you know the guest of the show, but I will let him introduce himself. Jon, happy to have you.
Jon
00:00:29
Thanks, Matthias. So I'm Jon Gjengset. You may know me as @jonhoo online, or otherwise on the various channels, although YouTube I think is the primary one. And I guess, who am I? So professionally, I work at a company called Helsing, which I guess we'll talk more about today. I work as a principal engineer, which means I end up jumping all over the stack, owning wherever the biggest areas of attention are and where I can have the highest leverage. So I'm not necessarily in one particular team persistently over time, but rather seeking out where I'm needed. I've worked there for the past three years. Before that, I worked at Amazon. I maintained and built their Rust build infrastructure. And then before that, I did a PhD at MIT where I built a sort of novel distributed systems database project that has since turned into a startup called ReadySet. That's sort of the professional side of my career. And then the thing I'm more widely known for in the Rust community is the Rust educational materials that I make, primarily in the form of videos on YouTube, where I do long-form content on building things from scratch in Rust and showing people the actual code as we develop it, and trying to give, you know, a realistic view of the actual development workflow. Like, what does it look like when you go from zero to one on a codebase in Rust?
Matthias
00:01:51
Yes, and that's exactly the reason why I was super excited to have you on the show to do this interview. And I guess I can speak for everyone listening right now when I say thanks so much for all the content, for educating all of us. You're probably one of the most famous Rustaceans out there and overall, from what I can tell, an amazing human being, always happy to share.
Jon
00:02:15
I appreciate that, thank you!
Matthias
00:02:19
And a lot of people know you from this educational context, but maybe not everyone knows that you're a principal engineer at Helsing. So talk a little bit about your role. You mentioned it already, but maybe you can get into the details. What do you do there and what is Helsing's responsibility right now?
Jon
00:02:40
Sure. So Helsing is a defense company based in Europe. They started fairly recently, so it's only, I want to say, four or five years old. I forget the exact start date, but around there. I joined sort of towards the middle to end of 2023, after I moved back to Europe. And Helsing operates sort of across the entire defense spectrum in Europe, but they're specifically focused on software-enabled defense. So how can we use software as the primary driver for, you know, keeping up the deterrence capabilities of especially democracies, and especially focused on Europe, given that's where the company is based as well. It's already become quite a large company, in the sense that, you know, we're now somewhere in the vicinity of a thousand employees, and we have offices in, I think, six different countries now. So we have offices in Estonia and Poland, in the UK, Germany, France. Am I missing any? I think those are the sort of primary office locations we have. And then obviously, you know, I work remotely from Norway for the company and we have other people working elsewhere as well. But those are the sort of primary locations. But for a company that's only four to five years old, that's quite the growth. And, you know, the reality is that that is partially because of the world situation and the way things are developing in Europe, where a pretty severe investment into European defense has been needed. And Helsing has sort of seen fit to try to fill at least some of that space. And so we operate across land, air, maritime. We recently announced some work in space. And so the observation is that by focusing on the software, there's a lot we can bring to the table, and we can bring it to the table pretty rapidly, because software in general has more rapid development cycles than traditional hardware does. And so, you know, in my work at Helsing, I've been primarily working in the air domain.
So working with, for example, the capability upgrades for the Eurofighter, which is a European jet fighter program. And currently I'm working more on the CA-1, which is our recently introduced product that is essentially an autonomous UAV. And so I'm working on building a lot of the software that's going to underpin that entire stack.
Matthias
00:05:11
If you compare Helsing's usage of Rust with AWS, can you see any differences?
Jon
00:05:21
Yeah, I mean, I think there are a number of differences, actually. One of them is that Helsing was a sort of Rust-first company. So it was decided very early on that the entire stack there should be Rust-based and, you know, wherever possible, Rust would be the language of choice. And we can get into exactly why that was, but that has sort of shaped a lot of the software engineering. It shaped a lot of the software that we've built and the way that we built software. Whereas at Amazon, you know, Amazon was originally a sort of Java company, with some healthy amounts of Perl in there. And then over time, it's sort of grown into this polyglot company where there are lots of different languages in use in different parts of the company. And Rust is obviously one that's, I think it's beyond up-and-coming now. It's actually being adopted in pretty serious use cases across AWS in particular. But Rust was the sort of up-and-coming competitor. It was not the incumbent. And that changes how you adopt it, right? Because it means that at Amazon, there's a lot of infrastructure, there's a lot of tooling that's not built for Rust, that was built with other languages in mind, and where Rust now has to make inroads into that ecosystem and integrate with all the things that people are used to working with. Whereas at Helsing, we can build everything specifically for using Rust, because that is the primary target language.
Matthias
00:06:49
What I get from this is AWS had infrastructure before, and it was sort of a brownfield adoption of Rust, whereas at Helsing it more or less might have been a greenfield adoption. I'm not sure if that is true.
Jon
00:07:04
Yeah, I think that's right. I mean, it was a pretty principled choice from the beginning of the company to say, you know, we're going to be building technology where a lot of the technology is critical, right? It ends up having implications for life-and-death decisions. It ends up having implications for, you know, correctness in the military domain. Like, messing things up here is very, very costly, and I don't mean in terms of monetary cost, right? It's just costly in a human sense. And so as a result, you need to make sure that you build systems that are highly resilient, highly reliable, highly predictable, and robust. And Rust is one of the mechanisms that we wanted to use from very early on to make that be the case. And, you know, because the company is relatively young, we could also then say we're going to take the attitude here of everything is going to be Rust from the get-go rather than say, you know, let's just try out different languages and over time figure out what to choose. We're just going to say that is what the stack is going to be built in. So it's very much a greenfield approach in that way.
Matthias
00:08:17
And yet, even if you base everything on Rust, you still need to interface with existing libraries that might not be written in Rust. You might drive controllers or other hardware that maybe has firmware that isn't written in Rust. What is that story like?
Jon
00:08:34
Yeah, and I actually think that's one of the reasons why Rust was able to gain such adoption in such a wide set of areas: its interoperability story is decently good. There are things I would still like to see Rust grow on here for sure, but it gives you low-level enough control to be able to write firmware, operating systems, embedded devices in it. But it also makes it relatively easy to plug Rust into existing codebases, whether that is underneath something like Python. Python's cryptography package is a good example here, where they are now using Rust under the hood for some of the native components. But it also means being able to put Rust above other things, where you have an existing C codebase or even C++ codebase, and you want to be able to interface with it from Rust. Well, you can also do that. So you can sandwich Rust into wherever in the stack you feel like it's appropriate, and then it can grow out from there. And that's much harder with some other languages. Like if you have a runtime, for example, then being able to do foreign function interfaces either above or below you tends to be more painful.
Matthias
00:09:46
Did you see cases where vendors provide Rust SDKs now that it's becoming more popular?
Jon
00:09:51
It's a mix. There are some domains where we're seeing more interest in supporting Rust from industry vendors. And then there are others where it's still, you know, all C++. Or, if you look at the Terraform and Kubernetes ecosystem, a lot of that is Go, and they don't seem particularly interested in saying we're going to also provide Rust bindings now. And then you have, you know, more embedded ecosystems where maybe they already have an existing sort of stack for development that's C-based maybe, and they don't necessarily want to change that. But I do think we're seeing vendors now think more about maybe a Rust SDK is also something we want to provide. I mean, Amazon is a good example where they, you know, now have a Rust SDK. And it was because there was sufficient demand for, you know, we want to build things that use AWS from Rust. So please let us do that.
Matthias
00:10:48
Maybe for some additional context, could you list the things where Rust is front and center at Helsing? What makes the stack?
Jon
00:10:57
So Rust is used throughout our entire stack, essentially. So we use it for obviously anything that's backend services and the like, but we also use it for anything that's close to edge devices. So if you are writing code that's going to run on a UAV, on a drone, on an underwater autonomous submersible, whatever it might be, realistically, you have to use a language there where you have fairly tight control over things like power usage, performance, predictable performance over time, low overhead compute, and hardware control. And so we use it a lot there in sort of embedded or bare metal systems, but also things that are almost embedded or bare metal, right? So things like, you know, essentially a single application that still runs on Linux, but on a very particular compute board, would still be something where we write those applications in Rust. A lot of our tooling is built in Rust. A lot of our networking technologies are built in Rust. In fact, it might be easier to sort of list the things that are not done in Rust, where I'd say there are primarily three categories. So there's web frontends, which we tend to build in TypeScript instead, because you can do some of it in WebAssembly, but realistically, TypeScript is where you're going to get the most mileage here. And then for AI research, like for the people actually working on the underlying machine learning algorithms and training and the like, there we try to enable people to use Python because that is where they're the most productive. That's where a lot of the state-of-the-art research happens. And that's where a lot of the existing tooling and libraries and such exist. And so we don't really want to move all of that to Rust, even though we think it's probably feasible. It's not clear that it gives the bang for the buck in that area.
And then the last is we have some things around infrastructure that are in Go. I mean, I mentioned Terraform and Kubernetes and stuff, so there's some stuff like that where Go has the best support ecosystem for writing things in that environment. But I'd say that's a pretty small minority. I'd say, in order of adoption percentage, I think it's Rust by far, and then Python, and then TypeScript, and then there's like a tiny bit of Go there. And then there's always Bash, because CI and everything. But I'd say those are the primary languages.
Matthias
00:13:29
According to our internal statistics, a large percentage of our listeners do use Rust in production in some way. But what does it feel like to work in a codebase where Rust is truly the answer everywhere, not just in one layer?
Jon
00:13:45
It's very convenient, right? Because it means that whenever I go to any codebase across the company, chances are I know how to read that codebase. Chances are it's written in Rust. It's not just about being able to read the code. It's also that the structure is kind of predictable, because they're all Cargo projects. They all have, you know, the layout you would expect from that. It also means we can write tooling that is specifically designed for Rust projects and then, you know, add support for Python, for TypeScript. But we can build things specifically for Rust. As an example, we have an internal linter that we use to catch things that are more company preferences. So we run Clippy, and then we also run this other thing that tries to lint for things that are preferences we have for software engineering in Rust. And so this can be things like looking for preferred libraries we have for things like logging, for example. Or it can be things like how we believe you should take internal dependencies, like how they should be expressed in your Cargo.toml. Or it can be things like, you know, how we think you should be using expect, like the sentence structure we believe you should have inside of expect messages. Like there's a bunch of that sort of stuff that you can do linting for that wouldn't make sense for upstreaming into Clippy, but it does make sense for enforcing things internally. And the fewer languages you have, the more you can build tooling specifically for those languages to encourage the sort of software engineering excellence practices you want to have for that language.
Matthias
00:15:20
Yeah, I'm sure a lot of people will be interested in taking a peek at that. So please open source it if you can, even if you don't want to upstream it.
Jon
00:15:30
I think it's mostly uninteresting in the sense that Clippy already has a relatively straightforward mechanism where you can basically implement your own lints. And so our internal tool is just that. It's just a Clippy, but where all the lints are replaced with things we decided we wanted to look for. And where there are things that we think are actually useful to Rust users more broadly, then we would obviously open source those or upstream those into Clippy itself. But for a lot of the others that are just like encoding of our software quality standards, it's like, I don't actually think they're all that interesting outside of the company. And there's certainly nothing about the linting engine itself that's interesting, right? Because it is just the clippy tooling that exists that we've added our own rules to, if you will.
Matthias
00:16:23
Which sort of begs the question, do you use that tool across the stack? Or do you make differences between embedded developers and backend developers? Do they use Rust differently?
Jon
00:16:34
No, that tool is used across the stack. It's not currently mandated. So it's a thing that you can choose to add to your CI. And then we are strongly encouraging everyone to use it in their CI. But most of the rules here are around things that are good practice, no matter where in the stack you operate. This can be things like, if you have an error type that is, let's say, eyre::Result, and you are propagating that error type using the question mark operator, you should have a call to .context() before it, right? And that makes sense for any codebase you're operating in. In the same way, we can encode rules like always prefer .context() over .wrap_err() or vice versa, right? But we can at least encode that we believe it should always be one or always be the other. And so it's those kinds of things, more so than, you know, it's harder to write a lint for, I don't know, how you should handle back pressure in an application, which is the kind of thing that would differ between embedded development and cloud development. You can't really write a lint for that at the Clippy level. It's almost more architectural, and so those things are not things we lint for.
Matthias
00:17:47
Do you have an example for when .context() is helpful?
Jon
00:17:50
Oh, I mean, I love .context(). I use it everywhere. The thing that context gives you is the ability to, you know, when you propagate an error up through the stack, that error might just be something like permission denied or not found or something. And if you just propagate it with question mark, then at the point where you emit the error, you know, in your main, in your binary or something, you would just get an error printed that says file not found or permission denied, and that's completely useless. By adding context, you can include not just information about which file was being accessed, but why that file was being accessed. So imagine you have config files that have include directives. And so you might be like three levels down in an include hierarchy, and then you reach a file you're not allowed to read. Well, then the question is, which file included the one that said that you should read this file? So even if you had the file name without the context, the file name might be like, well, I don't know why foo.bar is being included. That's not any of the files that are listed in my top-level config. And so the context allows you to give both things like data context, but also programmatic context, like why are we loading a config file here in the first place? Or in which part of the application did we load this configuration? Like was it, you know, the code that's loaded at startup, or maybe it's the code that handles configuration changes at runtime. They both end up reading the config files, but which of them failed with this error? And so in general, adding context like this is very, very helpful for making your errors actually be actionable where they are emitted.
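To make the include-hierarchy example concrete, here is a minimal sketch of the idea using only the standard library. It is not the API of anyhow, eyre, or miette; the `Contextual` type, the `Context` trait, and the file names are hypothetical stand-ins that mimic what a `.context()` call does: wrap the underlying error and keep it reachable through `source()`.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// An opaque error that pairs a human-readable context message with the
/// underlying error that caused it (hypothetical stand-in type).
#[derive(Debug)]
struct Contextual {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Contextual {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for Contextual {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Hypothetical extension trait that adds `.context(..)` to any `Result`
/// whose error type implements `std::error::Error`.
trait Context<T> {
    fn context(self, msg: impl Into<String>) -> Result<T, Contextual>;
}

impl<T, E: Error + 'static> Context<T> for Result<T, E> {
    fn context(self, msg: impl Into<String>) -> Result<T, Contextual> {
        self.map_err(|e| Contextual { context: msg.into(), source: Box::new(e) })
    }
}

/// Stand-in for a real file read: always fails with "permission denied".
fn read_file(_path: &str) -> Result<String, io::Error> {
    Err(io::Error::new(io::ErrorKind::PermissionDenied, "permission denied"))
}

/// Data context: which file was read, and which file included it.
fn load_include(path: &str, included_by: &str) -> Result<String, Contextual> {
    read_file(path).context(format!("reading `{path}` (included by `{included_by}`)"))
}

/// Programmatic context: which part of the application was loading config.
fn load_config_at_startup() -> Result<String, Contextual> {
    load_include("foo.bar", "base.conf").context("loading configuration at startup")
}

/// Renders an error followed by every cause beneath it, outermost first.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(e) = cause {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cause = e.source();
    }
    out
}
```

Calling `render_chain` on the error from `load_config_at_startup()` walks the whole chain, so the bare "permission denied" arrives together with the file that was read, the file that included it, and the part of the application that asked for it.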
Matthias
00:19:27
And it also works across different crates, even in the same namespace or outside?
Jon
00:19:33
Well, I mean, the way this is set up is, if you use anyhow or you use eyre or you use miette and you have this sort of opaque error type, .context() allows you to take such an opaque error and turn it into another opaque error that has the original one plus this additional context as a part of the chain. But it also lets you take an error that is not one of these opaque errors, like one that comes out of some external library, and then chain this context on top of that error and turn it into one of the opaque errors. It obviously doesn't work so well if you have enumerated errors. So if you have ones that are like, I have an error enum, and these are the variants, then you can't easily call .context() on that unless you're willing to erase the concrete type of that error and turn it into an opaque error like an anyhow error.
Matthias
00:20:25
How high is the code reuse at Helsing?
Jon
00:20:29
I think it depends on where in the stack you look, right? So we have projects that span from relatively new, sort of experimental, we're trying something out, to this is in a product and has been in development for several years. And the amount of code reuse you do in one versus the other differs pretty significantly. The amount of code reuse you do within a domain versus across domains also varies. So there are more commonalities between something like a UAV and a strike drone. Those have a lot more similar components than, let's say, a radar analysis system and a submersible. Those don't share as much in common. So it's hard to give you one number for the amount of reuse. But we do try to lean pretty heavily into, if you've built something that is useful elsewhere at the company or indeed outside of the company, then build it as a reusable library. And so, you know, you've already seen some of this come out on the public side of Helsing. So we have some open source repositories, like we have a tool called buffrs, which is basically a package manager for protobuf files. So it lets you take a collection of protobuf files, create basically a Proto.toml, which is equivalent to a Cargo.toml, that says, this is the version, this is the package name, and you can take dependencies on other protos, and it resolves those and runs protoc and gives you the flattened set of all the files and resolves the dependencies between them. We have a tool called Sguaba, which is a library for doing spatial math, like rigid body dynamics or rigid body transformations, in a type-safe manner. And this is something that we use across a lot of our different codebases, where it's a real pain to get them right once, and you don't want every team to have to get them right again and again and again. So we build it once, and then we make that be a central library, a central utility that every team can make use of, and then we also open source it.
And then we have other versions of this internally, like we have some tooling for the Avro ecosystem where anyone who uses that internally now has access to our tooling. And some of that we might open source, some of it we've already open sourced. And so I'd say there's a pretty broad swath of reuse that's pretty intentional. It's something where we see that there's a lot of cost to telling every team to reinvent the wheel. And it's cost not just in terms of engineering time, but also in terms of correctness, right? It means that you only manifest the bug once, you only fix it once, rather than every team having to wrangle with the same complexities and the same bugs over time.
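As an illustration of what "spatial math in a type-safe manner" can mean, here is a small sketch that tags each point with the coordinate frame it is expressed in. This is not Sguaba's API; the `World` and `Body` frames, the `Point` type, and the translation-only `Translation` transform are hypothetical names chosen for brevity (a full rigid body transform would also carry a rotation).

```rust
use std::marker::PhantomData;

/// Marker types naming coordinate frames (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct World;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Body;

/// A 3D point tagged at the type level with the frame it is expressed in.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point<Frame> {
    x: f64,
    y: f64,
    z: f64,
    _frame: PhantomData<Frame>,
}

impl<Frame> Point<Frame> {
    fn new(x: f64, y: f64, z: f64) -> Self {
        Point { x, y, z, _frame: PhantomData }
    }
}

/// A transform from frame `Src` to frame `Dst`. Only the translation part
/// of a rigid body transform is modelled here to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Translation<Src, Dst> {
    dx: f64,
    dy: f64,
    dz: f64,
    _frames: PhantomData<(Src, Dst)>,
}

impl<Src, Dst> Translation<Src, Dst> {
    fn new(dx: f64, dy: f64, dz: f64) -> Self {
        Translation { dx, dy, dz, _frames: PhantomData }
    }

    /// Accepts only points expressed in `Src` and returns a point in `Dst`.
    fn apply(&self, p: Point<Src>) -> Point<Dst> {
        Point::new(p.x + self.dx, p.y + self.dy, p.z + self.dz)
    }

    /// The reverse transform, with the frame parameters swapped.
    fn inverse(&self) -> Translation<Dst, Src> {
        Translation::new(-self.dx, -self.dy, -self.dz)
    }
}
```

With a `Translation<Body, World>`, passing a `Point<World>` to `apply` is a compile error rather than a silently wrong result, which is the kind of mistake that is otherwise painful for every team to catch on its own.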
Matthias
00:23:22
Yeah, that's also what I like about Rust in general, because undefined behavior is a thing that you can centralize and then fix once, and then the entire ecosystem profits from the fix.
Jon
00:23:33
Yeah, and I mean, this is partially a property of just having a good packaging system, right? Like having a package manager like Cargo means it's pretty easy to turn something into a library. It's pretty easy to take a dependency on it. And so that incentivizes doing this kind of sharing, which would be more annoying in at least some other language ecosystems.
Matthias
00:23:56
Also, thanks a lot. I always like it when companies open source their work. It's super amazing. And Sguaba is certainly an amazing library too. You gave a talk about it. We will link to it in the show notes. And can you elaborate a little bit more on buffrs? Why is it important to have a package manager for protobuf definitions? You must really like protobuf definitions and probably use them everywhere.
Jon
00:24:23
Well, so it comes as a pretty natural outcome of not having a monorepo, right? Because you end up with different teams having protobuf files for their configuration or their data exchange or whatever it might be. And then you have some other team that also wants to make use of that team's protobuf files. Well, then either you need to be in the same repository as them, or you need to copy-paste the files, or you need to have a mechanism for publishing your proto files and then getting them into another codebase, at which point you need versioning, because you're not tagging a particular commit of it, you need a version of it. And then you start getting transitive dependencies, where maybe there's a shared definition for maybe just data types even, so not the full structure of gRPC services, just some of the core data types. And you want those to be shared across two different repositories inside of one team. And then they both take dependencies on that. And then anyone who takes a dependency on either now also needs the transitive dependency. And so you very quickly run into the situation of we basically need a package manager here, and we need versioning, we need packages. And that's what buffrs was built to solve.
Matthias
00:25:36
What do you use protobuf for internally?
Jon
00:25:40
So we tend to like having sort of textual code representations of protocols, because it makes it a lot easier to decouple the two sides of any given communication pattern. And, you know, protobuf is one way to do that. Avro is another, and there exist many others as well. Once you create that protocol divide, and if you encode the protocol separately from any given codebase, it also makes it easier to change the technology choices on either side, or just completely reinvent one side by rebuilding it from scratch while being able to reuse the definitions. Why protobuf specifically? It's a pretty mature toolchain and ecosystem, and it has a proven track record of working well, being efficient, having support for many languages. And for that reason, I think it's a pretty obvious default choice. But as you'll see, or as you might have seen from my streams, for example, there are things where we use Avro instead of protobuf. And the rationale for that is there are some features that Avro has that protobuf does not, where if for your particular use case those features are worthwhile, well, then you should pick the technology that has those. So as an example, Avro has support for this thing called logical types, where you can annotate particular fields of a type with, you know, this is an f64, but I'm going to annotate it with the logical type velocity in meters per second. And protobuf doesn't really have that mechanism for adding sort of richness to the types of fields. You can kind of hack your way there in protobuf, but in Avro that's just directly supported by the tooling. It also has better support for streams of blobs. You don't really have that in protobuf. You have it in gRPC, but we don't necessarily use these protocol definitions for RPC mechanisms. We often use them to represent just the data format that's being exchanged in various different formats and protocols.
And so just because we're using protobuf does not mean we're using gRPC everywhere. Same thing for Avro. We might be using the Avro IDL for expressing data types for protocols. We're not necessarily using the RPC mechanisms that are built on top of Avro.
Matthias
00:28:03
Is it time to write an Avro package manager now?
Jon
00:28:07
Maybe. It's not impossible. I mean, writing a package manager, if it is as simple as, you know, assign a version, create a bundle, upload it somewhere, is not that bad. And I mean, if you look at buffrs, you know, there is some amount of complexity there, but it's not, you know, monumental. And we could probably pretty easily take buffrs and create an Avro version of buffrs. The thing where it starts to get really gnarly is when you want sophisticated semantic versioning resolution, for example. There's been some really cool work on this. There's this effort called PubGrub, which is essentially trying to write a version resolver that can be reused across packaging ecosystems, across different types of specifier semantics in your dependencies, but also written in such a way that it's reusable. So you could use it in npm, you could use it in Cargo, you could use it in whatever package manager you dream of, and just get really rich resolution of package dependencies, because it turns out that's a fairly complicated problem. And then PubGrub also tries to give you good error messages from resolver failures, which tends to be something that many package managers struggle with, because they build version resolution in this kind of ad hoc way. Cargo hasn't adopted PubGrub yet, but it's sort of on the long-term roadmap. And it has been for a while, so don't hold your breath. But it does mean that, for example, buffrs as of today only has exact version lookups. So if you take a dependency on 1.2.3 and 1.2.4 is released, then buffrs will not pick it up. But if we integrated PubGrub into buffrs, we would get this version resolution behavior where we would say, the default should be semantic versioning with caret matching, like what Cargo does. And at that point, moving to, say, a buffrs version that supports Avro feels like it shouldn't be that bad, because there's not too much that is protobuf-specific inside of buffrs.
It is more like a collection of files with some metadata that annotates version and dependencies and then being able to bundle those up into tarballs that you can publish somewhere. And if you squint at it, that's like most of what many package managers are.
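The difference between exact lookups and caret matching can be sketched in a few lines. This is only the compatibility rule, not PubGrub and not the actual code of buffrs or Cargo; the `Version` type and the function names are hypothetical, and pre-release versions are ignored.

```rust
/// A `major.minor.patch` version. The derived ordering compares fields in
/// declaration order, which matches semantic version precedence here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

/// Exact matching: only the requested version satisfies the requirement.
fn matches_exact(req: Version, candidate: Version) -> bool {
    req == candidate
}

/// Caret matching in the style Cargo uses by default: the candidate must be
/// at least `req` and must not change the left-most non-zero component.
fn matches_caret(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        candidate.major == req.major
    } else if req.minor > 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.z is only compatible with 0.0.z itself.
        candidate == req
    }
}

/// Picks the highest available version accepted by the given rule.
fn resolve(
    req: Version,
    available: &[Version],
    matches: fn(Version, Version) -> bool,
) -> Option<Version> {
    available.iter().copied().filter(|c| matches(req, *c)).max()
}
```

Given published versions 1.2.3, 1.2.4, 1.3.0 and 2.0.0, a requirement on 1.2.3 resolves to 1.2.3 under exact matching and to 1.3.0 under caret matching. A real resolver has to do this across a whole dependency graph at once, which is where the complexity discussed above comes from.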
Matthias
00:30:34
Okay, so we understand that management of those protobuf definitions is really important for Helsing. But what do you use it for in the first place?
Jon
00:30:43
So we do use gRPC for some parts of the tech stack, although gRPC tends to be best suited for environments where communication is fairly reliable and predictable. And you have like TCP, you have stable IP networks and the like. So there we use gRPC, and then protobuf is a good way to make use of it. But protobuf is also useful for just describing data formats. Data formats is the wrong word, but like... The structure of messages that are going to go over a network or in some cases that will go to disk, although protobuf tends to be less well-suited for that, but certainly for anything that has to go over a network, but it doesn't necessarily need to go over gRPC. So for example, there are especially closer to the edge networks that we have where you have drones flying around where there's radios with very limited bandwidth, connections that come and go you get jammed so you lose connectivity or your bandwidth gets severely reduced because you're flying through a zone where there's a lot of interference whatever it might be in those environments gRPC is not going to work like realistically you can't use tcp it will like the the network is simply too poor and too dynamic for that to work plus you want to make use of things like you know if you have a radio network it's inherently broadcast and so you send one packet, you want to make it useful to as many other peers in your network as possible. And so that's an example where we're not using gRPC, we're not using TCP. And in fact, we've built our own stack that is reliant on CRDTs, on conflict-free replicated data types, that tries to build a distributed network where you can still get reliable exchange of information, even in the presence of severe packet loss, network reordering, packet reordering, node sending, frequent updates over time, and you want to make sure you only accumulate the updates that are newer, that old data gets erased when new data replaces it, even if you get the packets out of order. 
Like, this is sort of very traditional distributed systems, when they're not in a cloud environment setting. But there, for the CRDT stuff we've built, we're still using protobuf for the actual data definitions. So the definitions of the schema, effectively, of what data an application might write, might read, what gets stored and sent, all of that is still in protobuf files because, again, it's something that people know. The tooling is good around it. You know, it has editor support and all of that stuff. We have Buffrs for being able to give you versioning over those schemas. And so it makes a lot of sense to reuse that ability to describe your protocols, your data definitions, even though the underlying exchange technology is very different.
Matthias
00:33:35
And what are reasons for using CRDTs specifically? You mentioned a few things already, but what's the bird's eye view? When would you use it, when wouldn't you use it, and maybe what is it in the first place?
Jon
00:33:49
Yeah, so CRDTs are, at the very basic level, an exchange data type. It's a data type that comes with some algorithms that define what to do when these messages are exchanged over a network, or between peers in whatever way they are. Imagine you have set up a distributed system where you have, let's say, three nodes in that system, nodes A, B, and C, and nodes A and B concurrently make edits to some underlying document or whatever. Let's use a document as a good example. It's a shared document between A, B, and C. A is making edits to the document, B is making edits to the document, and they can't talk to each other, but they're able to send packets to C. C is now going to observe the changes from both A and B. How does it reconcile the changes from A and B? CRDTs are the data type plus algorithms that make C able to reconcile those changes in such a way that if A's and B's edits were not conflicting with each other, then there is no conflict observed by C. It just observes both edits. And if they do conflict with each other, then C has a conflict resolution strategy, in the sense that it can detect that there was a conflict and it can decide what to do about that conflict in such a way that afterwards there is no conflict. So imagine, for example, that your document is like a key-value store. If A writes to key foo and B writes to key bar, then all the edits to foo and all the edits to bar should just come into C and there should be no problem. If A deletes foo and B updates foo, so the same key, and then they send to C, then C is now going to have to decide what to do when it observes a delete and an update at the same time.
And the CRDT would be something like, you know, it informs you about the metadata you need to add to the packets to detect that these are the same key and that they happened concurrently with each other rather than, let's say, the update happened first and the delete happened after, in which case you should always take the delete. But if they actually happened concurrently, which the CRDT metadata will tell you, then the CRDT algorithm will tell C whether it should prefer keeping the updated value or prefer the delete. It's sort of baked into the data type itself how to resolve that conflict.
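The "metadata that tells you whether two edits were concurrent" can be illustrated with version vectors, one common technique for this. The sketch below is only an illustration under that assumption: the transcript does not say which metadata Helsing's stack uses, and the names `VersionVector`, `Relation`, and `relate` are made up for the example.

```rust
use std::collections::BTreeMap;

/// Per-node operation counters; a missing entry counts as 0.
/// (Hypothetical type for illustration only.)
type VersionVector = BTreeMap<&'static str, u64>;

#[derive(Debug, PartialEq)]
enum Relation {
    Equal,
    Before,     // the first edit happened before the second
    After,      // the first edit happened after the second
    Concurrent, // neither edit had observed the other
}

/// Compare two version vectors entry by entry.
fn relate(a: &VersionVector, b: &VersionVector) -> Relation {
    let mut a_ahead = false;
    let mut b_ahead = false;
    for node in a.keys().chain(b.keys()) {
        let ca = a.get(node).copied().unwrap_or(0);
        let cb = b.get(node).copied().unwrap_or(0);
        a_ahead |= ca > cb;
        b_ahead |= cb > ca;
    }
    match (a_ahead, b_ahead) {
        (false, false) => Relation::Equal,
        (true, false) => Relation::After,
        (false, true) => Relation::Before,
        (true, true) => Relation::Concurrent,
    }
}

fn main() {
    // Shared starting point is {A: 1}. B updates foo on top of it.
    let update: VersionVector = BTreeMap::from([("A", 1), ("B", 1)]);
    // A deletes foo without having seen B's update.
    let concurrent_delete: VersionVector = BTreeMap::from([("A", 2)]);
    // A deletes foo after having received B's update.
    let later_delete: VersionVector = BTreeMap::from([("A", 2), ("B", 1)]);

    println!("{:?}", relate(&update, &concurrent_delete));
    println!("{:?}", relate(&update, &later_delete));
}
```

In the first case node C needs a conflict rule, because neither edit had seen the other. In the second case the delete simply supersedes the update.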
Matthias
00:36:24
And is that resolution always the same, or does it depend on the use case? Can you configure the resolution?
Jon
00:36:30
Yes, you can choose different CRDTs depending on the outcomes that you want. So for maps, for example, the most common way to construct a map is something called an observed-remove map, where you can only remove items or updates that you have observed. So in the case before of A updates foo and B deletes foo, I think I said it in reverse, but it doesn't matter, then because B did not observe A's update, it is not allowed to remove A's update. And therefore, A's update will win out. And the result will be that foo will not be deleted in the resulting resolution. And the protocol and the algorithms and the metadata ensure that this is always the case. But you can choose, there's a different CRDT that is not the observed-remove map, I forget the name of this one, but there's a CRDT for maps that is specifically, like, remove-wins, where you are guaranteed that if you have a remove and an update, the update will be removed. And so these are different semantics you can choose by choosing the appropriate CRDTs. And the CRDTs tend to also be composable. So you can say that, you know, you have a key-value map where the values themselves are also CRDTs of a particular type. And you can structure them in this way, where you choose the semantics at every layer by choosing the appropriate CRDT mechanism. And the rule of CRDTs is that as long as you observe all the same operations as another node in the system, you agree on the final state. So they're guaranteed to be sort of commutative and associative, to the point where eventually everyone has the same state as long as they get to exchange all messages.
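To make the observed-remove behavior concrete, here is a minimal sketch of a state-based map in that style. It is a simplification for illustration, not the DSON algorithm and not the API of Helsing's crate; `OrMap`, `Dot`, and all method names are hypothetical. Every write gets a unique tag (a "dot"), and a remove only drops the tags the removing replica has actually seen.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Unique tag for one write: (replica id, per-replica counter).
type Dot = (&'static str, u64);

/// Simplified observed-remove map (illustrative only).
#[derive(Clone, Debug, Default, PartialEq)]
struct OrMap {
    /// Live entries, indexed by the dot of the write that created them.
    entries: BTreeMap<Dot, (String, String)>,
    /// Every dot this replica has observed, live or already removed.
    seen: BTreeSet<Dot>,
}

impl OrMap {
    fn put(&mut self, replica: &'static str, key: &str, value: &str) {
        // Overwrite the values for `key` that this replica has observed.
        self.entries.retain(|_, kv| kv.0 != key);
        let last = self.seen.iter().filter(|d| d.0 == replica).map(|d| d.1).max();
        let dot = (replica, last.unwrap_or(0) + 1);
        self.entries.insert(dot, (key.to_string(), value.to_string()));
        self.seen.insert(dot);
    }

    fn remove(&mut self, key: &str) {
        // Only observed entries can be dropped; `seen` keeps their dots.
        self.entries.retain(|_, kv| kv.0 != key);
    }

    fn merge(&mut self, other: &OrMap) {
        // Drop our entry only if `other` saw that exact write and removed it.
        self.entries
            .retain(|dot, _| other.entries.contains_key(dot) || !other.seen.contains(dot));
        // Adopt entries from `other` that we have never seen.
        for (dot, kv) in &other.entries {
            if !self.seen.contains(dot) {
                self.entries.insert(*dot, kv.clone());
            }
        }
        self.seen.extend(other.seen.iter().copied());
    }

    /// All live values for `key` (concurrent writes may leave several).
    fn get(&self, key: &str) -> Vec<&str> {
        self.entries.values().filter(|kv| kv.0 == key).map(|kv| kv.1.as_str()).collect()
    }
}

fn main() {
    let mut a = OrMap::default();
    a.put("A", "foo", "v1");
    let mut b = a.clone(); // B has observed foo = v1

    a.put("A", "foo", "v2"); // A updates foo...
    b.remove("foo"); // ...while B concurrently deletes it

    // C receives both states, in either order.
    let mut c1 = OrMap::default();
    c1.merge(&a);
    c1.merge(&b);
    let mut c2 = OrMap::default();
    c2.merge(&b);
    c2.merge(&a);

    assert_eq!(c1, c2); // merge order does not matter
    println!("{:?}", c1.get("foo")); // the unobserved update survives
}
```

B never observed the write of `v2`, so its delete cannot remove it, and both merge orders converge to the same state. A remove-wins map would resolve the same conflict the other way.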
Matthias
00:38:15
I sort of have immediate follow-up questions, two of them. The first one would be: how do you decide on such data structures? Is there a team meeting, and then someone who knows algorithms and data structures really well proposes that? And maybe there were competing algorithms that you considered. And have you been there when the decision was made?
Jon
00:38:40
Yeah, so our CRDT distributed system, I built. And part of that was out of a conviction that gRPC is not something you can run on the edge network, so we need something else. And the question then becomes, what is the something else? And, you know, I happen to have a decent amount of background in building distributed systems, and so I had a decent idea of what the different options were. And CRDTs felt like they fit this particular set of use cases, or at least the use cases we could predict we were going to have, quite well. There are other designs you could come up with here, but they come with a different set of trade-offs. I think at the time, I went into it with a conviction that this is the right way. As you build more and more complex and compounding stacks, you start having to document these decisions as well, so that the motivation and the rationale for making the choice is not lost to time. And that's where you end up writing architecture decision records, ADRs, where you write down: here's the problem statement, here's the decision we made for what algorithm to use, here are the options we considered and why we decided to discard them. So that for the future, people have insight into the decisions you made. And also, you know, the process of writing this document becomes the way you take the decision: you convince the group, as part of writing this document, that all the options should be discarded except for the one that you choose.
Matthias
00:40:11
And I'm assuming there's still ongoing research on CRDTs. Would you amend any of the prior decisions now if you had the chance?
Jon
00:40:21
I don't think so. I think the CRDT design has actually worked out very, very well. And, you know, when we implemented the CRDTs in the first place, it was built off of a very recently released paper. So we started doing that implementation shortly after I joined, actually, so this is sort of end of 2023, and we were implementing it based on a paper called DSON, which was released, I want to say, in 2022. So it was very much sort of bleeding-edge technology already when we started implementing it. And then as part of implementing that set of data structures and algorithms and protocols, we also effectively extended the research into things that were needed for the operational use cases and production use cases we had in mind.
Matthias
00:41:07
Sounds like a decent paper.
Jon
00:41:10
Maybe. I mean, we did actually go back and forth a lot with the authors of the paper. We've also open sourced the core of that implementation, so on crates.io there's a crate called dson that is the core of that CRDT and that compound set of CRDTs, precisely because we think this might be useful to other people and because we think we made, you know, innovations in the space that were not in the DSON paper. And some of this was purely because, in order to put this into production, we had to do a lot of both optimization and debugging to make sure it's extremely reliable and fast. And as part of that, we found corner cases that in the paper were either handled in, you know, suboptimal ways, or in fact we found some bugs, at least in their prototype implementation, not necessarily in the algorithms, that we wanted to correct and therefore also wanted to publish.
Matthias
00:42:03
And when you did the implementation and you found those bugs, did the Rust type system surface them easily?
Jon
00:42:11
Yes and no. There were some where the Rust type system was helpful. So the original DSON paper was accompanied by a research prototype written in JavaScript. And when writing the sort of Rust encoding of that same algorithm, there were definitely places where it pointed out that they had relied on the JavaScript type system being fairly forgiving, where they were just sort of interchangeably using two different types that really should not be mixed, because you can very easily get into bugs that way. And I think we found like one or two bugs in the research prototype, again not necessarily in the algorithm but in the research prototype, that were because of this. But I think a decent amount of the nuances of the algorithms, especially when it comes to performance, for example, are not things that would be caught by the Rust type system as much as they're caught by a lot of testing, like property-based testing, fuzz testing, and doing a lot of performance benchmarks and tracking down the root causes. They're things that are hard to catch in the type system.
Matthias
00:43:26
Now, when you look at all of these things that you've implemented so far, and there are certainly a lot of amazing things in there, I also want to check out your CRDT implementation now, what would you say was the biggest learning since you started modeling logic in Rust? How do you model your types nowadays? Things that people can learn and apply in their own work.
Jon
00:43:50
I think there's a trade-off that you learn over time: where is it worthwhile introducing more types and where is it not? Where is it worthwhile making things generic and where is it not? I don't think there's a hard and fast rule that you can just always follow. But over time, you develop a sort of intuition for "this doesn't feel like a good use of a type" or "this feels like it would be dangerous if I don't add a type." You can start to predict the bugs that people will make if you don't introduce a new type. And whenever you introduce a new type, you also feel a tingling in your hands about the amount of pain you just introduced to people using it, because now there's an extra type where previously there didn't need to be. And so I think the thing I've learned, and it's not a moment-in-time learning, it's sort of a lesson over time, is to better tune that trade-off. And I think one of the observations I've come to is that representing things in the type system is extremely valuable, and people don't do it enough. But you have to balance it against the pain of using the library that you've developed that has all of these types. You need to really keep in mind: what am I actually gaining in terms of the safety I add to the system when I introduce all these types, and what is the cost to the people using it? And the way you do that is by making sure that when you write a library, like Sguaba for the rigid body transformations, for example, you also write code that uses that library while you are writing this typestate library, because it will immediately show you just how painful the consuming code ends up being. And that will help guide your way into, okay, maybe I don't need a dedicated type for this. So an example here maybe is in Sguaba, we have a type for WGS84, which is basically GPS coordinates. So latitude, longitude, and altitude. Now it turns out that WGS84 is actually a moving standard.
It's a moving standard because the parameters of the Earth change over the course of time. You have, you know, plate drift and drift of the magnetic north pole, and they also update the reference ellipsoid sometimes when they get better estimates of the roundness of the Earth. And so as a result, there's not actually one WGS84, there are multiple over time. If you capture a coordinate right now and then, within 10 years, you try to take the same coordinate and plot it on a map, it wouldn't be in the same place as where the original one was. So if you read out the GPS coordinates of the Eiffel Tower, the actual GPS coordinates of the Eiffel Tower will change over time. They'll change very little, usually. But it does mean that, technically, you kind of want the type system to represent the point in time at which this measurement was taken. The more extreme case here is that you have this thing called local tangent planes. Basically, imagine a plane is flying and it has some GPS coordinate, and then you want to know the location of something relative to the plane, in something like front-right-down coordinates or north-east-down coordinates. So it's like, this thing is one kilometer north, three kilometers east, and 500 meters up from me. That is a local tangent plane at the current location of the plane. So I record that coordinate, but now the plane moves. Then you kind of want to represent the fact that this was a coordinate relative to that plane's position at this point in time. The reality is, if we actually tried to encode that information in the type system of Sguaba, it would be impossible to use because every type would be distinct. There would be no easy way to move between different coordinate systems. WGS84 would end up with like three different generic type parameters that involve time.
And then how does that translate into other coordinate systems that involve time? It would be more accurate, but it would also be much more painful to use, and it's not entirely clear that it eliminates very many classes of bugs. Quite to the contrary, it might introduce more, because now if you get the wrong time bases, then nothing compiles, and then you take shortcuts to try to get out of the mess of errors you end up with. And so for Sguaba, I made the explicit choice to say, this library will not represent time in the type system. And that does reduce the safety you get from the type system, but it also makes the library much more pleasant to use, which increases the number of people who will use it, and therefore overall increases safety compared to if I had put it in.
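The trade-off described here, putting the coordinate system in the type but leaving time out, can be sketched with marker types and `PhantomData`. This is a toy under stated assumptions, not Sguaba's actual API: `WorldFrame`, `PlaneFrame`, `Coordinate`, and the translation-only `Translation` are invented for illustration, and real rigid body transforms also involve rotation.

```rust
use std::marker::PhantomData;

// Hypothetical marker types naming two coordinate systems.
#[derive(Debug, Clone, Copy, PartialEq)]
struct WorldFrame;
#[derive(Debug, Clone, Copy, PartialEq)]
struct PlaneFrame;

/// A point tagged with the coordinate system it is expressed in.
/// Note what is deliberately absent: no type parameter for time.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Coordinate<System> {
    x: f64,
    y: f64,
    z: f64,
    system: PhantomData<System>,
}

impl<System> Coordinate<System> {
    fn new(x: f64, y: f64, z: f64) -> Self {
        Self { x, y, z, system: PhantomData }
    }

    /// Distances are only defined between points in the same system.
    fn distance_to(&self, other: &Coordinate<System>) -> f64 {
        let (dx, dy, dz) = (self.x - other.x, self.y - other.y, self.z - other.z);
        (dx * dx + dy * dy + dz * dz).sqrt()
    }
}

/// Toy transform (translation only) between two systems.
struct Translation<Src, Dst> {
    dx: f64,
    dy: f64,
    dz: f64,
    systems: PhantomData<(Src, Dst)>,
}

impl<Src, Dst> Translation<Src, Dst> {
    fn new(dx: f64, dy: f64, dz: f64) -> Self {
        Self { dx, dy, dz, systems: PhantomData }
    }

    /// The only way to turn a `Coordinate<Src>` into a `Coordinate<Dst>`.
    fn apply(&self, c: Coordinate<Src>) -> Coordinate<Dst> {
        Coordinate::new(c.x + self.dx, c.y + self.dy, c.z + self.dz)
    }
}

fn main() {
    let plane_to_world = Translation::<PlaneFrame, WorldFrame>::new(100.0, 200.0, -50.0);
    let target = Coordinate::<PlaneFrame>::new(1000.0, 3000.0, -500.0);
    let plane = Coordinate::<PlaneFrame>::new(0.0, 0.0, 0.0);

    let target_in_world = plane_to_world.apply(target);
    println!("{:?}", target_in_world);
    println!("{:.1} m from the plane", target.distance_to(&plane));

    // Mixing systems without an explicit conversion is a compile error:
    // target.distance_to(&target_in_world);
}
```

Adding a time or epoch parameter to `Coordinate` would make the types more precise, but every signature above would grow with it, which is the usability cost weighed in the answer.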
Matthias
00:48:52
Is that an example of the conflict between ergonomics and correctness?
Jon
00:48:57
I think that's right. And, you know, I don't actually think it's sacrificing correctness, it's sacrificing precision. And precision can be useful for correctness, but you can also have correctness without that precision, right? So you can write correct programs that don't have time represented in the type system. It just means that there are some classes of bugs that you don't get to eliminate through the type system. But that doesn't mean you end up not having correctness.
Matthias
00:49:32
So from your lens, it's still correct because it encodes everything that you want the type system to encode, but you just explicitly leave out a thing that you don't want to be so precise on.
Jon
00:49:44
And where, I think, you know, I'm going to leave it to the users of this library to get that part correct.
Matthias
00:49:51
Do you make a difference between library code and binary code? Do you write different code if you had to write application-level Rust versus library-level Rust?
Jon
00:50:00
Yeah, I think I do, in the sense that when I write library-level Rust, I think a lot more about the programmatic API that I present. So that includes not just documentation, right? Obviously, you need to write good documentation for a library to be useful, but also the structure of that API. What are the backwards compatibility hazards? Where do I think I might put myself into a trap when it comes to breaking changes down the line? So I want to be conservative about what things I expose in the public API, because those are things I can't change later unless I do a breaking release. The way that you propagate errors might be different, because you might want library consumers to have a better ability to deconstruct the error and figure out the origin. Whereas in a binary, usually what you want is to present a chain of errors to the user that results in something actionable on their end to fix the problem. And so I do think the design ends up somewhat different. I don't think it changes the internal writing of the code very much, but it changes how much focus you put on the external API. But for a binary, of course, you have to think about what the command line interface looks like for that binary, and so that also requires thought, but it's a different kind of design process.
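The difference in error handling can be sketched with the standard library alone. The library side exposes a structured error that callers can match on and that reports its cause through `Error::source`; the binary side flattens the chain into one message for the user. The `ConfigError` type and the `port_from` and `render_chain` helpers are hypothetical names for illustration.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Library-style error: a matchable enum that keeps its underlying cause.
#[derive(Debug)]
enum ConfigError {
    MissingKey(&'static str),
    InvalidPort { raw: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingKey(key) => write!(f, "missing key `{key}`"),
            ConfigError::InvalidPort { raw, .. } => write!(f, "invalid port `{raw}`"),
        }
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::InvalidPort { source, .. } => Some(source),
            ConfigError::MissingKey(_) => None,
        }
    }
}

/// Library function: returns the structured error.
fn port_from(pairs: &[(&str, &str)]) -> Result<u16, ConfigError> {
    let raw = pairs
        .iter()
        .find(|(key, _)| *key == "port")
        .map(|(_, value)| *value)
        .ok_or(ConfigError::MissingKey("port"))?;
    raw.parse::<u16>()
        .map_err(|source| ConfigError::InvalidPort { raw: raw.to_string(), source })
}

/// Binary-side helper: walk the `source` chain into one readable line.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

fn main() {
    match port_from(&[("host", "localhost"), ("port", "http")]) {
        Ok(port) => println!("listening on port {port}"),
        Err(err) => eprintln!("error: {}", render_chain(&err)),
    }
}
```

A library consumer can match on `ConfigError::InvalidPort` and recover programmatically, while the binary only needs the rendered chain.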
Matthias
00:51:22
Earlier, you mentioned that you also want to write the application-level code that goes along with the library code, or actually vice versa. Does that mean you start with a main.rs, model out your types, and then gradually move them into a library crate?
Jon
00:51:39
It can. I do do that sometimes as well, but more commonly it means I already have at least two codebases that have a need for this library. And so I'm going to create the library and see how it affects those two codebases, because then I have a real set of use cases that I can use to test out how the library feels. And I mean, this was the case for Sguaba, for instance: we had codebases internally that had to do this kind of spatial math. And they already had code for it, like, they were working codebases, but that code was hard to review, brittle, and had a bunch of magic constants in there. And so it felt like, you know, we've had to solve this problem at least twice, so we should turn it into a reusable library that is well designed and well tested, because it would be recommended going forward. And so that is what informs the design of the library: the evident need from the things you've already built.
Matthias
00:52:44
When you built Sguaba, what was your testing strategy? Was it based on unit tests, integration tests, property-based testing, or fuzzing?
Jon
00:52:51
It's all of the above. So there's a bunch of unit tests in there, and there are also a lot of equivalence tests against other crates that implement some subset of the functionality. So for example, there's a crate called nav-types that implements, for example, conversion between WGS84 and ECEF, which is another Earth-based coordinate system. And so there's a property-based test inside of Sguaba that basically generates random points on Earth and then converts them back and forth using Sguaba, converts them back and forth using nav-types, and then checks that the results are near each other. And then there's also just general property-based testing that does things like, you know, if you pick a random coordinate on Earth and run it back and forth through WGS84 and ECEF, which is a lossy conversion, run it through that 10 times and see how much degradation you get. And you get probabilistic guarantees about how much deterioration you will see over time. And then we also have a bunch of tests around the enforcement of the type system, right? So both tests to make sure that you can express the correct computations, but also compile-fail tests that say you cannot try to use a coordinate from one coordinate system as a coordinate in a different one without an explicit conversion.
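The "run it back and forth 10 times and measure the degradation" idea can be sketched without any external crates. The conversion below uses the standard closed-form WGS84-to-ECEF formulas and a simple fixed-point iteration for the inverse; it is not Sguaba's or nav-types' code. The tiny `Lcg` generator is a stand-in for a real property-testing crate, and the latitude range is limited to ±80° because this simple inverse is ill-conditioned near the poles.

```rust
const A: f64 = 6_378_137.0; // WGS84 semi-major axis in metres
const F: f64 = 1.0 / 298.257_223_563; // WGS84 flattening
const E2: f64 = F * (2.0 - F); // first eccentricity squared

/// Prime vertical radius of curvature at geodetic latitude `lat` (radians).
fn prime_vertical(lat: f64) -> f64 {
    A / (1.0 - E2 * lat.sin().powi(2)).sqrt()
}

fn wgs84_to_ecef(lat: f64, lon: f64, alt: f64) -> [f64; 3] {
    let n = prime_vertical(lat);
    [
        (n + alt) * lat.cos() * lon.cos(),
        (n + alt) * lat.cos() * lon.sin(),
        (n * (1.0 - E2) + alt) * lat.sin(),
    ]
}

/// Inverse conversion by fixed-point iteration on the latitude.
fn ecef_to_wgs84([x, y, z]: [f64; 3]) -> (f64, f64, f64) {
    let p = x.hypot(y);
    let lon = y.atan2(x);
    let mut lat = z.atan2(p * (1.0 - E2));
    for _ in 0..10 {
        let n = prime_vertical(lat);
        let h = p / lat.cos() - n;
        lat = z.atan2(p * (1.0 - E2 * n / (n + h)));
    }
    (lat, lon, p / lat.cos() - prime_vertical(lat))
}

/// Minimal deterministic generator standing in for a property-testing crate.
struct Lcg(u64);

impl Lcg {
    fn between(&mut self, lo: f64, hi: f64) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let unit = (self.0 >> 11) as f64 / (1u64 << 53) as f64; // in [0, 1)
        lo + (hi - lo) * unit
    }
}

/// Largest (angular, altitude) drift after ten round trips per sample.
fn max_round_trip_error(samples: usize, seed: u64) -> (f64, f64) {
    let mut rng = Lcg(seed);
    let (mut worst_angle, mut worst_alt) = (0.0_f64, 0.0_f64);
    for _ in 0..samples {
        let lat0 = rng.between(-80.0, 80.0).to_radians();
        let lon0 = rng.between(-179.0, 179.0).to_radians();
        let alt0 = rng.between(0.0, 10_000.0);
        let (mut lat, mut lon, mut alt) = (lat0, lon0, alt0);
        for _ in 0..10 {
            (lat, lon, alt) = ecef_to_wgs84(wgs84_to_ecef(lat, lon, alt));
        }
        worst_angle = worst_angle.max((lat - lat0).abs()).max((lon - lon0).abs());
        worst_alt = worst_alt.max((alt - alt0).abs());
    }
    (worst_angle, worst_alt)
}

fn main() {
    let (angle, alt) = max_round_trip_error(1_000, 42);
    println!("worst drift: {angle:e} rad, {alt:e} m");
}
```

An equivalence test against another crate has the same shape: generate the point once, convert it with both implementations, and assert that the results agree within a tolerance.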
Matthias
00:54:11
You put so much work into Sguaba and similar libraries, and then you decide to open source that work. That's a big gift. Why do you do that? Why open source those libraries?
Jon
00:54:25
I think it's a combination of factors. One of them is the traditional, you know, by having more people being able to look at a thing, you're more confident that it's correct. And I think that applies to open sourcing software in this context too. And I think there's a related point to that, which is, you know, we operate in the defense sector. And so for the systems that we build, we want to have as much confidence as we can that they are correct. But we also want to, to the extent that we can, give people the ability to look at how we build software and judge whether they think we are building software in a responsible way. And obviously we can't open source the actual products we're developing, but at least one stepping stone is to open source some of the techniques, some of the tools that we use in order to produce this software to the level of reliability that we want to give. And so open sourcing these kinds of libraries, I think, hopefully gives some feeling of transparency on that part, but also inspires some amount of confidence that we are building this software, at least at a technical level, with care. And so I think it matters to demonstrate that. And then I think there's a sort of wanting-to-give-back kind of feeling, right? We get a lot from the Rust community. And I mean, we are sponsors of the Rust Foundation partially for this reason, to give back. But the other way to give back is to make sure that when we build things that we think are useful to other people than us, we make them useful to other people than us. And then I think that there's obviously a cynical angle too, right, which is you put things out there so that other people get to look at interesting things you've built and then go, I also want to work on those things, right?
You get to actually show some of your code, show some of your development styles, show some of the problems you're working on, and hopefully get other people interested as a result.
Matthias
00:56:30
Yeah. In preparation for this interview, I read through some of the blog posts on the Helsing Tech blog. And I have to say, it's astonishing and certainly enticing to know what sort of problems you're working on. And I think it attracts a certain group of people who are interested in solving hard problems and working with Rust because they know that Rust is the right choice.
Jon
00:56:58
I think that's true. And if you think about the flip side, right, imagine we didn't open source anything, we didn't write any technical blog posts, then I think the first question would be, well, why not? What do you have to hide, right? But the other observation is, how do you hire, especially, you know, talented engineers who are curious about technical depth, if the only thing they see is the product side of things externally? They don't have direct access to the engineers we have internally, so you would have to apply, go through interviews, and then get to talk to the engineers, which is a lot to ask of someone who's still deciding whether they might want to join the company. And so by opening the doors a little bit and showing some of the work we do and the way that we actually do engineering, you give people more insight and therefore hopefully more to go on when deciding whether this is a place where they would want to work.
Matthias
00:57:54
Do you get any contributions, pull requests, people creating issues?
Jon
00:57:59
We do. It varies between the different projects, right? The different things we've open sourced are useful to others to varying degrees. The dson crate, for example, I'm not expecting lots of people to make use of, because you need to have a very particular use case for it to be the most useful to you. And then other things, like the Avro tooling, for example, I actually expect could become quite popular, because a lot of people use Avro and it's faster than the upstream version and it gives you better error messages. So we might get a bunch of people using it, but it's fairly new. Sguaba, I expect, would probably be quite popular. And that's also the one we've seen the most interest in, actually, of people wanting to file issues, reuse it, contribute to it, and we take those contributions seriously. Buffrs has been a bit of a mix, where people have had interest, but I think the companies where this becomes the most relevant, many of them have monorepos and therefore don't need this particular tech. You need to both have a need to do versioning and packaging of extended dependency chains of protobuf files, and also not have a monorepo. And that combination, I think, is somewhat rare, although not unique. But in general, we do see interest in the projects we put out there.
Matthias
00:59:23
And I believe one other angle is that a lot of people might just be interested in knowing how a Rust expert structures a library. And there's very little material out there outside of maybe a handful of popular crates and maybe a bunch of blog posts on how to write advanced Rust code. And a lot of people want to learn by osmosis, by reading what other people have written that they deem to be Rust experts.
Jon
00:59:51
Yes, I think that's also true.
Matthias
00:59:53
After all those years, do you still see yourself as an educator and do you do Rust education outside of Helsing or also within Helsing?
Jon
01:00:04
Yeah, I very much see my job as an educator. And it's something that I have, you know, I think a pretty deep passion for. I really enjoy, you know, that moment where you can experience someone else understanding something. That makes me very happy. And so, you know, I continue to do education outside of Helsing. And it's the same thing I did at Amazon as well, where, you know, a lot of my live streams and stuff, that all continued while I worked there. And it's the same thing at Helsing. I do some amount of education internally at Helsing as well. Although internally, it has more of a reactive nature, right? People will poke me and be like, hey, Jon, why doesn't this work? Or how should we do this? We do some amount of office hours internally, but it's more that, you know, we have a Rust help channel where people ask, and then occasionally I'll get, you know, poked explicitly, like, I think Jon wrote a blog post about this, or I think Jon implemented something along those lines. But I actually think the education I do that has the most value internally is the external education that I do. So I know that a lot of the engineers we have at Helsing have learned or partially learned Rust through my public educational resources, right? And that is also how some of them continue to learn new concepts in Rust: by following the same teaching resources that I put out publicly. And this is also why I think Helsing is quite supportive of me continuing to do my public education, because not only is it sort of, cynically speaking, a sales thing, right? It's valuable to the company to have someone who's seen as a Rust expert both publicly operating as a Rust expert and employed by them. But I think more meaningfully, it also means that more people are able to learn Rust through the teaching that I do, which means that there's a bigger hiring pool for Helsing to draw from.
But also, the people that we hire, hopefully, have then also learned more things about Rust because I've produced those intermediate teaching resources. And the people at the company can continue to improve their skills by me continuing to do teaching. So it ends up being a virtuous cycle in a way, where it's good for the company, which is good for me, which is good for the company, which is good for me. And I do think there's also some amount of recognition internally at the company that, you know, if I started just building internal teaching resources, it would be seen as a bit of a shame, right? It would be like, why are we not making this material public when it could be? There's nothing secret about it. It's just, how do you build good Rust code? How do you engineer high-quality Rust products? That feels like something we should be sharing back to the community, because those are not industry secrets, right? They are just things that are beneficial to everyone using Rust. And I think it's also the case that if we make the Rust community better, we benefit as a result, not just through hiring, but also because the quality of the crates that exist in the open source ecosystem will be better. The tooling will be better. Everything gets better if the community improves. And so there's just a lot of positive externalities here and positive feedback loops that mean that me continuing to do the public part of education is valuable.
Matthias
01:03:32
And with 200-plus hours of video material out there of you teaching Rust, I sort of think it's inevitable that people share videos of you internally without you knowing.
Jon
01:03:46
Oh, that definitely happens. I mean, this happened at Amazon too, where the moment I got on the inside, I kept finding places where people had referred to videos I'd made or blog posts I'd written or crates I'd published, being like, go look at this thing from Jon, or, you should read Jon's thing about this. And that is always fun. It's the same thing when we do hiring: a decent number of the people that go through the interview process say that either they learned Rust through me or they heard about the company through me. And that is a weird feeling, for sure.
Matthias
01:04:21
Do you see people taking it to the extreme sometimes, where maybe they learn about an advanced concept and they want to apply it at work, and during code review you find, well, it's expressive, it's certainly concise, but maybe not maintainable by a larger team? And where do you draw the line?
Jon
01:04:43
Yeah, I actually think this is not just an intermediate Rust programmer thing. I think this is pretty common across Rust, even from the early days: people see all of these tools and techniques that are possible in Rust, and then they immediately want to make use of them. And I think the newtype pattern is a big one, right? Like, I can define custom types for everything, and then everything is type safe. And people start using that pattern to the extreme. And to the discussion we had earlier around the trade-off space here, they just navigate the trade-off space by always picking the most type-safe thing. And we see the same when you look at the hesitance to use locks, the hesitance to use Rc and Arc. So people try to use lock-free algorithms and use references with lifetimes everywhere. And you end up with multiple lifetime annotations. And no one wants to clone anything, and everything has to be monomorphized so there's no dynamic dispatch. And people really lean into every possible feature that Rust gives you. And it makes it really painful to program in the language. It makes it painful to review the code. It means that you end up constructing suboptimal software architecture, because you can't express the architecture you want with the borrow checker, with the type checker, or even just with your current knowledge of the language. And so I do tend to see, especially in people who haven't built production code in Rust very much, that they lean overly much into some of these patterns. And then you kind of have to pull them back and be like, it's okay to clone here. It's okay to put this thing behind a mutex. It's okay to not have a newtype for this particular string representation of an email, right? And over time, people learn that distinction.
But that is part of the education you kind of have to do on the job is see the code that people write and then course correct as you do for where they're maybe overzealous about the use of some of Rust's features.
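As an illustration of the "it's okay to clone, it's okay to use a mutex" point, here is a minimal sketch. The `Registry` type and the `fill_concurrently` helper are hypothetical names invented for this example, not code from Helsing or from the episode. It shares a map across threads with `Arc<Mutex<...>>` and hands out owned `String` copies instead of threading lifetimes through the API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared registry of user names keyed by id.
#[derive(Clone, Default)]
pub struct Registry {
    inner: Arc<Mutex<HashMap<u32, String>>>,
}

impl Registry {
    /// Inserts a name under a plain mutex: easy to review, no lock-free tricks.
    pub fn insert(&self, id: u32, name: &str) {
        self.inner.lock().unwrap().insert(id, name.to_owned());
    }

    /// Returns an owned copy rather than a reference tied to the lock guard.
    pub fn name(&self, id: u32) -> Option<String> {
        self.inner.lock().unwrap().get(&id).cloned()
    }
}

/// Hypothetical helper: writes from several threads through cloned handles.
pub fn fill_concurrently(registry: &Registry, count: u32) {
    let handles: Vec<_> = (0..count)
        .map(|id| {
            // Cloning the handle only bumps the Arc's reference count.
            let registry = registry.clone();
            std::thread::spawn(move || registry.insert(id, &format!("user-{id}")))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

The clone in `name` costs an allocation per lookup. If profiling later shows that it matters, the API can be tightened then, with the compiler guiding the refactor.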
Matthias
01:06:45
Yeah, it's certainly a bit of a rite of passage.
Jon
01:06:49
Yes, I think so.
Matthias
01:06:50
Do you think the key to idiomatic Rust is keeping it simple and then maybe making it right where it matters? So finding that balance between, on one side, simplicity and maybe ease of maintenance, versus correctness for things that really are important? Or what's your working definition of idiomatic Rust?
Jon
01:07:16
I think it's very hard to give a sort of general definition. I don't think you should start with the simplest possible thing, but I also don't think you should start with the most complicated thing. And this is where, and I don't like using this, but I think experience matters here, right? Where over time, you just get a feel for where the balance should lie. You start writing the code and you go, it's okay to clone here, and it's not okay to clone here. And it's hard for me to distill what the principles are for making those choices. I would say in general, it is very useful to have a running system. Once you have a running system, refactoring with Rust tends to be a lot easier than in other languages because you have the type system and the solid compiler and type checker and borrow checker to rely on. And so I would tend to err on the side of: where the type-safe thing is easy, do the type-safe thing. And where the type-safe thing gets in the way of you building the actual application to the end, build the application to the end first, and then mark it with a TODO to come back to. And some of those TODOs will be very painful to fix later. But the reality is, if you don't finish the whole thing, you're never going to come back to the TODOs because it didn't work in the first place. So it's useful to use that as a forcing function for making you make a suboptimal choice: well, I need to at least get to a thing that runs, otherwise this whole thing is irrelevant.
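One way to read "where the type-safe thing is easy, do it, otherwise leave a TODO" is the sketch below. Everything in it (`Port`, `Endpoint`, `parse_endpoint`) is a hypothetical example made up for illustration. The port gets a cheap newtype with a single invariant, while the host stays a plain `String` with a TODO until the program runs end to end.

```rust
/// Cheap newtype: one invariant (non-zero), checked in one place.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);

impl Port {
    /// Returns `None` for port 0, which this sketch treats as invalid.
    pub fn new(raw: u16) -> Option<Self> {
        (raw != 0).then_some(Port(raw))
    }

    pub fn get(self) -> u16 {
        self.0
    }
}

/// Hypothetical configuration value for illustration only.
pub struct Endpoint {
    // TODO: replace with a validated `Host` type once the service runs end to end.
    pub host: String,
    pub port: Port,
}

/// Parses "host:port", splitting on the last colon.
pub fn parse_endpoint(input: &str) -> Option<Endpoint> {
    let (host, port) = input.rsplit_once(':')?;
    let port = Port::new(port.parse().ok()?)?;
    Some(Endpoint { host: host.to_owned(), port })
}
```

Because every use of `Endpoint::host` is type-checked, swapping the `String` for a stricter type later tends to be a mechanical change.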
Matthias
01:08:51
Rust is a huge language, and while I'm pretty sure that you know more than most people about Rust, what is one thing that you would personally want to spend more time on? If you had three months to focus on one subject that was Rust-related, what would it be?
Jon
01:09:08
I think there are two categories for me. One of them is around WebAssembly. So I've done very little WebAssembly in Rust, and I think it's a cool technology, and there's a bunch of use cases for it that I think we haven't fully explored. And I would love to fiddle around with it to see what I can make of it. But also, I think it's a very useful skill or set of knowledge to have in your toolbox. And, you know, I'm thinking of writing another, not version, but a second iteration of Rust for Rustaceans. And Rust for Rustaceans doesn't have a chapter on WebAssembly, in part because I hadn't done very much WebAssembly at the time when I wrote the book and I didn't feel like I could be an authority on that subject. And I still don't think I could be. And so that is something I would want to do more of. I think the other category would be sort of deep embedded development. I've done some embedded development in Rust and I've certainly written, you know, crates for no_std and everything. But to really write something low level on a microcontroller where you need to initialize the CPUs in the right way, and you need to handle the interrupts, and you need to write some inline assembly. Code like that is really fun to write, and I haven't written lots of it in Rust, but I would like to, because I want to see, you know, what does it feel like when I push the language in that direction a little bit? And like, where are the sharp edges, and what can I do to make those sharp edges more ergonomic, right? Like, this could be building tooling, building libraries to make that experience better.
Matthias
01:10:44
I wanted to briefly touch on supply chain security because I believe it's kind of important for Helsing. You write a lot of code, you maintain a lot of code yourself, but you still need to depend on a lot of crates that are out there that we sort of take for granted. And there have been some recent challenges around some packages in Rust. I don't want to mention any names. And Cargo itself also had some sort of exploit because of the tar crate just a couple of days ago. I wonder, what's Helsing's stance on that? And what's the state of the Rust ecosystem in regard to supply chain security?
Jon
01:11:32
I think Rust is not in a worse place than other ecosystems here. I think it's a bit similar to other ecosystems where, when you take a lot of third-party dependencies, there's some amount of inherent risk there. And I don't think Rust's tooling is worse or Rust's risk is higher. And as with any language like this, or any project that has to take third-party dependencies, the question becomes, what do you do about those risks? And I think, you know, the reality is you have to build in defense in depth against these things, right? There's not going to be one silver bullet that just solves all your supply chain security problems. Instead, you have to have sort of a collection of processes and tools that make as many parts of it as secure as you can. And then you layer them on top of each other to get coverage across your whole pipeline. So this includes everything from, you know, being judicious about your selection of dependencies in the first place. Like, don't take a dependency on some tiny, barely maintained project if you can easily just replicate the functionality yourself. The reason why it's worthwhile to take dependencies is if the maintenance cost of the code in that dependency is large. But if the maintenance cost is actually pretty small, it might not be worth taking the dependency and introducing that risk. The same thing if you have the choice between multiple dependencies: then look at the different dependencies not just in terms of their quality, like, you know, the API, the documentation, the current code maturity, but also look at the maintenance of that package. Who maintains it? How many people? Do they have CI? What kind of testing strategy do they have? Do they have a security disclosure policy? There's a bunch of things you can look for here that indicate something about the sustainability of that package and of taking a dependency on it.
And then there's also the sort of ongoing monitoring part, right? So obviously you want to monitor all of the security vulnerability databases to make sure that if a problem is discovered with some version of some dependency, you A, are notified, and B, internally have the infrastructure to find all the places where the impacted dependency is used. And so this includes being able to track provenance information for all the builds that you do, provenance for all your deployments, and this is where you get into things like generating SBOMs, software bills of materials, that list all of the dependencies that went into a given artifact, tracking which software releases are released to what customers, at what time, in what products, in what physical devices, and keeping track of that whole graph structure. And being able to do analysis over that graph over time as you learn about new vulnerabilities. And then, of course, there's also work here on security scanning. So this would mean both doing scanning of our code for insecure patterns and the like, but also running proactive scanning on dependencies that we take, the first time they're brought into the company, anytime there's a new version, and so on, to actually scan them and ask: is this a dependency that we want to take? And some of that could be human review. Some of that can be AI-assisted review. And there's a combination of these that also could work. We wrote a blog post recently about using AI for assisted vetting of software packages. I'll send you the link and you can put it in the show notes. And so that contains some more thoughts about how you can not necessarily replace the human review here, but at least make it more efficient for humans to review those dependencies that you take.
And so there's a whole host of techniques where you kind of need to do all of them, because each one gives you sort of partial coverage of the stack that you have. And only when you combine all of them do you get the defenses that you need. But even then, you know, taking a dependency ultimately is a risk. And so you have to take the calculated risk of: is the upside of taking this dependency worth the potential risk that you're introducing? But I do think that there's a genuine security case for taking dependencies, because the alternative, if you build everything in-house, is that you will not have the people, especially subject matter experts, to maintain those internal implementations over time. I mean, crypto is the obvious example, with the old adage that you should not roll your own crypto. If we personally implemented all of our own crypto libraries, I'd be deeply uncomfortable with that, because we don't have enough, you know, cryptography specialists and analysts and engineers to A, build it in the first place, and then B, maintain it over time. So I would much rather that be publicly vetted, widely used, and continuously handled by a large number of security experts, and then we take a dependency on it. That is a much better and more secure decision for your dependency chain than in-housing everything. And the question really becomes that risk-reward trade-off: at what point does it become better to just in-house that dependency so that you don't take an external dependency on it, because the upkeep is not that bad or the upkeep is not likely to have security implications compared to the third-party dependency?
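The "graph structure" described above, SBOM entries per artifact plus a record of where each artifact is deployed, can be sketched in a few lines. This is an assumption-laden toy, not Helsing's tooling: `Inventory`, `Package`, and all artifact, package, and device names are hypothetical, and a real system would ingest a standard SBOM format and handle version ranges rather than exact matches.

```rust
use std::collections::{BTreeSet, HashMap};

/// A dependency pinned to an exact version, as an SBOM entry might record it.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Package {
    pub name: String,
    pub version: String,
}

/// Hypothetical inventory: which packages went into each artifact,
/// and which deployment targets received each artifact.
#[derive(Default)]
pub struct Inventory {
    sbom: HashMap<String, Vec<Package>>,
    deployments: HashMap<String, Vec<String>>,
}

impl Inventory {
    /// Records the (name, version) pairs that were built into `artifact`.
    pub fn record_build(&mut self, artifact: &str, packages: &[(&str, &str)]) {
        let entry = self.sbom.entry(artifact.to_owned()).or_default();
        entry.extend(packages.iter().map(|&(name, version)| Package {
            name: name.to_owned(),
            version: version.to_owned(),
        }));
    }

    /// Records that `artifact` was shipped to `target`.
    pub fn record_deployment(&mut self, artifact: &str, target: &str) {
        self.deployments
            .entry(artifact.to_owned())
            .or_default()
            .push(target.to_owned());
    }

    /// Finds every deployment target running an artifact that contains
    /// exactly `name` at `version`.
    pub fn affected_deployments(&self, name: &str, version: &str) -> BTreeSet<String> {
        self.sbom
            .iter()
            .filter(|(_, pkgs)| pkgs.iter().any(|p| p.name == name && p.version == version))
            .flat_map(|(artifact, _)| self.deployments.get(artifact).into_iter().flatten())
            .cloned()
            .collect()
    }
}
```

When an advisory names a package and version, `affected_deployments` answers the question raised in the conversation: which shipped devices need attention.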
Matthias
01:17:13
There's a common trope that people use, which is that Rust's package ecosystem is similar to npm's. We have a lot of smaller packages, and that exposes us to bigger risk. What's your take on that?
Jon
01:17:28
Yes and no. It is true that Rust tends to have more dependencies than Java or C++, for example. It tends to have more but smaller dependencies. I think the jury's still out on whether that's a good thing or a bad thing, because the downside with taking a large dependency is that, A, the large dependency means the maintainer of that project is maintaining way more code. And chances are they're not an expert in the entirety of that code. And so the likelihood that any part of it is under-maintained or under-vetted or underdeveloped is much higher. And then the other is that the breaking-change update cadence tends to be worse, because if you have one giant dependency and now they make a breaking change, that might be a lot harder for you to adopt because you have to adopt all or nothing of the entire dependency. Whereas if you have many smaller dependencies, fewer of them make breaking changes at any given point in time. So more of them will be fully up to date because they don't need to align on a single breaking-change schedule. But ultimately, you know, there are some things where taking a large dependency is probably worthwhile. And we do see this in the Rust ecosystem too. If you look at things like Bevy, for example, right? Bevy is effectively one big dependency. Tauri is another one. And so Rust doesn't preclude you from doing this. It's more that I think it often makes sense to have smaller dependencies of this kind. And it's not obvious to me that it's a guaranteed security risk compared to the alternative. You also have the downside with large dependencies that, because it's so large, you need to have many maintainers. And so you have a large number of maintainers. Would it be better if that same number of maintainers each maintained a smaller library that was a subset of the overall thing? I think that might be better. I'm not sure.
But yeah, it's not clear to me that Rust is in more danger because it has this finer granularity of packages.
Matthias
01:19:42
I really like your take on Rust's crate ecosystem and also contrasting it with whatever Node and npm provide. What about unsafe code, though? Because this is very unique to Rust and to how we think about safety and code. If you look at this from a perspective of supply chain security, aren't we exposing ourselves to a lot of risk by taking on a lot of unsafe code? And also, how would you vet for that?
Jon
01:20:16
Well, it's complicated, because unsafe code is not inherently less safe, even though the name kind of implies that, right? Because if you look at Java, if you look at Go, if you look at Node.js, and certainly if you look at C and C++, those languages have no guardrail for what is safe and what is unsafe. You know, if you look at Java, you have unsafe operations in Java as well, where you do direct pointer manipulation and you can do really bad things there and you can violate memory safety and you can do all these things. In C++, like, all the guardrails are just off. Even if you turn on a lot of the compiler validation, you can just do these things. It's just that in Rust, it's more obvious when you do these things than it is in those languages. And so I think in Rust, the reality is that safe and unsafe is more of a communication mechanism to say: this part of the code, you should look at more carefully, because it needs to uphold things that are not checked by the compiler. And so it's places where you get fewer of the benefits of safety from Rust, but they're not places that are inherently less safe than general third-party software. I do think actually that, you know, you should think a little bit more about taking a dependency that includes unsafe than one that does not. But I don't think it should be a thing that excludes you from taking that dependency, or that you should see it as significantly more risky. Where I worry more is when you have crates that have no business having unsafe code, but they do anyway. Usually this is for performance optimization reasons, or they just want to work around the borrow checker. That kind of use I'm more skeptical of, but it's not really a binary of is it unsafe or is it not.
Matthias
01:22:06
Do you think unsafe, the term, was a misnomer?
Jon
01:22:10
Yeah, I think in a way it is, because there are two uses of unsafe in Rust. One of them is on function definitions, and the other is on blocks inside of code. On blocks inside of code, it really should be described as safe. The goal of that annotation is to claim that the code between this curly bracket and this curly bracket is not subject to the standard compiler checks, or it's allowed to break some of the rules, or it's not checked that it doesn't break the rules. But trust me, I have checked that it is safe. That's what you're asserting by putting unsafe around a block. And so it's not to say this code is unsafe. It's actually to say this code is safe. It's just that it's checked by me, not by the compiler. And when it's put on a function definition, unsafe is more of an appropriate term, because it's saying: this function is unsafe to call unless the following is true. But so I do wish that there was a different name for the unsafe in a block.
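The two uses of `unsafe` described above can be seen side by side in a small sketch. The function names are hypothetical and exist only for this illustration. On the function definition, `unsafe` states a precondition the caller must uphold. On the block, it marks the spot where the author claims to have checked that precondition by hand.

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `index < values.len()`.
pub unsafe fn get_unchecked_copy(values: &[u32], index: usize) -> u32 {
    // SAFETY: the caller promised that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// Safe wrapper: sums the first and last element.
/// Returns `None` for an empty slice or if the sum overflows.
pub fn first_plus_last(values: &[u32]) -> Option<u32> {
    if values.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so 0 and len - 1 are both in bounds.
    // The compiler does not verify this claim; the author does.
    let (first, last) = unsafe {
        (
            get_unchecked_copy(values, 0),
            get_unchecked_copy(values, values.len() - 1),
        )
    };
    first.checked_add(last)
}
```

The `# Safety` section and the `// SAFETY:` comments are documentation conventions rather than compiler features. They give a reviewer, or someone vetting a dependency, a concrete claim to check.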
Matthias
01:23:12
I like that you said assert in that context, because yeah, it feels like an assertion. Maybe it could be called assert safe.
Jon
01:23:21
Yeah, that is the challenge, right? Asserts we think of as something that can fail, and the assertion here can't fail. Like, the program can't fail to run as a result of that assert. It doesn't actually check anything. It's like a signature that just says: trust me, I've checked.
Matthias
01:23:42
What's next for Helsing and for you?
Jon
01:23:46
For Helsing, I think it's continued development of the main product lines that we currently have, all of which are fairly ambitious and have some really cool tech in them. And so this includes the CA-1, which is the project I work on, and which is basically an effort to try to build a self-flying jet fighter in two years. So it's a very ambitious timeline. We're building the whole thing, hardware and software, from scratch. It's no joke, but it is really interesting work. And then we have the SG-1, which is an underwater, also autonomous, vehicle that aims to look for things like submarine traffic in large underwater areas. Think of the Baltic Sea, for example, where you need these things that can stay underwater for very long periods of time, have very minimal compute, and a very constrained power budget. They can't have any moving parts in them, because then they're too easy to detect. And then you still need to write software on there that does sonar analysis, for example, so they need to run ML models. How do you do that when you have extremely constrained battery power? And so that's some interesting stuff. And then obviously we do a lot of work with the HX-2, for use in Ukraine, for instance, where that is a lot of work on being close to the sharp end of defense, but also systems where safety criticality is enormously important. And it might sound self-contradictory to talk about safety critical in a system that explodes, but the reality is that it is extremely important that it explodes in the right place for the right reasons at the right time. And that's where the safety criticality comes in. And the cost of getting that wrong can be catastrophic. Working on these systems, I think, is, you know, I don't like to describe it as fun because it feels incorrect, but it is challenging. It's interesting. I think it's meaningful. I think it's important.
And I think, you know, in terms of the future of Helsing, a lot of that becomes continuing to work on these products and other ones to see how far we can go with, you know, using software to build really good deterrence capabilities for Europe.
Matthias
01:26:03
And for you, which streams can we expect in the near future?
Jon
01:26:08
I actually tend to not plan my streams very far ahead. Instead, what I do is I look for things that I want to exist, either at work or in my personal life, and then I build those, and I turn on the camera while building them. Because if I try to do streams where the content is hand-picked for streaming, it sometimes works if I pick a topic that I know is particularly deep and gnarly, but it's much more compelling when I can say what my use case is and what I'm building towards. It means that I can build a solution that actually solves a problem and that therefore comes with additional constraints on the implementation, a less generic goal, because I think that's where a lot of really good software engineering practices come in: when you're not just writing code, but you're writing code for a purpose. And so that's why my streams tend to be, when I have a thing to build, then I do a stream on that topic, which was the case recently, for example, for the Avro IDL converter. I needed one of those, so I built it. And I think increasingly we'll see other examples of this, where I need a thing either for work or for my personal life, and I will build it. That's how I've done my streams until now and how they will continue.
Matthias
01:27:27
And so far, your intuition hasn't failed you around that.
Jon
01:27:31
That's true. I have some ideas for other forms of streams or other forms of educational material that I have sort of half-baked in the back of my head. I think some of them could be really good, but the challenge is always, when will I find the time? And so this could be things like, I've had an idea for a sort of Rust intermediate course that would actually be a structured course rather than just ad hoc streams. I've had ideas for more of a casual-chat beginner's introduction to Rust that comes in shorter snippets rather than like 10-hour streams. I think that could be really cool, but finding the time is always the hard part.
Matthias
01:28:10
There's a ton more that we could discuss, but we have to get to the end. And traditionally, our final question is, what's your message to the Rust community?
Jon
01:28:19
I think my message to the Rust community is twofold. I think the first half is: Rust is now a language that is actively in use in important systems across the world. And that's a good thing, right? This is what a programming language aims for, to be successful to the point where it's adopted for real use cases that make a real difference. But the result of this is that companies care about what happens to the language and the direction the language goes in. And I think this is a little complicated for the Rust community, which traditionally has had very strong opinions on not just how the language is used, but there's also been an association, I think, with a value judgment on what the language is used for. And increasingly, I think the Rust community will have to come to terms with the fact that it's being used in production in ways that we don't control. And I think we need to decide how to interface with that. A good example of this is sponsorship of Rust conferences, right? Or in general, sponsorship of the Rust Foundation, for example, putting money into the Rust ecosystem, where this to me is a good thing for the Rust community. I want that to happen. I want the community, the language, to get the funding that it needs to continue to grow and continue to develop. But that comes at the cost that the language needs to prove that that influx of money is worthwhile. Because in general, if you want the funds to continue to come, there needs to be something that you get back for the money you put in. And that's not to claim that I have the answer for this, but I think there's been a sort of allergic reaction in the Rust community to industry involvement in the language. And I understand why in many ways, but I do think that it is an allergic reaction we have to figure out how to mitigate, because otherwise the funding for the language dries up and we end up with a language that doesn't have the funding to keep growing.
And then I think that the second half of my message to the Rust community is: we've built something that's really good. Like, the Rust language is really good. The tooling is really good. The community is really good. And I think that the ecosystem as well is really good. But I also think I've observed a sort of stagnation in the, I don't know how to describe it, the creativity of the use of the language compared to some of the early days. And that might be because we've solved many of the problems, right? But it does feel like there are still cool things you could do with the language that can change how the ecosystem works. serde was a good example of this from back in the day, where it was a different way of doing things. And then it made us have things like procedural derive macros and the ability to do serialization in a really efficient and ecosystem-wide way. And I want to see more of those kinds of ambitious, let's build something that is different, is better, is cool, new, innovative in the language, in the ecosystem. I'm seeing less of a hunger for that than I think I saw in the early days. And that makes me a little sad.
Matthias
01:31:42
Well, here's hoping that people like you, who are ambassadors of the language, will bring some of that passion back, and I certainly hope to see you on the stream. I will follow along. Jon, thanks so much for taking the time and for being part of this community.
Jon
01:31:59
No, thanks for having me. It was a fun conversation.
Matthias
01:32:00
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.