Helsing with Jon Gjengset
About keeping critical infrastructure secure and resilient with Rust
2026-04-23 93 min
Description & Show Notes
Jon Gjengset is one of the most recognizable names in the Rust community, the author of Rust for Rustaceans, a prolific live-streamer, and a long-time contributor to the Rust ecosystem. Today he works as a Principal Engineer at Helsing, a European defense company that has made Rust a foundational part of its engineering stack. Helsing builds safety-critical software for real-world defense applications, where correctness, performance, and reliability are non-negotiable. In this episode, Jon talks about what it means to build mission-critical systems in Rust, why Helsing bet on Rust from the start, and what lessons from his years of Rust education have shaped the way he writes and thinks about production code.
About Helsing
Founded in 2021, Helsing is a European defence company building AI-enabled software for some of the most demanding environments imaginable. Helsing's software runs where correctness is non-negotiable. That philosophy led them to Rust early on and they've leaned into it fully. From coordinate transforms to CRDT document stores to Protobuf package management, almost everything they build ends up being written in Rust.
About Jon Gjengset
Jon holds a PhD from MIT's PDOS group, where he built Noria, a high-performance streaming dataflow database, and later co-founded ReadySet to continue that work commercially. He then spent time building infrastructure at AWS, before joining Helsing as a Principal Engineer. Outside of his day job, he's been teaching Rust to the world through his livestreams and writing for years, which makes him a rare combination: someone who thinks deeply about both how to use Rust and how to explain it.
Links From The Episode
- Helsing AI selected for Eurofighter upgrade - Helsing's Eurofighter Project
- CA-1 Europa - Helsing's Autonomous Uncrewed Combat Aerial Vehicle
- Rust in Python cryptography - Rust being used in a Python library
- Clippy Documentation: Adding Lints - How to add custom lints to (your own fork of) clippy
- anyhow's .context() - Use it everywhere, it's very very helpful
- eyre - A fork of anyhow with support for customizable, pluggable error report handlers
- miette - Fancy, diagnostic-rich error reporting for Rust with source snippets and labels
- buffrs - Helsing's Cargo-inspired package manager for Protocol Buffers, written in Rust
- sguaba - Helsing's Rust crate for type-safe coordinate system math, preventing unit and frame mix-ups at compile time
- Sguaba: Type-safe spatial math in Rust - Jon's talk at Rust Amsterdam introducing sguaba and the type-system techniques behind it
- Apache Avro - A compact binary serialization format for streaming data, with a Rust implementation available via the apache-avro crate
- pubgrub - A Rust implementation of the PubGrub version-solving algorithm, as used in uv (Cargo has not adopted it yet)
- CRDTs - Conflict-free Replicated Data Types: data structures that can be merged across distributed nodes without conflicts
- ADR (Architecture Decision Record) - A lightweight way to document important architectural decisions and their context
- DSON: JSON CRDT using delta-mutations for document stores - The 2022 paper that was the basis for Helsing's CRDT implementation
- dson - Helsing's Rust implementation of DSON
- Jon's Livestreams on YouTube - Deep-dive Rust coding sessions where Jon implements real-world libraries and systems from scratch
- WebAssembly with Rust - The official Rust and WebAssembly book, covering a cool technology and useful skills to have as a Rust developer
- Rust for Rustaceans - Jon's book for intermediate Rust developers covering ownership, traits, async, and the finer points of the language
- CVE-2024-24576: Cargo/tar supply chain vulnerability - A security issue in the tar crate that affected Cargo's package extraction
- Wikipedia: Defence in Depth - The security principle of using multiple independent layers of protection; Even with Rust you need multiple layers, there is no silver bullet
- SBOMs (Software Bill of Materials) - A machine-readable inventory of all components in a software artifact; Cargo's lock files make this tractable for Rust projects
- Helsing: AI-assisted vetting of software packages - Make it more efficient to review dependencies you take in
- Bevy - A game engine built entirely in Rust, and a notable example of a large, complex Rust dependency
- Tauri - A Rust-powered framework for building lightweight desktop and mobile apps from a web frontend, an alternative to Electron
Official Links
Transcript
It's Rust in Production, a podcast about companies who use Rust to shape the
future of infrastructure.
My name is Matthias Endler from corrode and today I talk to Jon Gjengset from
Helsing about keeping critical infrastructure secure and resilient with Rust.
Unless you've been living under a rock, you know the guest of the show,
but I will let him introduce himself.
Jon, happy to have you.
Thanks, Matthias. So I'm Jon Gjengset. You may know me as @jonhoo online,
otherwise on the various channels, although YouTube I think is the primary one.
And I guess, who am I?
So professionally, I work at a company called Helsing, which I guess we'll talk
more about today. I work as a principal engineer, which means I end up jumping
all over the stack, owning wherever the biggest areas of attention are and where
I can be the highest leverage.
So I'm not necessarily in one particular team persistently over time,
but rather seeking out where I'm needed.
I've worked there for the past three years. Before that, I worked at Amazon.
I maintained and built their Rust build infrastructure.
And then before that, I did a PhD at MIT where I built a sort of novel distributed
systems database project that has since turned into a startup called ReadySet.
That's sort of the professional side of my career. And then the thing I'm more
widely known for in the Rust community is the Rust educational materials that
I make, primarily in the forms of videos on YouTube,
where I do long form content on building things from scratch in Rust and showing
people the actual code as we develop it and trying to give, you know, a realistic
view of the actual development workflow, like what does it look like
when you go from zero to one on a codebase in Rust.
Yes, and that's exactly the reason why I
was super excited to have you on the show to do
this interview. And I guess I can speak for everyone
listening right now when I say thanks so much for all the content, for educating
all of us. You're probably one of the most famous Rustaceans out there and,
overall, from what I can tell, an amazing human being, always happy to share.
I appreciate that, thank you!
And a lot of people know you from this educational context, but maybe not everyone
knows that you're a principal engineer at Helsing.
So talk a little bit about your role.
You mentioned it already, but maybe you can get into the details.
What do you do there and what is Helsing's responsibility right now?
Sure. So Helsing is a defense company based in Europe.
They started fairly recently. So it's only, I want to say, four or five years old.
I forget the exact start date, but around there. I joined
sort of towards the middle to end of 2023, after I moved back to Europe.
And Helsing operates sort of across the entire defense spectrum in Europe,
but they're specifically focused on software-enabled defense.
So how can we use software as the primary driver for, you know,
keeping up the deterrence capabilities of especially democracies, and especially
focused on Europe, given that's where the company is based as well.
It's already become quite a large company, in the sense that,
you know, we're now somewhere in the vicinity of a thousand employees, and
we have offices in, I think, six different countries now.
So we have offices in Estonia and Poland, in the UK, Germany, France.
Am I missing any? I think that those are the sort of primary office locations we have.
And then obviously, you know, I work remotely from Norway for the company and
we have other people working elsewhere as well. But those are the sort of primary locations.
But for a company that's only four to five years old, that's quite the growth.
And the, you know, the reality is that that is partially because of the world
situation and the way things are developing in Europe, where a pretty,
you know, a pretty severe investment into European defense has been needed.
And Helsing has sort of seen fit to try to fill at least some of that space.
And so we operate across land, air, maritime.
We recently announced some work in space. And so the observation is that by
focusing on the software,
there's a lot we can bring to the table and we can bring it to the table pretty
rapidly because software in
general has more rapid development cycles than traditional hardware does.
And so, you know, I, in my work at Helsing, I've been primarily working in the air domain.
So working with, for example, the capability upgrades for the Eurofighter,
which is a European jet fighter program.
And currently I'm working more on the CA-1, which is our recently introduced
product that is essentially an autonomous UAV.
And so I'm working on building a lot of the software that's going to underpin that entire stack.
If you compare Helsing's usage of Rust with AWS, can you see any differences?
Yeah, I mean, I think there are a number of differences, actually.
One of them is that Helsing was
a sort of Rust-first company. So it
was decided very early on that the entire stack
there should be Rust-based and
that, wherever possible, Rust would be the language of choice. And
we can get into exactly why that was, but that has sort of shaped
a lot of the software engineering. It shaped a lot of the software that we've
built and the way that we built software. Whereas at Amazon, you know, Amazon was
originally a sort of Java company, with
some healthy amounts of Perl in there.
And then over time, it's sort of grown into this polyglot company where there
are lots of different languages in use in different parts of the company.
And Rust is obviously one that's, I think it's beyond up-and-coming now.
It's actually being adopted in pretty serious use cases across AWS in particular.
But Rust was the sort of up-and-coming competitor.
It was not the incumbent. And that changes
how you adopt it, right? Because it means that at Amazon there's
a lot of infrastructure, there's a lot of tooling that's not built for Rust, that
was built with other languages in mind, and where Rust now has to make inroads
into that ecosystem and integrate with all the things that people are used to
working with. Whereas at Helsing we can build everything specifically for
using Rust, because that is the primary target language.
Yeah. What I get from this is AWS had infrastructure before and it was sort of a brownfield
adoption of Rust, whereas at Helsing it more or less might have been a greenfield
adoption. I'm not sure if that is true.
Yeah, I think that's right. I mean, it was a
pretty principled choice from the beginning of the company to say, you know,
we're going to be building technology where, you know, a lot of the technology
is critical, right? It ends up having implications for life and
death decisions. It ends up having implications for, you know,
the correctness in the military
domain. Like, messing things up here is
very, very costly, and I don't mean in terms of monetary
cost, right? It's just costly in, like, a human cost. And so as a result you need
to make sure that you build systems that are highly resilient, highly reliable,
highly predictable, and robust. And Rust is one of the mechanisms that we
wanted to use from very early on to make that be the case.
And, you know, because the company is relatively young,
we could also then say we're going to take the attitude here of everything is
going to be Rust from the get-go rather than say, you know, let's just try out
different languages and over time figure out what to choose.
We're just going to say that is what the stack is going to be built in.
So it's very much a greenfield approach in that way.
And yet, even if you base everything on Rust, you still need to interface with
existing libraries that might not be written in Rust.
You might drive controllers or other hardware that maybe has firmware that isn't
written in Rust. What is that story like?
Yeah, and I actually think that's one of the reasons why Rust was able to gain
such adoption in such a wide set of areas is because its interoperability story is decently good.
There are things I would still like to see Rust grow on here for sure,
but it gives you low-level enough control to be able to write firmware,
operating systems, embedded devices in it.
But it also makes it relatively easy to plug Rust into existing code bases,
whether that is underneath something like Python.
Python cryptography is a good example here where they are now using Rust under
the hood for some of the native components,
but also being able to put Rust above other things where you have an existing
C code base or even C++ code base, and you want to be able to interface with it from Rust.
Well, you can also do that. So you can sandwich Rust into wherever in the stack
you feel like it's appropriate, and then it can grow out from there.
And that's much harder with some other languages. Like if you have a runtime,
for example, being able to do foreign function interfaces either above
or below you tends to be more painful.
Did you see cases where vendors provide Rust SDKs now that it's becoming more popular?
It's a mix. There are some domains where we're seeing more interest in supporting
Rust from industry vendors.
And then there's others where it's still, you know, all C++. Or
if you look at the Terraform and Kubernetes ecosystem,
a lot of that is Go, and they
don't seem particularly interested in saying we're going to also provide Rust bindings now.
And then you have, you know, more embedded ecosystems where maybe they already
have an existing sort of stack for development that's C-based maybe,
and they don't necessarily want to change that.
But I do think we're seeing vendors now think more about maybe a Rust SDK is
also something we want to provide.
I mean, Amazon is a good example where they, you know, now have a Rust SDK.
And it was because there was sufficient demand for, you know,
we want to build things that use AWS from Rust. So please let us do that.
Maybe for some additional context, could you list the things where Rust is front
and center at Helsing? What makes the stack?
So Rust is used throughout our entire stack, essentially.
So we use it for obviously anything that's backend services and the like,
but we also use it for anything that's close to edge devices.
So if you are writing code that's going to run on a UAV, on a drone,
on an underwater autonomous submersible, whatever it might be,
realistically, you have to use a language there where you have fairly tight
control over things like power usage, performance, predictable performance over
time, low overhead compute, and hardware control.
And so we use it a lot there in sort of embedded or bare metal systems,
but also things that are almost embedded or bare metal, right?
So things like, you know, essentially single application, but still runs on
Linux, but on a very particular compute board would still be something where
we write those applications in Rust.
A lot of our tooling is built in Rust. A lot of our networking
technologies are built in Rust. In fact, it might be easier to sort of list the
things that are not done in Rust, where I'd say
there's primarily three categories.
Web frontends we tend to build in TypeScript instead, because you can do some
of it in WebAssembly, but realistically, TypeScript is where you're going to
get the most mileage here.
And then for AI research, like for the people actually working on the underlying
machine learning algorithms and training and the like, there we try to enable
people to use Python because that is where they're the most productive.
That's where a lot of the state-of-the-art research happens.
And that's where a lot of the existing tooling and libraries and such exist.
And so we don't really want to move all of that to Rust, even though we think
it's probably feasible.
It's not clear that it gives the bang for the buck in that area.
And then the last is we have some things around infrastructure that's in Go.
I mean, I mentioned Terraform and Kubernetes and stuff, so there's some stuff like that where
Go has the best support ecosystem for writing things in that environment. But
I'd say that's a pretty small minority. I'd say, sort of in
order of adoption percentage, I think it's Rust by far, and then Python,
and then TypeScript, and then there's like a tiny bit of Go.
There's always Bash, because CI and everything, but I'd say those are the primary languages.
According to our internal statistics, a large percentage of our listeners do
use Rust in production in some way.
But what does it feel like to work in a codebase where Rust is truly the answer
everywhere, not just in one layer?
It's very convenient, right? Because it means that whenever I go to any codebase
across the company, chances are I know how to read that codebase.
Chances are it's written in Rust.
It's not just about being able to read the code. It's also the structure is
kind of predictable because they're all cargo projects.
They all have, you know, the layout you would expect from that.
It also means we can write tooling that is specifically designed for Rust projects
and then, you know, add support for Python, for TypeScript.
But we can build things specifically for Rust. As an example, we have an internal
linter that we use to catch things that are more company preferences.
So we run Clippy, and then we also run this other thing that tries to lint for
things that are preferences we have for software engineering in Rust.
And so this can be things like looking for preferred libraries we have for things
like logging, for example.
Or it can be things like how we believe you should take internal dependencies,
like how they should be expressed in your Cargo.toml.
Or it can be things like, you know, how we think you should be using expect,
like the sentence structure we believe you should have inside of expect statements.
Like there's a bunch of that sort of stuff that you can do linting for that
wouldn't necessarily make sense for upstreaming
into Clippy, but it does make sense for enforcing things internally.
And the fewer languages you have, the more you can build tooling specifically
for those languages to encourage the sort of software engineering excellence
practices you want to have for that language.
Yeah, I'm sure a lot of people will be interested in taking a peek at that.
So please open source it if you can, even if you don't want to upstream it.
I think it's mostly uninteresting in the sense that Clippy already has a relatively
straightforward mechanism where you can basically implement your own lints.
And so our internal tool is just that.
It's just a Clippy, but where all the lints are replaced with things we decided we wanted to look for.
And where there are things that we think are actually useful to Rust users more
broadly, then we would obviously open source those or upstream those into Clippy itself.
But for a lot of the others that are just like encoding of our software quality
standards, it's like, I don't actually think they're all that interesting outside
of the company. And there's certainly nothing about the linting engine itself
that's interesting, right?
Because it is just the clippy tooling that exists that we've added our own rules to, if you will.
Which sort of begs the question, do you use that tool across the stack?
Or do you make differences between embedded developers and backend developers?
Do they use Rust differently?
No, that tool is used across the stack. It's not currently mandated.
So it's a thing that you can choose to add to your CI. And then we are strongly
encouraging everyone to use it in their CI.
But most of the rules here are around things that are good practice,
no matter where in the stack you operate.
This can be things like if you have an error type that is, let's say,
eyre's Result, and you are propagating that error type using the question mark
operator, you should have a call to .context() before it.
Right? And that makes sense for any codebase you're operating in. The
same thing: we can encode rules like always prefer .context() over
.wrap_err() or vice versa, right? But we can at least encode that we believe
it should always be one or always be the other.
And so it's those kinds of things, more so
than, you know, it's harder to write a lint for,
I don't know, how you should handle back pressure in an application, which is
the kind of thing that would differ between embedded development and cloud development.
You can't really write a lint for that at the Clippy level. It's almost
more architectural, and so those things are not things we lint for.
Do you have an example for when .context() is helpful?
Oh, I mean, I love .context(). I use it everywhere.
The thing that context gives you is the
ability to, you know, when you propagate an error up
through the stack, that error might just be
something like permission denied or not found or something.
And if you just propagate it with question mark, then at
the point where you emit the error, you know, in your main, in
your binary or something, you would just get an
error printed that says file not found or permission denied, and that's completely
useless. By adding context you can include not just information about
which file was being accessed, but why that file was being accessed. So
imagine you have config files that have include directives.
And so you might be like three levels down in an include hierarchy,
and then you reach a file you're not allowed to read.
Well, then the question is, which file included the one that said that you should read this file?
So even if you had the file name without the context, the file name might be
like, well, I don't know why foo.bar is being included.
That's not any of the files that are listed in my top level config.
And so the context allows you to give
both things like data context, but also programmatic context,
like why are we loading a config file here in the first place?
Or in which part of the application did we load this configuration?
Like, was it, you know, the thing that's loaded at startup, or maybe it's like
the code that handles configuration changes at runtime.
They both end up reading the config files, but which of them failed with this error?
And so in general, adding context like this is very, very helpful for making
your errors actually be actionable where they're emitted.
Yeah. And it also works across different crates, even in the same namespace or outside?
Well, I mean,
the way this is set up is, if you use anyhow or you use eyre or you use miette
and you have this sort of opaque error type, .context() allows you to take
such an opaque error type and turn it into another opaque error that has the
original one plus this additional context as a part of the chain.
But it also lets you take an error that is not one of these opaque errors,
like one that comes out of some external library, and then chain this context
on top of that error and turn it into one of the opaque errors.
It obviously doesn't work so well if you have enumerated errors.
So if you have ones that are like, I have an error enum, and these are the variants,
then you can't easily call .context() on that unless you're willing to erase
the concrete type of that error and turn it into an opaque error like an anyhow error.
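To make the include-file scenario concrete, here is a minimal sketch of context chaining using only the standard library. It is not anyhow's, eyre's, or miette's implementation: `Report`, the `Context` trait, `read_file`, and the file names are hypothetical stand-ins chosen for illustration. In real code you would typically reach for the `Context` trait that anyhow provides rather than writing your own.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for an opaque error type: a context message
/// plus the underlying error it wraps.
#[derive(Debug)]
struct Report {
    message: String,
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for Report {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for Report {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Hypothetical extension trait that mimics the shape of `.context(...)`.
trait Context<T> {
    fn context(self, msg: impl Into<String>) -> Result<T, Report>;
}

impl<T, E: Error + 'static> Context<T> for Result<T, E> {
    fn context(self, msg: impl Into<String>) -> Result<T, Report> {
        // The concrete error type is erased here, as with any opaque error.
        self.map_err(|e| Report { message: msg.into(), cause: Box::new(e) })
    }
}

/// Simulates a file read that always fails with a bare OS-style error.
fn read_file(_path: &str) -> Result<String, io::Error> {
    Err(io::Error::new(io::ErrorKind::PermissionDenied, "permission denied"))
}

/// Data context: which file, and which file included it.
fn load_include(path: &str, included_from: &str) -> Result<String, Report> {
    read_file(path).context(format!("reading {path} (included from {included_from})"))
}

/// Programmatic context: which part of the application was loading config.
fn load_startup_config() -> Result<String, Report> {
    load_include("secrets.conf", "app.conf").context("loading configuration at startup")
}

/// Renders the whole chain, outermost context first.
fn render_chain(err: &dyn Error) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        parts.push(cause.to_string());
        current = cause.source();
    }
    parts.join(": ")
}

fn main() {
    let err = load_startup_config().unwrap_err();
    println!("{}", render_chain(&err));
}
```

Without the two `.context(...)` calls, the only thing that would reach `main` is "permission denied". With them, the emitted error says which file failed, which file pulled it in, and which code path was loading configuration.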
How high is the code reuse at Helsing?
I think it depends on where in the stack you look, right?
So we have projects that span from relatively new sort of experimental,
we're trying something out,
to this is in a product and has been in development for several years.
And the amount of code reuse you do in one versus the other is pretty significant.
Also, the amount of code reuse you do within a domain versus across domains varies.
So there are more commonalities between something like a UAV and a strike drone.
Those have a lot more similar components than, let's say, a radar analysis system and a submersible.
Those don't share as much in common. So it's hard to give you one number for the amount of reuse.
But we do try to lean pretty heavily into if you've built something that is
useful elsewhere at the company or indeed outside of the company,
then build it as a reusable library.
And so, you know, you've already seen some of this come out on the public side of Helsing.
So we have some open source repositories like we have a tool called buffrs,
which is basically a package manager for protobuf files.
So it lets you take a collection of protobuf files,
create basically a Proto.toml, which is equivalent to a Cargo.toml that says, this is the version,
this is the package name, and you can take dependencies on other protos and
it resolves those and runs proto and gives you the flattened set of all the
files and resolves the dependencies between them.
We have a tool called Sguaba, which is a library for doing spatial math,
like rigid body dynamics or rigid body transformations in a type-safe manner.
And this is something that we use across a lot of our different code bases where
it's a real pain to get them right once, and you don't want every team to have
to get them right again and again and again.
So we build it once, and then we make that be a central library,
a central utility that every team can make use of, and then we also open source it.
And then we have other versions of this internally, like we have some tooling
for the Avro ecosystem where anyone who uses that internally now has access to our tooling.
And some of that we might open source, some of it we've already open sourced.
And so I'd say there's a pretty broad swath of reuse that's pretty intentional.
It's something where we see that there's a lot of cost to telling every team to reinvent the wheel.
And it's cost not just in terms of engineering time, but also in terms of correctness, right?
It means that you only manifest the bug once, you only fix it once,
rather than every team having to wrangle with the same complexities and the same bugs over time.
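The idea of type-safe spatial math can be sketched with phantom type parameters. The following is an illustration of the general technique only and does not reproduce Sguaba's actual API: the `World` and `Body` frame markers, `Point`, and `Translation` are hypothetical names, and the transform handles translation without rotation to keep the example short.

```rust
use std::marker::PhantomData;

/// Hypothetical marker types naming coordinate frames.
#[derive(Debug, Clone, Copy, PartialEq)]
struct World;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Body;

/// A 3D point tagged at the type level with the frame it is expressed in.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point<Frame> {
    x: f64,
    y: f64,
    z: f64,
    _frame: PhantomData<Frame>,
}

impl<Frame> Point<Frame> {
    fn new(x: f64, y: f64, z: f64) -> Self {
        Point { x, y, z, _frame: PhantomData }
    }
}

/// A translation-only transform from frame `Src` to frame `Dst`.
/// `origin` is the position of the `Src` origin, expressed in `Dst`.
struct Translation<Src, Dst> {
    origin: Point<Dst>,
    _src: PhantomData<Src>,
}

impl<Src, Dst> Translation<Src, Dst> {
    fn new(origin: Point<Dst>) -> Self {
        Translation { origin, _src: PhantomData }
    }

    /// Only accepts points in `Src` and always returns points in `Dst`.
    fn apply(&self, p: Point<Src>) -> Point<Dst> {
        Point::new(p.x + self.origin.x, p.y + self.origin.y, p.z + self.origin.z)
    }
}

fn main() {
    let body_to_world = Translation::<Body, World>::new(Point::new(100.0, 0.0, 50.0));
    let sensor_in_body = Point::<Body>::new(1.0, 2.0, 0.0);
    let sensor_in_world: Point<World> = body_to_world.apply(sensor_in_body);
    println!("{sensor_in_world:?}");

    // Feeding a world-frame point back into a body-to-world transform is
    // rejected at compile time (expected `Point<Body>`, found `Point<World>`):
    // body_to_world.apply(sensor_in_world);
}
```

The frame markers carry no data at runtime, so the check costs nothing once the program compiles. A frame mix-up becomes a type error instead of a bug that each team has to find separately.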
Yeah, that's also what I like about Rust in general, because undefined behavior
is a thing that you can centralize and then fix once, and then the entire ecosystem profits from the fix.
Yeah, and I mean, this is partially a property of just having a
good packaging system, right? Having a package manager like Cargo means it's
pretty easy to turn something into a library, it's pretty easy to take a dependency
on it, and so that incentivizes doing this kind of sharing, which would be
more annoying in at least some other language ecosystems.
Yeah. Also, thanks a lot. I always like it when companies open source their work. It's super amazing.
And Sguaba is certainly an amazing library too. You gave a talk about it.
We will link to it in the show notes.
And can you elaborate a little bit more on buffrs?
Why is it important to have a package manager for protobuf definitions?
You must really like protobuf definitions and probably use them everywhere.
Well, so it comes as a pretty natural outcome of not having a monorepo, right?
Because you end up with different teams having protobuf files for their configuration
or their data exchange or whatever it might be.
And then, if you have some other team that also wants to make use of that team's protobuf files,
well, then either you need to be in the same repository as them, or you need
to copy-paste the files, or you need to have a mechanism for publishing your
proto files and then getting them into another codebase,
at which point you need versioning, because you're not tagging a particular commit
of it, you need the version of it.
And thus you start getting transitive dependencies where maybe there's a shared
definition for maybe just data types even, so not full structure of gRPCs,
just some of the core data types.
And you want those to be shared across two different repositories inside of one team.
And then they both take dependencies on that. And then anyone who takes
a dependency on either now also needs the transitive dependency.
And so you very quickly run into the situation of we basically need a package
manager here and we need versioning, we need packages.
And that's what buffrs was built to solve.
What do you use protobuf for internally?
So we tend to like having sort of textual code representations of protocols
because it makes it a lot easier to decouple the two sides of any given communication pattern.
And, you know, protobuf is one way to do that. Avro is another,
and there exist many others as well.
It also means that
once you create that protocol divide, and if you encode the protocol separately
from any given code base,
it now makes it easier as well to change the technology choices on either side,
or just completely reinvent one side by rebuilding it from scratch,
but being able to reuse the definitions.
Why protobuf specifically? It's a pretty mature toolchain and
ecosystem, and it has a proven track record of working well,
being efficient, having support for many languages.
And for that reason, I think it's a pretty obvious default choice.
But as you'll see, or as you might have seen from my streams,
for example, there are things where we use Avro instead of protobuf.
And the rationale for that is there are some features that Avro has that Protobuf
does not, where if for your particular use case, those features are worthwhile,
well, then you should pick the technology that has those.
So as an example, Avro has support for this thing called logical types,
where you can annotate particular fields of a type with, you know, this is an f64,
but I'm going to annotate it with the logical type velocity in meters per second.
And protobuf doesn't really have that
mechanism for adding sort of richness to the types
of fields. You
can kind of hack your way there in protobuf, but in Avro that's just directly
supported by the tooling. It also has better support for streams of blobs. In
protobuf you don't really have that. You have it in gRPC, but we don't
necessarily use these protocol definitions for RPC mechanisms.
We often use them to represent just the data format that's being exchanged in
various different formats and protocols.
And so just because we're using protobuf does not mean we're using gRPC everywhere.
Same thing for Avro. We might be using the Avro IDL for expressing data types
for protocols. We're not necessarily using the RPC mechanisms that are built on top of Avro.
Is it time to write an Avro package manager now?
Maybe. It's not impossible. I mean, writing a package manager,
if it is as simple as, you know, assign a version, create a bundle,
upload it somewhere, is not that bad.
And I mean, if you look at buffrs, you know, there is some amount of complexity
there, but it's not, you know, monumental. And we could probably pretty easily
take buffrs and create an Avro version of buffrs.
The thing where it starts to get really gnarly is when you want sophisticated
semantic versioning resolution, for example.
There's been some really cool work on...
There's this effort called PubGrub, which is essentially trying to write a version
resolver that can be reused across packaging ecosystems,
across different kinds of version specifier semantics in your dependency declarations,
but also write it in such a way that it's reusable.
So you could use it in NPM, you could use it in cargo, you could use it in whatever
package manager you dream of, and just get really rich resolution of package
dependencies, because it turns out that's a fairly complicated problem.
And then PubGrub also tries to give you good error messages from resolver failures,
which tends to be something that many package managers struggle with,
because they build version resolution in this kind of ad hoc way.
Cargo hasn't adopted PubGrub yet, but it's sort of on the long-term roadmap.
And it has been for a while, so don't hold your breath.
But it does mean that, for example, Buffrs as of today only has exact version lookups.
So if you take a dependency on 1.2.3 and 1.2.4 is released, then Buffrs will not pick it up.
But if we integrated PubGrub into Buffrs, we would get this version resolution
behavior where we could say the default should be semantic-version caret
matching, like what Cargo does.
And at that point, moving to, say, a version of Buffrs that supports Avro feels like
it shouldn't be that bad because there's not too much that is Protobuf-specific inside of Buffrs.
It is more like a collection of files with some metadata that annotates version
and dependencies and then being
able to bundle those up into tarballs that you can publish somewhere.
And if you squint at it, that's like most of what many package managers are.
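To make the difference between exact lookups and caret matching concrete, here is a minimal Rust sketch. It is not how Buffrs, Cargo, or PubGrub are implemented; the `Version` type and the function names are assumptions made up for illustration, and it only picks the highest matching version for a single requirement rather than resolving a whole dependency graph.

```rust
/// A minimal semantic version: major.minor.patch (no pre-release or build metadata).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Self { major, minor, patch }
    }

    /// Parses strings of the form "1.2.3"; returns None for anything else.
    pub fn parse(s: &str) -> Option<Self> {
        let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
        let v = Self::new(parts.next()??, parts.next()??, parts.next()??);
        // Reject trailing components such as "1.2.3.4".
        if parts.next().is_some() { None } else { Some(v) }
    }
}

/// Exact matching: only the requested version satisfies the requirement.
pub fn matches_exact(req: Version, candidate: Version) -> bool {
    req == candidate
}

/// Caret matching: the candidate must be at least `req` and must not change
/// the left-most non-zero component of `req`.
pub fn matches_caret(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        candidate.major == req.major
    } else if req.minor > 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}

/// Picks the highest available version that satisfies the caret requirement.
pub fn resolve_caret(req: Version, available: &[Version]) -> Option<Version> {
    available.iter().copied().filter(|&v| matches_caret(req, v)).max()
}
```

With a requirement of 1.2.3, exact matching ignores a newly published 1.2.4, while caret matching accepts it but still rejects 2.0.0. The hard part that a resolver like PubGrub addresses is doing this across many packages whose requirements constrain each other, which this sketch does not attempt.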
Okay, so we understand that management of those protobuf definitions is really
important for Helsing. But what do you use them for in the first place?
So we do use gRPC for some parts of the tech stack, although gRPC tends to be
best suited for environments where communication is fairly reliable and predictable.
And you have like TCP, you have stable IP networks and the like.
So there we use gRPC, and then protobuf is a good way to make use of it.
But protobuf is also useful for just describing data formats.
Data formats is the wrong word, but like...
The structure of messages that are going to go over a network or in some cases
that will go to disk, although protobuf tends to be less well-suited for that,
but certainly for anything that
has to go over a network, but it doesn't necessarily need to go over gRPC.
So for example, especially closer to the edge, there are networks that we have
where you have drones flying around, where there are radios with very limited bandwidth
and connections that come and go. You get jammed, so
you lose connectivity, or your bandwidth gets severely reduced because
you're flying through a zone where there's a lot of interference, whatever
it might be. In those environments gRPC is not going to work. Realistically
you can't use TCP; the network is simply too poor and too dynamic
for that to work. Plus you want to make use of things like, you know, if you have
a radio network, it's inherently broadcast, and so when you send one packet,
you want to make it useful to as many other peers in your network as possible.
And so that's an example where we're not using gRPC, we're not using TCP.
And in fact, we've built our own stack that is reliant on CRDTs,
on conflict-free replicated data types,
that tries to build a distributed network where you can still get reliable exchange
of information, even in the presence of severe packet loss,
packet reordering, and nodes sending frequent updates over time,
where you want to make sure you only
accumulate the updates that are newer, and that old data gets erased when new data
replaces it, even if you get the packets out of order.
It's this sort of very traditional distributed systems problem,
just not in a cloud environment setting.
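The requirement that newer data replaces older data even when packets arrive out of order can be illustrated with one of the simplest CRDTs, a last-writer-wins register. This is a sketch under stated assumptions, not the stack described in the conversation: the `LwwRegister` type, its fields, and the use of a `(timestamp, node_id)` pair to order writes are all made up for illustration.

```rust
/// A last-writer-wins register: a value tagged with a logical timestamp and
/// the id of the node that wrote it (the id only breaks timestamp ties).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LwwRegister<T> {
    pub value: T,
    pub timestamp: u64,
    pub node_id: u32,
}

impl<T: Clone> LwwRegister<T> {
    pub fn new(value: T, timestamp: u64, node_id: u32) -> Self {
        Self { value, timestamp, node_id }
    }

    /// Merges an incoming update. The write with the larger
    /// (timestamp, node_id) pair wins; older or duplicate packets are ignored.
    pub fn merge(&mut self, incoming: &Self) {
        if (incoming.timestamp, incoming.node_id) > (self.timestamp, self.node_id) {
            *self = incoming.clone();
        }
    }
}

/// Applies a sequence of received updates to a starting state.
pub fn apply_all<T: Clone>(mut state: LwwRegister<T>, updates: &[LwwRegister<T>]) -> LwwRegister<T> {
    for update in updates {
        state.merge(update);
    }
    state
}
```

Because `merge` only ever keeps the maximum, it does not matter in which order the updates arrive, or whether some of them arrive twice: every node that eventually sees the same set of updates ends up with the same value.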
But there, for the CRDT stuff we've built, we're still using protobuf for the
actual data definitions.
So the definitions of the schema, effectively, of what an application,
what data an application might write, might read, what gets sort of stored and
sent, all of that is still in protobuf files because, again,
it's something that people know.
The tooling is good around it. You know, it has editor support and all of that stuff.
We have Buffrs for being able to give you versioning over those schemas.
And so it makes a lot of sense to reuse that ability to describe your protocols,
your data definitions, even though the underlying exchange technology is very different.
And what are reasons for using CRDTs specifically?
You mentioned a few things already, but what's the bird's eye view?
When would you use it, when wouldn't you use it, and maybe what is it in the first place?
Yeah, so CRDTs are, at the very basic level, an exchange data type.
It's a data type that comes with some
algorithms that define what to do when these messages are exchanged over a network
or between peers in whatever way they are.
Imagine you have set up a distributed system where you have,
let's say, three nodes in that system, nodes A, B, and C, and node A and B concurrently
make edits to some underlying document or whatever.
Let's use a document as a good example. There's a shared document between A,
B, and C. A is making edits to the document, B is making edits to the document,
and they can't talk to each other, but they're able to send packets to C.
C is now going to observe the changes from both A and B. How does it reconcile the changes from A and B?
And CRDTs are the data type plus algorithms that make C able to reconcile those
changes in such a way that if A and B's edits were not conflicting with each other,
then there is no conflict observed by C. It just observed both edits.
And if they do conflict with each other, then C has a conflict resolution strategy
in the sense that it can detect that there was a conflict and it can decide
what to do about that conflict in such a way that afterwards there is no conflict.
And so imagine, for example, that your document is like a key-value
store. If A writes to key foo and B writes to
key bar, then all the edits to foo and all the edits to bar should just come
into C and there should be no problem. If A deletes foo and B updates foo, so
the same key, and then they send to C, then C is now going to have to decide what
to do when it observes a delete and an update at the same time.
And the CRDT informs you about the
metadata you need to add to the packets to detect that these are the same key and that they happened
concurrently with each other rather than, let's say, the update happened first
and the delete happened after, in which case you should always take the delete.
But if they actually happened concurrently, which the CRDT metadata will tell you,
then the CRDT algorithm will tell C whether it should prefer keeping the updated
value or prefer the delete. It's sort of baked into the data type itself
how to resolve that conflict.
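One common form of the metadata mentioned above is a version vector: a per-node counter of the writes each operation has seen. The following Rust sketch shows how comparing two such vectors distinguishes "happened before" from "concurrent". It is an illustration only; the conversation does not say which metadata the DSON-based implementation uses, and `VersionVector`, `Causality`, and `compare` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// How two operations relate to each other causally.
#[derive(Debug, PartialEq, Eq)]
pub enum Causality {
    Equal,
    Before,
    After,
    Concurrent,
}

/// Maps a node name to the number of that node's writes an operation has seen.
pub type VersionVector = BTreeMap<&'static str, u64>;

/// Compares two version vectors entry by entry; missing entries count as zero.
pub fn compare(a: &VersionVector, b: &VersionVector) -> Causality {
    let mut a_ahead = false;
    let mut b_ahead = false;
    for node in a.keys().chain(b.keys()) {
        let seen_by_a = a.get(node).copied().unwrap_or(0);
        let seen_by_b = b.get(node).copied().unwrap_or(0);
        if seen_by_a > seen_by_b {
            a_ahead = true;
        }
        if seen_by_b > seen_by_a {
            b_ahead = true;
        }
    }
    match (a_ahead, b_ahead) {
        (false, false) => Causality::Equal,
        (false, true) => Causality::Before,
        (true, false) => Causality::After,
        // Each side has seen something the other has not: neither came first.
        (true, true) => Causality::Concurrent,
    }
}
```

If B's delete carries a vector showing it had already seen A's update, the update is `Before` the delete and the delete simply applies. If neither vector dominates the other, the operations are `Concurrent`, and that is the case where the chosen CRDT's conflict rule decides the outcome.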
And is that resolution always the same, or does it depend on the use case?
Can you configure the resolution?
Yes, you can choose different CRDTs depending on the outcomes that you want.
So for maps, for example, the most common way to construct a map is something
called an observed-remove map, where you can only remove items or updates that you have observed.
So in the case before, where A updates foo and B deletes foo
(I think I said it in reverse, but it doesn't matter), because B did not
observe the update from A, it is not allowed to remove the update from A.
And therefore, the update from A will win out. And the result will be that foo
will not be deleted in the resulting resolution.
And the protocol and the algorithms and the metadata ensure that this is always the case.
But you can choose differently. There's a different CRDT that is not the observed-remove
map (I forget the name of this one), a CRDT for maps
that is specifically remove-wins,
where you are guaranteed that if you have a remove and an update,
the update will be removed.
And so these are different semantics you can choose by choosing the appropriate CRDTs.
And the CRDTs tend to also be composable. So you can say that,
you know, you have a key value map where the values themselves are also CRDTs of a particular type.
And you can structure them in this way where you choose the
semantics at every layer by choosing the appropriate CRDT mechanism.
And the rule of CRDTs is
that as long as you observe all the same operations as another node in the system,
you agree on the final state. So they're guaranteed to be sort of commutative
and associative to the point where eventually everyone has the same state as
long as they get to exchange all messages.
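The observed-remove behavior and the "same operations, same final state" rule can be sketched together in Rust. This is a deliberately simplified state-based map, not the dson crate or the DSON algorithm: the `OrMap` type, the `Dot` tag, and the ever-growing tombstone set are assumptions chosen to keep the example short, and the caller is assumed to supply a fresh dot for every write.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A unique tag for one write: (node id, per-node counter).
pub type Dot = (u32, u64);

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct OrMap {
    /// Live writes per key, each identified by the dot that created it.
    entries: BTreeMap<String, BTreeMap<Dot, String>>,
    /// Dots that some replica has observed and then removed or overwritten.
    tombstones: BTreeSet<Dot>,
}

impl OrMap {
    /// Writes `value` under `key`, replacing only the writes this replica has observed.
    pub fn update(&mut self, key: &str, value: &str, dot: Dot) {
        self.remove(key);
        self.entries.entry(key.to_string()).or_default().insert(dot, value.to_string());
    }

    /// Removes only the writes to `key` that this replica has observed.
    pub fn remove(&mut self, key: &str) {
        if let Some(writes) = self.entries.remove(key) {
            self.tombstones.extend(writes.into_keys());
        }
    }

    /// Merges another replica's state: union of writes minus union of tombstones.
    pub fn merge(&mut self, other: &OrMap) {
        self.tombstones.extend(other.tombstones.iter().copied());
        for (key, writes) in &other.entries {
            let mine = self.entries.entry(key.clone()).or_default();
            for (dot, value) in writes {
                mine.insert(*dot, value.clone());
            }
        }
        let tombstones = &self.tombstones;
        self.entries.retain(|_, writes| {
            writes.retain(|dot, _| !tombstones.contains(dot));
            !writes.is_empty()
        });
    }

    /// Current values for `key`; more than one if concurrent updates both survive.
    pub fn get(&self, key: &str) -> Vec<&str> {
        self.entries
            .get(key)
            .map(|writes| writes.values().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

If A updates `foo` while B, without having seen that update, removes `foo`, then C ends up with A's new value regardless of which packet it merges first, because B's tombstones only cover the write B had observed. A remove issued after seeing A's update would tombstone that write too, and the key would disappear. Production designs typically replace the growing tombstone set with a more compact causal context.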
I sort of have immediate follow-up questions, two of them. The first one would be...
How do you decide on such data structures? Is there a team meeting and then
someone kind of knows algorithms really well, algorithms and data structures and proposes that?
And maybe there might be competing algorithms that maybe you considered.
And have you been there when the decision was made?
Yeah, so I built our CRDT distributed system.
And part of that was out of a conviction that
gRPC is not something you can run
on the edge network, so we need something else. And the
question then becomes: what is the something else? And, you
know, I happen to have a decent amount of background in building distributed
systems, and so I had a decent idea of what the different options were, and
CRDTs felt like they fit this particular set of use cases, or at least the use
cases we could predict we were going to have, quite well. There are other
designs you could come up with here, but they come with a different set of trade-offs.
I think at the time, I went into it with a conviction that this is the right way.
As you build more and more complex and compounding stacks, you start having
to document these decisions as well so that the motivation and the rationale
for making the choice is not lost to time.
And that's where you end up writing architecture decision records like ADRs,
where you write down, here's the problem statement, Here's the decision we made
for what algorithm to use.
Here are the options we considered and why we decided to discard them.
So that for the future, people have insight into the decisions you made.
And also, you know, the process of writing this document becomes the way you
make the decision: you convince the group as part of writing this document
that all the options should be discarded except for the one that you choose.
And I'm assuming there's still ongoing research on CRDTs.
Would you amend any of the prior decisions now if you had the chance?
I don't think so. I think the CRDT design has actually worked out very, very well.
And, you know, when we implemented the CRDTs in the first place,
it was built off of a very recently released paper.
So we started doing that implementation in...
Shortly after I joined, actually, so this is sort of end of 2023,
and we were implementing it based on a paper called DSON, which was released, I want to say 2022.
So it was very much sort of bleeding edge technology already when we started implementing it.
And then as part of implementing that set of data structures and algorithms
and protocols, we also effectively extended the research into things that were
needed for the operational use cases and production use cases we had in mind.
Sounds like a decent paper.
Maybe. I mean, we did actually go
back and forth a lot with the authors of the paper. We've also open sourced
the core of that implementation, so on crates.io there's a crate
called dson that is the core of that CRDT and that compound set of CRDTs,
precisely because we think this might be useful to other people and because
we think we made, you know,
innovations in the space that, you know, were not in the DSON paper.
And some of this was purely because in order to put this into production,
we had to do a lot of both optimization, but also debugging to make sure it's
extremely reliable and fast.
And as part of that we found corner cases that in the paper were either handled in, you know,
suboptimal ways, or in fact, we found some bugs, at least in their prototype
implementation, not necessarily in the algorithms, that we wanted to correct
and therefore also wanted to publish.
And when you did the implementation and you found those bugs,
did the Rust type system surface them easily?
Yes and no. There were some where the Rust type system was helpful.
So the original DSON paper was accompanied by a research prototype written in JavaScript.
And when writing the sort of Rust
encoding of that same algorithm, there were
definitely places where it pointed out that they had relied
on the JavaScript type system
being fairly forgiving, where they were just sort of interchangeably using two
different types that really should not be mixed, because you can very
easily get into bugs that way. And I think we found like one or two bugs in the
research prototype (again, not necessarily in the algorithm, but in the
research prototype) that were because of this.
But I think a decent amount of the nuances of the algorithms,
especially when it comes to performance, for example,
are not things that would be caught by the Rust type system as much as they're
caught by a lot of testing,
like property-based testing and fuzz testing,
and by doing a lot of performance benchmarks and tracking down the root causes.
Those are things that are hard to catch in the type system.
Now, when you look at all of these things that you've implemented so far, and
there are certainly a lot of amazing things in there (I also want to check out
your CRDT implementation now),
what would you say was the biggest learning since you started modeling logic
in Rust? How do you model your types nowadays? What are things that people can learn and apply in their own work?
I think there's a trade-off that you learn over time of where is it worthwhile
introducing more types and where is it not?
Where is it worthwhile making things generic versus where is it not?
I don't think there's a hard and fast rule that you can just always follow.
But over time, you develop a sort of intuition for this doesn't feel like a
good use of a type or this feels like it would be dangerous if I don't add a type.
Like you can start to sort of predict the bugs that people will make if you
don't introduce a new type.
And you also, whenever you introduce a new type, feel a tingling in
your hands about the amount of pain you just introduced to people using it, because
now there's an extra type where previously there didn't need to be one.
And so I think the thing I've learned, and I think this is not a,
it's not a moment in time learning. It's sort of a lesson over time is to better tune that trade-off.
And I think one of the observations I've come to is that
representing things in the type system is extremely valuable,
and people don't do it enough.
But you have to balance it against the pain of using the library that you've
developed that has all of these types.
And you need to really keep in mind: what am I actually gaining
in terms of the safety I add to the system when I introduce all
these types, and what is the cost to the people using it?
And the way you do that is by making sure that when you
write a library, like Sguaba for the rigid body transformations, for example,
you also write code that uses that library while you are writing this type-safe
library, because it will immediately show you just how painful the consuming
code ends up being, and that will help guide your way into,
okay, maybe I don't need a dedicated type for this.
So an example here maybe is in Sguaba, we have a type for WGS84,
which is basically GPS coordinates.
So latitude, longitude, and altitude. Now it turns out that WGS84 is actually a moving standard.
It's a moving standard because the parameters of the
Earth change over the course of
time. You have, you know, plate drift
and drift of the magnetic north pole, and they also update
the reference ellipsoid sometimes when they get better estimates of the
roundness of the Earth. And so as a result there's not actually one WGS84; there are
multiple over time. And if you capture a coordinate right now and then, within 10 years,
you try to take the same coordinate and plot it on a map, then it wouldn't be
in the same place as where the original one was.
So if you read out the GPS coordinates of the Eiffel Tower, the actual GPS coordinates
of the Eiffel Tower will change over time.
But they'll change very little, usually.
But it does mean that technically
you kind of want the type system to represent the
point in time at which this measurement was
taken. The more extreme case here is
this thing called local tangent planes, which is basically: imagine
a plane is flying and it has some GPS coordinate,
and then you want to know the relative
location to the plane, so something like
either front-right-down coordinates or north-east-down
coordinates. So it's like, this thing is one
kilometer north, three kilometers east, and 500
meters up from me. That is a local tangent plane to the current location
of the plane. So I record that coordinate, but now the plane moves. Then you kind
of want to represent the fact that this was a coordinate relative to that plane's
position at this point in time.
The reality is if we actually tried to encode that information in the type system
of Sguaba, it would be impossible to use because every type would be distinct.
You just like, there would be no easy way to move between different coordinate systems.
You would end up with WGS84 having like three different generic
type parameters that involve time. And then how does that translate
into other coordinate systems that involve time?
And it would be more accurate, but it also would be much more painful to
use, and it's not entirely clear that it eliminates very many classes of bugs.
Quite to the contrary, it might introduce more, because now if you get the wrong
time bases, then nothing compiles, and then you take shortcuts to try to get
out of the mess of errors you end up with.
And so for Sguaba, I made the explicit choice to say, this library will not
represent time in the type system.
And that does reduce the safety you get from the type system,
but it also makes the library much more pleasant to use, which increases the
number of people who will use it,
and therefore overall increases safety compared to if I had put it in.
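The trade-off described here, encoding the coordinate frame in the type while leaving the time of measurement out, can be sketched with marker types in Rust. This is not Sguaba's API: `Ned`, `Frd`, `Coordinate`, and `ned_to_frd` are hypothetical names, and the conversion assumes level flight so that only the heading matters.

```rust
use std::marker::PhantomData;

/// Marker types naming coordinate frames (hypothetical names for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Ned; // north-east-down, relative to the aircraft position
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Frd; // front-right-down, relative to the aircraft body

/// A position in metres, tagged at the type level with its frame.
/// Deliberately has no type parameter for the time of measurement.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Coordinate<Frame> {
    pub x: f64,
    pub y: f64,
    pub z: f64,
    frame: PhantomData<Frame>,
}

impl<Frame> Coordinate<Frame> {
    pub fn new(x: f64, y: f64, z: f64) -> Self {
        Self { x, y, z, frame: PhantomData }
    }

    /// Distance is only defined between coordinates in the same frame.
    pub fn distance_to(&self, other: &Coordinate<Frame>) -> f64 {
        let (dx, dy, dz) = (self.x - other.x, self.y - other.y, self.z - other.z);
        (dx * dx + dy * dy + dz * dz).sqrt()
    }
}

/// Explicit conversion from NED to FRD for an aircraft in level flight with
/// the given heading (radians, clockwise from north). Pitch and roll are ignored.
pub fn ned_to_frd(ned: Coordinate<Ned>, heading: f64) -> Coordinate<Frd> {
    let (sin, cos) = heading.sin_cos();
    Coordinate::new(
        ned.x * cos + ned.y * sin,  // front
        -ned.x * sin + ned.y * cos, // right
        ned.z,                      // down is unchanged in level flight
    )
}

// Mixing frames does not compile, because `Coordinate<Ned>` and
// `Coordinate<Frd>` are different types:
// Coordinate::<Ned>::new(1.0, 0.0, 0.0).distance_to(&Coordinate::<Frd>::new(1.0, 0.0, 0.0));
```

Adding a second type parameter for "the aircraft position at time T" would make every recorded coordinate a distinct type, which is the usability cost being weighed against the extra precision.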
Is that an example of the conflict between ergonomics and correctness?
I think that's right.
And, you know, it's like, I don't actually think it's sacrificing correctness,
it's sacrificing precision.
And precision can be useful for correctness, but you can also have correctness
without that precision, right? So you can write correct programs that don't
have time represented in the type system.
It just means that there are some classes of bugs that you don't get to eliminate
through the type system, but that doesn't mean you end up not having correctness.
Yeah. So from your lens it's still correct because it encodes everything that you
want the type system to encode, but you just explicitly leave out a thing that
you don't want to be so precise on.
Yeah. And there I think, you know, I'm going to leave it to the users of this
library to get that part correct.
Do you make a difference between library code and binary code?
Do you write different code if you had to write application-level Rust versus library-level Rust?
Yeah, I think I do in the sense that when I write library-level Rust,
I think a lot more about the programmatic API that I present.
So that includes not just documentation, right?
Obviously, you need to write good documentation for a library to be useful,
but also the structure of that API.
What are the backwards compatibility hazards? Where do I think I might put myself
into a trap when it comes to breaking changes down the line?
So I want to be conservative about what things I expose in the public API because
those are things I can't change later unless I do a breaking release.
The way that you propagate errors might be different because you might want
library consumers to have a better ability to deconstruct the error and figure out the origin.
Whereas in a binary, usually what you want is to present a chain of errors to
the user that results in something actionable on their end to fix the problem.
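The difference in error handling between a library and a binary can be sketched in Rust using only the standard library. The `ConfigError` enum and the `parse_port` and `render_chain` functions are hypothetical examples, not code from any of the projects discussed.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Library-style error: an enum callers can match on, with the cause preserved.
#[derive(Debug)]
pub enum ConfigError {
    MissingKey(String),
    InvalidPort { raw: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingKey(key) => write!(f, "missing key `{key}`"),
            Self::InvalidPort { raw, .. } => write!(f, "invalid port `{raw}`"),
        }
    }
}

impl Error for ConfigError {
    /// Exposes the underlying cause so consumers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::MissingKey(_) => None,
            Self::InvalidPort { source, .. } => Some(source),
        }
    }
}

/// Library function: parses a line of the form "port=8080".
pub fn parse_port(line: &str) -> Result<u16, ConfigError> {
    let raw = line
        .strip_prefix("port=")
        .ok_or_else(|| ConfigError::MissingKey("port".into()))?;
    raw.parse()
        .map_err(|source| ConfigError::InvalidPort { raw: raw.into(), source })
}

/// Binary-style reporting: flattens the whole chain into one message for the user.
pub fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(inner) = cause {
        out.push_str(": ");
        out.push_str(&inner.to_string());
        cause = inner.source();
    }
    out
}
```

A library consumer can match on `ConfigError::InvalidPort` and react programmatically, while a binary would typically call something like `render_chain` once at the top level and print the result. Adding a variant to a public enum like this is itself a breaking change unless the enum is marked `#[non_exhaustive]`, which is one of the compatibility hazards worth deciding on up front.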
And so I do think the design ends up somewhat different.
I don't think it changes the internal writing of the code very much,
but it changes how much focus you put on the external API.
But for a binary, of course, you have to think about what the command line
interface looks like for that binary, and so that also requires thought, but
it's a different kind of design process.
Earlier you mentioned that you also want to write the application-level code that goes
along with the library code, or actually vice versa. Does that mean you start
with a main.rs, model out your types, and then gradually move them into a library crate?
It can. I do do that sometimes as well, but more commonly it means I already have
at least two codebases that
have a need for this library. And so I'm going to create the library and see
how it affects those two codebases.
Because then I have a real set of use cases that I can use to test out how the library feels.
And I mean, this was the case for Sguaba, for instance: we had codebases
internally that had to do this kind of spatial math.
And they already had code for it, like they were working codebases, but
that code was hard to
review, brittle, and had a bunch of magic constants in
there. And so it felt like,
you know, we've had to solve this problem at
least twice, so we should turn it into a reusable library that is, you know, well
designed, well tested, and can be recommended going forward. And so
that is what informs the design of the library: the evident
need from the things you've already built.
When you built Sguaba, what was your testing strategy? Was it based on unit tests,
integration tests, property-based testing, or fuzzing?
It's all of the above. So there's a bunch of unit tests in there, and
there's also a lot of equivalence tests against other crates that implement some
subset of the functionality.
So for example, there's a crate called nav-types that implements,
for example, conversion between WGS84 and ECEF, which is another Earth-based coordinate system.
And so there's a property-based test inside of Sguaba that basically generates
random points on Earth and then converts them back and forth using Sguaba,
converts them back and forth using nav-types, and then checks that the results are near each other.
And then there's also just general property-based testing that does things like,
you know, if you pick a random coordinate on Earth,
run it back and forth through WGS84 and ECEF, which is a lossy
conversion, run it through that 10 times and see how much degradation you get.
And you get, you know, probabilistic guarantees about
how much deterioration you will see over time.
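The round-trip property described above can be sketched without any dependencies. This is not Sguaba's or nav-types' code: the conversion uses the textbook WGS84 formulas with a simple fixed-point iteration for the inverse, the random points come from a small hand-rolled xorshift generator instead of a property-testing crate, and latitudes are restricted to ±80° because this particular inverse is ill-conditioned at the poles. All function names are assumptions.

```rust
const A: f64 = 6_378_137.0; // WGS84 semi-major axis in metres
const F: f64 = 1.0 / 298.257_223_563; // WGS84 flattening
const E2: f64 = F * (2.0 - F); // first eccentricity squared

/// Radius of curvature in the prime vertical at the given latitude (radians).
fn prime_vertical_radius(lat: f64) -> f64 {
    A / (1.0 - E2 * lat.sin().powi(2)).sqrt()
}

/// Geodetic (latitude and longitude in radians, altitude in metres) to ECEF metres.
pub fn geodetic_to_ecef(lat: f64, lon: f64, alt: f64) -> [f64; 3] {
    let n = prime_vertical_radius(lat);
    [
        (n + alt) * lat.cos() * lon.cos(),
        (n + alt) * lat.cos() * lon.sin(),
        (n * (1.0 - E2) + alt) * lat.sin(),
    ]
}

/// ECEF metres back to geodetic, using a fixed-point iteration on latitude.
pub fn ecef_to_geodetic([x, y, z]: [f64; 3]) -> (f64, f64, f64) {
    let p = x.hypot(y);
    let mut lat = z.atan2(p * (1.0 - E2));
    for _ in 0..10 {
        lat = z.atan2(p - E2 * prime_vertical_radius(lat) * lat.cos());
    }
    (lat, y.atan2(x), p / lat.cos() - prime_vertical_radius(lat))
}

/// Tiny xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    /// Returns a pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Converts random points back and forth `round_trips` times and returns the
/// largest distance in metres between a starting and a final ECEF position.
pub fn max_round_trip_drift(samples: usize, round_trips: usize, seed: u64) -> f64 {
    let mut rng = XorShift(seed.max(1));
    let mut worst: f64 = 0.0;
    for _ in 0..samples {
        let lat = (rng.next_f64() * 160.0 - 80.0).to_radians();
        let lon = (rng.next_f64() * 360.0 - 180.0).to_radians();
        let alt = rng.next_f64() * 10_000.0;
        let start = geodetic_to_ecef(lat, lon, alt);
        let mut current = start;
        for _ in 0..round_trips {
            let (lat, lon, alt) = ecef_to_geodetic(current);
            current = geodetic_to_ecef(lat, lon, alt);
        }
        let drift = (0..3).map(|i| (current[i] - start[i]).powi(2)).sum::<f64>().sqrt();
        worst = worst.max(drift);
    }
    worst
}
```

A test would then assert that the returned drift stays below a chosen tolerance, for example one millimetre after ten round trips. An equivalence test follows the same shape, except that each random point goes through two independent implementations and the two results are compared against each other.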
And then we also have a bunch of tests around the enforcement of the type system, right?
So both tests to make sure that you can express the correct computations,
but also compile fail tests that say you cannot try to use a coordinate from
one coordinate system as a coordinate in a different one without an explicit conversion.
You put so much work into Sguaba and similar libraries, and then you decide
to open source that work. That's a big gift. Why do you do that? Why open source those libraries?
I think it's a combination of factors. One
of them is the traditional, you know, by having more people being able
to look at a thing, you're more confident that it's correct. And I think that
applies to open sourcing software in this context too. And I think there's
a related point to that, which is,
you know, we operate in the defense sector.
And so the systems that we build, we want to have as much confidence as we can that they are correct.
But we also want to, to the extent that we can, give people the ability
to look at how we build software and judge whether they think we are building software in a responsible way.
And obviously we can't open source like the actual products we're developing,
but at least one stepping stone is to open source some of the techniques,
some of the tools that we use in order to produce this software to the level
of sort of reliability that we want to give.
And so open sourcing these kinds of libraries, I think,
gives hopefully both some feeling of transparency on that part,
but also inspires some amount of confidence that we are building this software,
at least at a technical level, with care. And so I think it matters to demonstrate
that. And then I think there's a
sort of wanting-to-give-back kind of feeling, right?
Of, we get a lot from the Rust community.
And I mean, we are sponsors of the Rust Foundation partially for this reason to give back.
But the other way to give back is to make sure that when we build things that
we think are useful to other people than us, that we make them useful to other people than us.
And then I think that there's obviously a cynical angle too,
right, which is you put things out there so that other people get to look at
interesting things you've built and then go, I also want to work on those things, right?
You get to actually show some of your code, show some of your development
styles, show some of the problems you're working on and hopefully get other
people interested as a result?
Yeah. In preparation for this interview, I read through some of the blog posts
on the Helsing Tech blog.
And I have to say, it's astonishing and certainly enticing to know what sort
of problems you're working on.
And I think it attracts a certain group of people who are interested in solving
hard problems and working with Rust because they know that Rust is the right choice.
I think that's true. And if you
think about the flip side, right, imagine we didn't open source anything, we
didn't write any technical blog posts. Then I think the
first question would be, well, why not? What do you have to hide? Right? But
the other observation is, how do you hire,
especially, you know, talented engineers who are curious
about technical depth, if the only
thing they see is sort of the product side of
things externally? They don't have direct access
to the engineers we have internally, so you would have to apply, go
through interviews, and then get to talk to the engineers, which is a lot to ask
of someone who's still deciding whether they might want to join the company.
And so by opening the doors a little bit and showing some of the things we work
on and the way that we actually do engineering,
you give people more insight and therefore hopefully more to go on when deciding
whether this is a place where they would want to work.
Do you get any contributions, pull requests, people creating issues?
We do. It varies between the different projects, right?
So the different things we've open sourced are varying degrees of useful to
others. The dson crate, for example,
I'm not expecting lots of people to make use of, because,
you know, you need to have a very particular use case for it to be useful to you.
And then other things like the Avro tooling, for example, I actually expect
could become quite popular because a lot of people use Avro, and this thing
is faster than the upstream version and gives you better error messages.
So we might get a bunch of people using it, but it's fairly new.
Sguaba, I expect, would probably be quite popular.
And that's also the one we've seen the most interest in, actually,
of people wanting to find issues, reuse it, contribute to it,
and we take those contributions seriously.
Buffrs has been a bit of a mix where people have had interest,
but I think of the companies where this becomes the most relevant,
many have monorepos and therefore don't need this particular tech.
You need to both have a need to do versioning and packaging of extended dependency
chains of protobuf files, and also not have a monorepo.
And that combination, I think, is somewhat rare, although not unique.
But in general, we do see interest on the projects we put out there.
And I believe one other angle is that a lot of people might just be interested
in knowing how a Rust expert structures a library.
And there's very little material out there outside of maybe a handful of popular
crates and maybe a bunch of blog posts on how to write advanced Rust code.
And a lot of people want to learn by osmosis, by reading what other people have
written that they deem to be Rust experts.
Yes, I think that's also true.
After all those years, do you still see yourself as an educator and do you do
Rust education outside of Helsing or also within Helsing?
Yeah, I very much see my job as an educator. And it's something that I have,
you know, I think a pretty deep passion for.
I really enjoy, you know, that moment where you can experience someone else
understanding something. That makes me very happy.
And so, you know, I continue to do education outside of Helsing.
And it's the same thing I did at Amazon as well, where, you know,
a lot of my live streams and stuff, that all continued while I worked there.
And it's the same thing at Helsing.
I do some amount of education internally at Helsing as well.
Although internally, it has more of a sort of reactive nature, right?
Where people will poke me and be like, hey, Jon, why doesn't this work?
Or how should we do this? We do some amount of office hours
internally, but more often, you know, we have like a Rust help channel
where people ask, and then occasionally I'll get, you know,
poked explicitly and be like, I think Jon wrote a blog post about this,
or I think Jon implemented something along those lines.
But I actually think the most amount of education I do that has value internally
is actually the external education that I do.
So I know that a lot of the engineers we have at Helsing have learned or partially
learned Rust through my public educational resources, right?
And that is also how some of them continue to learn new concepts in Rust,
is to observe the same teaching resources that I put out publicly.
And this is also why I think Helsing is quite supportive of me continuing to
do my public education, because not only is it sort of a, cynically speaking,
like a sales thing, right?
Like it's valuable to the company to have someone who's seen as a Rust expert,
both publicly be operating as a Rust expert and be employed by them.
But I think more meaningfully, it also means that more people are able to learn
Rust through the abilities or through the teaching that I do,
which means that there's a bigger hiring pool for Helsing to draw from.
But also, the people that we hire, hopefully, have then also learned more things
about Rust because I've produced those intermediate teaching resources.
And the people at the company can continue to improve their skills by me continuing
to do teaching. So it ends up being a virtuous cycle in a way where it's good
for the company, which is good for me, which is good for the company, which is good for me.
And I do think there's also some amount of recognition internally at the company that...
You know, if I started just building internal teaching resources,
it would be seen as a bit of a shame, right?
It would be like, why are we not making this material public when it could be?
There's nothing secret about it. It's just how do you build good Rust code?
How do you engineer high quality Rust products?
Then that feels like something we should be sharing back to the community because
those aren't, they're not industry secrets, right? They are just things that
are beneficial to everyone using Rust. And I think it's also the case that
if we make the Rust community better, we benefit as a result,
not just through hiring, but also because the quality of the crates that exist
in the open source ecosystem will be better.
The tooling will be better. Everything gets better if the community improves.
And so there's just a lot of positive externalities here and positive feedback
loops that mean that me continuing to do the public part of education is valuable.
And with 200 plus hours of video material out there of you teaching Rust, I sort
of think it's inevitable that people share videos of you internally without you knowing.
Oh, that definitely happens. I mean, this happened at Amazon too, where the moment
I got on the inside, I kept finding places where people had referred to videos
I'd made or blog posts I'd written or crates I'd published, being like,
"Jon, go look at this thing," or "you should read Jon's thing about this."
And that is always fun. It's the same thing when we do hiring: a decent
number of the people that go through the interview process say that either they
learned Rust through me or they heard about the company through me, and that
is a weird feeling for sure.
Do you see people taking it to the extreme sometimes, where maybe they learn
about an advanced concept and they want to apply it at work, and during code review you find, well,
it's expressive, it's certainly concise, but maybe not maintainable by a larger
team? And where do you draw the line?
Yeah, I actually think this is not just a like intermediate Rust programmer thing.
I think this is pretty common across Rust, even from the early days,
is that people see all of these tools and techniques that are possible in Rust,
and then they immediately want to make use of them.
And I think the newtype pattern is a big one, right? Like I can define custom
types for everything, and then everything is type safe.
And people start using that pattern to the extreme. And to the discussion we
had earlier around the trade-off space here, they just navigate the trade-off
space by always picking the most type-safe thing.
And we see the same when you look at the hesitance to use locks,
the hesitance to use Rc and Arc.
So people try to use lock-free algorithms and use references with lifetimes
everywhere. And you end up with multiple lifetime annotations.
And no one wants to clone anything, and everything has to be monomorphized so
there's no dynamic dispatch.
And people really lean into every possible feature that Rust gives you.
And it makes it really painful to program in the language. It makes it painful to review the code.
It means that you end up constructing suboptimal software architecture because
you can't express the architecture you want with the borrow checker,
with the type checker, or even just with your current knowledge of the language.
And so I do tend to see, especially in people who haven't built production code in Rust very much,
that they tend to lean too heavily into some of these patterns. And then you kind
of have to pull them back and be like,
it's okay to clone here. It's okay to put this thing behind a mutex.
It's okay to not have a newtype for this particular string representation of an email, right?
And over time, people learn that distinction.
But that is part of the education you kind of have to do on the job is see the
code that people write and then course correct as you do for where they're maybe
overzealous about the use of some of Rust's features.
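As a rough illustration of the trade-off described above, here is a minimal sketch that puts the three options side by side: a newtype where the invariant earns its keep, a plain `&str` where it does not, and shared state behind `Arc<Mutex<..>>` instead of a lock-free design. The names `EmailAddress`, `greeting`, and `count_in_threads`, and the deliberately naive email rule, are assumptions made up for this example, not anything from Helsing's code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// A newtype around `String`. Worth having when many call sites rely on
/// the value having been validated. (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EmailAddress(String);

impl EmailAddress {
    /// Deliberately naive rule (an assumption for this sketch):
    /// exactly one '@' with non-empty text on both sides.
    pub fn parse(raw: &str) -> Option<Self> {
        let (local, domain) = raw.split_once('@')?;
        if local.is_empty() || domain.is_empty() || domain.contains('@') {
            return None;
        }
        Some(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// The pragmatic alternative: when nothing depends on the invariant,
/// a plain `&str` in and an owned `String` out is perfectly fine.
pub fn greeting(email: &str) -> String {
    format!("Hello, {email}!")
}

/// Shared mutable state behind `Arc<Mutex<..>>`. Simple to review,
/// and usually fast enough until a profiler says otherwise.
pub fn count_in_threads(workers: usize, per_worker: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc only bumps a reference count.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

Which of these is right depends on the call site, which is the point being made: the newtype is not wrong, it is just not free.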
Yeah, it's certainly a bit of a rite of passage.
Yes, I think so.
Do you think the key to idiomatic Rust is keeping it simple and then maybe making
it right where it matters?
So finding that balance between on one side simplicity and maybe ease of maintenance
versus correctness for things that really are important?
Or what's your working definition of idiomatic Rust?
I think it's very hard to give a sort of general definition.
I tend to start from, I don't think you should start with the simplest possible
thing, but I also don't think you should start with the most complicated thing.
And this is where, and I don't like using this, but I think experience matters here, right?
Where like over time, you just get a feel for where the balance should lie.
You start writing the code and you go, it's okay to clone here,
and it's not okay to clone here.
And it's hard for me to distill what the principles are for making those choices.
I would say in general, it is very useful to have a running system.
Once you have a running system, you can then, like, refactoring with Rust tends
to be a lot easier than in other languages because you have the type system
and the solid compiler and type checker and borrow checker to rely on.
And so I would tend to err on the side of where the type safe thing is easy,
then do the type safe thing.
And where the type safe thing gets in the way of you building the actual application
to the end, build the application to the end first, and then mark it with a TODO to come back to.
And some of those TODOs will be very painful to fix later.
But the reality is, if you don't finish the whole thing, you're never going
to come back to the TODOs because it didn't work in the first place.
So it's useful to use that as a forcing function for making
a suboptimal choice: well, I need to at least get to a thing
that runs, otherwise this whole thing is irrelevant.
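A minimal sketch of what "finish first, mark it with a TODO" can look like in practice. The `Config` and `Worker` types are hypothetical and exist only for this example; the idea is that the clone is the easy choice that gets the system running, and the comment records what to revisit.

```rust
/// Hypothetical configuration type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub name: String,
    pub retries: u32,
}

/// Hypothetical component that needs access to the configuration.
pub struct Worker {
    config: Config,
}

impl Worker {
    pub fn new(config: &Config) -> Self {
        // TODO: share the config (for example via Arc) or borrow it with a
        // lifetime if profiling ever shows this clone matters. Cloning keeps
        // `Worker` free of lifetime parameters while the design settles.
        Self {
            config: config.clone(),
        }
    }

    pub fn describe(&self) -> String {
        format!("{} ({} retries)", self.config.name, self.config.retries)
    }
}
```

Because the compiler checks every use of `Worker` and `Config`, swapping the clone for a shared or borrowed value later is mostly a matter of following the type errors.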
Rust is a huge language, and while I'm pretty sure that you know more than most
people about Rust, what is one thing that you would personally want to spend more time on?
If you had three months to focus on one subject that was Rust-related, what would it be?
I think there are two categories for me.
One of them is around WebAssembly. So I've done very little WebAssembly in Rust,
and I think it's both a cool technology and something that I think has
a bunch of use cases that we haven't fully explored.
And I would love to fiddle around with it to see what I can make of it.
But also, I think it's a very useful skill or set of knowledge to have in your toolbox.
And same thing for, you know, I'm thinking of writing another sort of version
or not version, but a second iteration of Rust for Rustaceans.
And, you know, Rust for Rustaceans doesn't have a chapter on WebAssembly,
in part because I hadn't done very much WebAssembly at the time when I wrote
the book and I didn't feel like I could be an authority on that subject.
And I still don't think I could be. And so that is something I would want to do more of.
I think the other category would be sort of deep embedded development.
I've done some embedded development in Rust and I've certainly written,
you know, crates for no_std and everything.
But to really write something low level on a microcontroller, where
you need to initialize the CPUs in the right way, and you need to handle the
interrupts, and you need to write some inline assembly.
Code like that is really fun to write, and I haven't written lots of it in Rust,
but I would like to, because I want to see, you know, what does it feel like
when I push the language in that direction a little bit?
And like, where are the sharp edges, and what can I do to make those sharp edges
be more ergonomic, right?
Like, this could be building tooling, building libraries to make that experience better.
I wanted to briefly touch on supply chain security because I believe it's kind
of important for Helsing.
You write a lot of code, you maintain a lot of code yourself,
but you still need to depend on a lot of crates that are out there that we sort of take for granted.
And there's been some recent challenges around some packages in Rust.
I don't want to mention any names.
And Cargo itself also had some sort of exploit because of the tar crate just a couple days ago.
I wonder what's Helsing's stance on that?
And what's the state of the Rust ecosystem in regard to supply chain security?
I think Rust is not in a worse place than other ecosystems here.
I think it's a bit similar to other ecosystems where when you take a lot of
third-party dependencies, there's some amount of inherent risk there.
And I don't think Rust's tooling is worse or Rust's risk is higher.
And I think the question, as with any language like this or any project that
has to take third-party dependencies, the question becomes, what do you do about those risks?
And I think, you know, the reality is you have to build in defense in depth
against these things, right?
There's not going to be one silver bullet that just solves all your supply chain security problems.
Instead, you have to have sort of a collection of processes and tools that make
sort of as many parts of it as secure as you can.
And then you layer them on top of each other to get coverage across your whole pipeline.
So this includes everything from, you know, being judicious about your selection
of dependencies in the first place.
Like, don't take a dependency on some tiny, barely maintained project if you
can easily just replicate the functionality yourself.
Like, the reason why it's worthwhile to take dependencies is if the maintenance cost
of the code in that dependency is large. But if
the maintenance cost is actually pretty small, it might not be worth taking the
dependency and introducing that risk. The same thing goes if you have the choice
between multiple dependencies: look at the different dependencies not just
in terms of their quality, like the API, the
documentation, the current code maturity,
but also look at the maintenance of that package.
Who maintains it? How many people? Do they have CI?
What kind of testing strategy do they have? Do they have a security disclosure policy?
There's a bunch of things you can look for here that indicate something about
the sustainability of that package and of taking a dependency on it.
And then there's also the sort of ongoing monitoring part, right?
So obviously you want to monitor all of the security vulnerability databases
to make sure that if you run into a problem, or rather if a problem is discovered
with some version of some dependency,
you A, are notified, and B, internally have the infrastructure to find
all the places where the impacted dependency is used.
And so this includes being able to track provenance information for all the builds that you do,
provenance for all your deployments, and this is where you get into things like
generating SBOMs, like software bill of materials that list all of the dependencies
that went into a given artifact,
tracking which software releases are released to what customers,
at what time, in what products, in what physical devices, and keeping track
of that whole graph structure.
And being able to do analysis over that graph over time as you learn about new vulnerabilities.
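To make the graph idea above concrete, here is a small sketch, assuming a toy in-memory model rather than a real SBOM format such as CycloneDX or SPDX. It records which pinned packages went into which artifact and which artifacts went to which deployment, then answers "what is affected by this vulnerable package version?". The `Package` and `Inventory` types and all names in them are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// A dependency pinned to an exact version, like one SBOM entry.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Package {
    pub name: String,
    pub version: String,
}

impl Package {
    pub fn new(name: &str, version: &str) -> Self {
        Self {
            name: name.to_owned(),
            version: version.to_owned(),
        }
    }
}

/// Hypothetical inventory: artifact -> packages, deployment -> artifacts.
#[derive(Default)]
pub struct Inventory {
    sbom: HashMap<String, Vec<Package>>,
    deployments: HashMap<String, Vec<String>>,
}

impl Inventory {
    /// Record the full dependency list that went into a built artifact.
    pub fn record_build(&mut self, artifact: &str, packages: Vec<Package>) {
        self.sbom.insert(artifact.to_owned(), packages);
    }

    /// Record that an artifact was shipped to a deployment target.
    pub fn record_deployment(&mut self, target: &str, artifact: &str) {
        self.deployments
            .entry(target.to_owned())
            .or_default()
            .push(artifact.to_owned());
    }

    /// All artifacts whose dependency list contains the vulnerable package.
    pub fn affected_artifacts(&self, vulnerable: &Package) -> BTreeSet<String> {
        self.sbom
            .iter()
            .filter(|(_, packages)| packages.contains(vulnerable))
            .map(|(artifact, _)| artifact.clone())
            .collect()
    }

    /// All deployment targets running at least one affected artifact.
    pub fn affected_deployments(&self, vulnerable: &Package) -> BTreeSet<String> {
        let artifacts = self.affected_artifacts(vulnerable);
        self.deployments
            .iter()
            .filter(|(_, deployed)| deployed.iter().any(|a| artifacts.contains(a)))
            .map(|(target, _)| target.clone())
            .collect()
    }
}
```

A real system would match version ranges from an advisory database rather than exact versions and would persist this graph, but the query shape (advisory to artifacts to deployments) stays the same.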
And then, of course, there's also work here on security scanning.
So this would mean both doing scanning of our code for insecure patterns and
the like, but also running proactive scanning on dependencies that we take:
the first time they're brought into the company, anytime there's a new version,
and so on, we actually scan them to ask,
is this a dependency that we want to take? And some of that could be human review.
Some of that can be AI-assisted review. And there's a combination of these that also could work.
We wrote a blog post recently about using AI for assisted vetting of software packages.
I'll send you the link and you can put it in the show notes.
And so that contains some more thoughts about how you can not necessarily replace
the human review here, but at least make it more efficient for humans to review
those dependencies that you take.
And so there's like a whole host of techniques where you kind of need to do
all of them because they end up giving you, so each one gives you sort of partial
coverage of the stack that you have.
And only when you combine all of them do you get the defenses that you need.
But even then, you know, taking a dependency ultimately is a risk.
And so you have to take the calculated risk of is the upside of taking this
dependency worth the potential risk that you're introducing.
But I do think that there's a genuine security case for taking dependencies
because the alternative, if you build everything in-house, is that you will not have the people,
especially sort of subject matter experts, to maintain those internal implementations over time.
So if we internally implemented, I don't know,
I mean, crypto is the obvious example, with the old adage that you should not roll your own crypto.
If we personally implemented all of our own crypto libraries,
I'd be deeply uncomfortable with that because we don't have enough, you know,
cryptography specialists and analysts and engineers to A, build it in the first
place and then B, maintain it over time.
So I would much rather that be a publicly vetted, widely used library,
continuously handled by a large number of security experts, that we then take
a dependency on. That is a much better and more secure decision for your dependency
chain than in-housing everything.
And the question really becomes that risk-reward trade-off of at what point
does it become better to just in-house that dependency so that you don't take
an external dependency on it because the upkeep is not that bad or the upkeep
is not likely to have security implications compared to the third-party dependency.
There's a common trope that people use, which is that Rust's package ecosystem is similar to npm's.
We have a lot of smaller packages, and that exposes us to bigger risk.
What's your take on that?
Yes and no. It is true that Rust tends to have more dependencies than Java or C++, for example.
It tends to have more but smaller dependencies.
I think the jury's still out on whether that's a good thing or a bad thing,
because the downside with taking a large dependency is that A,
the large dependency means the maintainer of that project is maintaining way more code.
And chances are they're not an expert in the entirety of that code.
And so the likelihood that any part of it is under maintained or under vetted
or underdeveloped is much higher.
And then the other is that the update cadence for breaking changes tends to be
worse, because if you have one giant dependency and
now they make a breaking change, that might be a lot harder for you to
adopt because you have to adopt all or nothing of the entire dependency.
Whereas if you have many smaller dependencies, fewer of them make breaking changes
at any given point in time.
So more of them will be fully up to date because they don't need to align on
like a single breaking change schedule.
But ultimately, you know, there are some things where taking a large dependency
is probably worthwhile. And we do see this in the Rust ecosystem too.
If you look at things like Bevy, for example, right? Bevy is effectively one
big dependency. Tauri is another one.
And so Rust doesn't preclude you from doing this.
It's more that I think it often makes sense to have smaller dependencies of
this kind. And it's not obvious to me that
it's a guaranteed security risk compared to the alternative.
You also have the downside with large dependencies that,
because it's so large, you need to have many maintainers.
And so the question is:
would it be better if that same number of maintainers each maintained a smaller
library that was a subset of the overall thing?
I think that might be better. I'm not sure.
But yeah, it's not clear to me that Rust is in more danger because it has
this finer granularity of packages.
I really like your take on Rust's crate ecosystem and also contrasting it with
whatever Node and npm provide.
What about unsafe code, though? Because this is very unique to Rust and to how
we think about safety and code.
If you look at this from a perspective of supply chain security,
aren't we exposing ourselves to a lot of risk by taking on a lot of unsafe code?
And also, how would you vet for that?
Well, it's complicated, because unsafe code is not inherently less safe, even though
the name kind of implies that, right? Because if you look at Java, if you look
at Go, if you look at Node.js, and certainly if you look at C and C++,
those languages have no guardrail for what is safe and what is unsafe.
You know, if you look at Java, you have unsafe operations in Java as well,
where you do direct pointer manipulation and you can do really bad things there
and you can violate memory safety and you can do all these things.
In C++, like all the guardrails are just off. Even if you turn on a lot of the
compiler validation, like you can just do these things.
It's just that in Rust, it's more obvious when you do these things than it is in those languages.
And so I think in Rust, the reality
is that safe and unsafe is more of a communication mechanism to say,
this part of the code, you should look at more carefully because it needs to
sort of uphold things that are not checked by the compiler.
And so it's places where you get fewer of the benefits of safety from Rust,
but they're not places that are inherently less safe than general third-party software.
I do think actually that, you know, you should think a little bit more about
taking a dependency that includes unsafe than one that does not.
But I don't think it should be a thing that, you know, excludes you from taking
that dependency or see it as significantly more risky.
Where I worry more is when you have crates that have no business having unsafe
code, but they do anyway.
Usually this is for, like, performance optimization reasons, or they just want
to work around the borrow checker.
That kind of use I'm more skeptical of, but it's not really a binary of is it unsafe or is it not?
Do you think unsafe, the term, was a misnomer?
Yeah, I think in a way it is, because there are two uses of unsafe in Rust.
One of them is on function definitions, and the other is on blocks inside of code.
On blocks inside of code, it really should be described as safe.
The goal of that annotation is to claim that the code between this curly bracket
and this curly bracket is not subject to the standard compiler checks or it's
allowed to break some of the rules or it's not checked that it doesn't break the rules.
But trust me, I have checked that it is safe.
That's what you're asserting by putting unsafe around a block.
And so it's not to say this code is unsafe. It's actually to say this code is
safe. It's just that it's checked by me, not by the compiler.
And when it's put on a function definition, unsafe is more of an appropriate
term because it's saying,
this function is unsafe to call unless the following is true. But yeah, I
do wish that there was a different name for the unsafe in a block.
I like that you said assert in that context, because yeah, it feels like an assertion.
Maybe it could be called assert safe.
Yeah, that is the challenge, right? Asserts we think of as something
that can fail, and the assertion here can't fail. Like, the program can't fail
to run as a result of that assert. It doesn't actually check anything. It's like
a signature saying, just trust me, I've checked.
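The two uses of the keyword discussed above can be sketched in a few lines. This is only an illustration: `get_unchecked_copy` and `first_or_zero` are hypothetical helper names, while `slice::get_unchecked` is the real standard library method. On the function, `unsafe` means "unsafe to call unless the documented condition holds"; on the block, it means "I have checked this, the compiler has not".

```rust
/// `unsafe` on a function definition: the caller must uphold a contract.
///
/// # Safety
/// `index` must be strictly less than `values.len()`.
pub unsafe fn get_unchecked_copy(values: &[u32], index: usize) -> u32 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// A safe function wrapping an `unsafe` block: the block is the author's
/// claim that the contract holds here. Nothing is checked at runtime by
/// the `unsafe` keyword itself.
pub fn first_or_zero(values: &[u32]) -> u32 {
    if values.is_empty() {
        return 0;
    }
    // SAFETY: the slice is non-empty, so index 0 is in bounds.
    unsafe { get_unchecked_copy(values, 0) }
}
```

The `// SAFETY:` comment is a common community convention for writing down the argument the block relies on, which is what a reviewer vetting a dependency would look for.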
What's next for Helsing and for you?
For Helsing, I think it's continued development of the main product lines that
we currently have, all of which are fairly ambitious and have some really cool tech in them.
And so this includes the CA-1, the project I work on, which is basically an
effort to try to build a self-flying jet fighter in two years.
So it's a very ambitious timeline. We're building the whole thing,
hardware and software, from scratch.
It's no joke, but it is really interesting work.
And then we have the SG-1, which is an underwater, also autonomous vehicle that
aims to look for things like submarine traffic in large underwater areas.
Think like the Baltic Sea, for example, where you need these things that can
stay underwater for very long periods of time, have very minimal compute,
and a very constrained power budget.
They can't have any moving parts in them because then they're too easy to detect.
And then you still need to write software on there that does sonar analysis,
for example, so they need to run ML models. How do you do that when you have extremely
constrained battery power?
And so that's some interesting stuff. And then obviously we do
a lot of work with the HX-2, for use in Ukraine for instance, where that is a lot
of work on being close to the sharp end of defense, but also on systems where safety
criticality is enormously important.
And it might sound self-contradictory to talk about safety critical in a system
that explodes, but the reality is that it is extremely important that it explodes
in the right place for the right reasons at the right time.
And that's where the safety criticality comes in. And the cost of getting that
wrong can be catastrophic. And so working on these systems, I think, is,
you know, I don't like to describe it as fun because that feels incorrect, but it is challenging.
It's interesting.
I think it's meaningful. I think it's important. And I think,
you know, in terms of the future of Helsing, a lot of that becomes continuing
to work on these products and other ones to see how far we can go with,
you know, using software to build really good deterrence capabilities for Europe.
And for you, which streams can we expect in the near future?
I actually tend to not plan my streams very far ahead.
Instead, what I do is I look for things that I want to exist,
either at work or in my personal life, and then I build those and then I turn
on the camera while building them.
Because if I try to do streams where the content is
hand-picked for streaming, it sometimes works if I pick a topic that I know
is particularly deep and gnarly, but it's much more compelling when
I can say what my use case is and what I'm building towards.
It means that I can build a solution that actually solves a problem and that
therefore comes with additional constraints on the implementation
and a less generic goal. I think that's where a lot of really good software
engineering practices come in: when you're not just writing code,
but you're writing code for a purpose.
And so that's why my streams tend to be: when I have a thing to build,
then I do a stream on that topic, which was the case recently,
for example, for the Avro IDL converter. I needed one of those, so I built it.
And I think increasingly we'll see other examples of this, where I need a thing
either for work or for my personal life, and I will build it.
That's how I've done my streams until now and how they will continue.
And so far, your intuition hasn't failed you around that.
That's true. I have some ideas for other forms of streams or other forms of
educational material that I have sort of half-baked in the back of my head.
I think some of them could be really good, but the challenge is always,
when will I find the time?
And so this could be things like, I've had an idea for a sort of Rust intermediate
course that would actually be a structured course rather than just a set of ad hoc streams.
I've had ideas for more of a
casual chat beginner's introduction to Rust that comes in shorter snippets rather
than like 10-hour streams.
I think that could be really cool, but finding the time is always the hard part.
There's a ton more that we could discuss, but we have to get to the end.
And traditionally, our final question is, what's your message to the Rust community?
I think my message to the Rust community is twofold. I think the first half
is Rust is now a language that is actively in use in important systems across the world.
And that's a good thing, right? This is what a programming language aims for,
is to be successful to the point where it's adopted for real use cases that
make a real difference.
But the result of this is that companies care about what happens to the language
and the direction the language goes in. And I think this is sort of
a little complicated for the Rust community, which traditionally has had very
strong opinions on not just how the language is used, but also sort of there's
been an association, I think,
with like a value judgment on what the language is used for.
And increasingly, I think the Rust community will have to come to terms with
the fact that it's being used in production in ways that we don't control.
And I think we need to decide how to interface with that.
A good example of this is like sponsorship of Rust conferences, right?
Or in general, like sponsorship of the Rust Foundation, for example,
like putting money into the Rust ecosystem, where this to me is like a good
thing for the Rust community. I want that to happen.
I want the language to get the funding that it needs to continue
to grow and continue to develop.
But that comes at the cost that the language needs to prove that that influx of money is worthwhile.
Because in general, if you want the funds to continue to come,
there needs to be something that you get back for the money you put in.
And that's not to claim that I have the answer for this, but I think there's
been a sort of allergic reaction in the Rust community to industry involvement in the language.
And I understand why in many ways, but I do think that it is an allergic reaction
we have to figure out how to mitigate, because otherwise the funding for the
language dries up and we end up with a language that doesn't have the funding to keep growing.
And then I think that the second half of my takeaway for the Rust community
is we've built something that's really good.
Like the Rust language is really good. The tooling is really good.
The community is really good.
And I think that the ecosystem as well is really good.
But I also think I've observed a sort of stagnation in the,
I don't know how to describe it, like the creativity of the use of the language
compared to some of the early days.
And that might be because we've solved many of the problems, right?
But it does feel like there are still cool things you could do with the language
that can change how the ecosystem works.
serde was a good example of this from back in the day where it was a different way of doing things.
And then it made us have things like derive procedural macros and the ability
to do serialization in a really efficient and cross-language ecosystem way.
And I want to see more of those kinds of ambitious, let's build something that
is different, is better, is cool, new, innovative in the language, in the ecosystem.
I'm seeing less of a hunger for that than I think I saw in the early days.
And that makes me a little sad.
Well, here's hoping that people like you, who are ambassadors of the language,
will bring some of that passion back. And I certainly hope to see you on the
stream; I will follow along. Jon, thanks so much for taking the time and for being
part of this community.
No, thanks for having me. It was a fun conversation.
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler,
and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.