WEBVTT

00:00:01.570 --> 00:00:06.210
<v Matthias>This is Rust in Production, a podcast about companies who use Rust to shape

00:00:06.210 --> 00:00:07.330
<v Matthias>the future of infrastructure.

00:00:07.690 --> 00:00:12.150
<v Matthias>My name is Matthias Endler from corrode, and today we talk to Vegard Sandengen

00:00:12.150 --> 00:00:15.530
<v Matthias>from KSAT about talking to satellites with Rust.

00:00:19.310 --> 00:00:23.830
<v Matthias>Vegard, can you introduce yourself and KSAT, the company you work for?

00:00:24.650 --> 00:00:29.010
<v Vegard>Thanks for having me. My name is Vegard Sandengen. I have a master's in computer

00:00:29.010 --> 00:00:33.610
<v Vegard>science. I have worked most of my professional career in the space domain,

00:00:33.970 --> 00:00:39.590
<v Vegard>even though it's usually on the ground, and I've been working at KSAT now for the last four years.

00:00:40.310 --> 00:00:46.290
<v Matthias>And recently, you became a father, so there's one more Rustacean in this world. Congratulations.

00:00:47.049 --> 00:00:47.390
<v Vegard>Thank you.

00:00:47.750 --> 00:00:51.810
<v Matthias>So, can you say a few words about KSAT? I know that the slogan is,

00:00:52.130 --> 00:00:55.710
<v Matthias>connect space and Earth, and I really like that, but what is it about?

00:00:56.490 --> 00:01:01.310
<v Vegard>KSAT is the abbreviation of the company name, which is... Kongsberg Satellite Services.

00:01:01.750 --> 00:01:06.610
<v Vegard>So we're getting data from space to Earth, and then we're using that data.

00:01:06.890 --> 00:01:10.030
<v Vegard>So we do ground network operations and Earth observation.

00:01:10.270 --> 00:01:12.310
<v Vegard>I work in the ground network, which

00:01:12.310 --> 00:01:17.970
<v Vegard>is our distributed network of antennas situated all around the world.

00:01:18.110 --> 00:01:24.250
<v Vegard>And we enable satellite owners to talk with their satellites and get their data.

00:01:24.250 --> 00:01:32.170
<v Matthias>A lot of people only know about satellite technology from television or from popular science.

00:01:32.650 --> 00:01:38.210
<v Matthias>And the knowledge they have is probably rooted in the 60s and 70s.

00:01:38.430 --> 00:01:41.310
<v Matthias>But a lot has happened since then.

00:01:41.670 --> 00:01:44.310
<v Matthias>What has happened since the 60s?

00:01:45.110 --> 00:01:50.070
<v Vegard>Yeah, the satellite industry was traditionally operated by satellite companies,

00:01:50.070 --> 00:01:53.150
<v Vegard>and they were using their software to just deliver

00:01:53.150 --> 00:01:56.530
<v Vegard>on whatever their satellite had. And satellites

00:01:56.530 --> 00:01:59.890
<v Vegard>themselves started in the 60s with the Russians launching

00:01:59.890 --> 00:02:02.950
<v Vegard>Sputnik, and it was very expensive. I mean, launching a

00:02:02.950 --> 00:02:06.290
<v Vegard>satellite took a government agency. So all

00:02:06.290 --> 00:02:11.889
<v Vegard>the way until basically SpaceX and a lot of the other newcomers in the satellite

00:02:11.889 --> 00:02:16.410
<v Vegard>business, or in the launch business, came along, it was extremely expensive to

00:02:16.410 --> 00:02:21.550
<v Vegard>launch satellites, so it was mostly just agencies and government entities that

00:02:21.550 --> 00:02:23.970
<v Vegard>could afford to put satellites into orbit.

00:02:24.230 --> 00:02:30.270
<v Vegard>And some of those satellites are geostationary satellites delivering your

00:02:30.270 --> 00:02:34.850
<v Vegard>satellite communications for TV or for your sat phone, if you had that.

00:02:35.430 --> 00:02:41.250
<v Vegard>And from the old days, it was mostly communication-based, but NASA and ESA also

00:02:41.250 --> 00:02:48.830
<v Vegard>launched scientific instruments to monitor the Earth or to monitor the Sun or

00:02:48.830 --> 00:02:53.910
<v Vegard>to send probes into outer space to do some other readings.

00:02:55.230 --> 00:02:59.610
<v Vegard>And the way satellites communicate is almost

00:03:00.380 --> 00:03:06.040
<v Vegard>exclusively through radio frequency communication, on different wavelengths

00:03:06.040 --> 00:03:07.720
<v Vegard>of the radio frequency spectrum.

00:03:08.199 --> 00:03:15.200
<v Vegard>And the type of wavelength you use determines what's the quality of your transmission.

00:03:15.840 --> 00:03:20.740
<v Vegard>And Earth observation satellites are very close to the Earth. They orbit the Earth

00:03:20.740 --> 00:03:27.100
<v Vegard>maybe 14 or 15 times a day, and they can produce a lot of data.

00:03:27.600 --> 00:03:33.580
<v Vegard>And as the instruments have gotten better, the resolution of whatever measurements

00:03:33.580 --> 00:03:35.180
<v Vegard>they're doing is getting higher.

00:03:35.560 --> 00:03:40.480
<v Vegard>The amount of data is getting higher. And the main way to actually

00:03:40.480 --> 00:03:43.180
<v Vegard>get the data down is to have contact with a ground station.

00:03:43.380 --> 00:03:47.460
<v Vegard>And you have limited visibility over a ground station.

00:03:47.600 --> 00:03:53.380
<v Vegard>So you only get like 10 to 15 minutes of visibility at most, and that's peak, and

00:03:53.380 --> 00:03:58.220
<v Vegard>you have to push down gigabytes of data. So the amount of data we're talking

00:03:58.220 --> 00:04:00.160
<v Vegard>about is ever increasing.

00:04:00.160 --> 00:04:04.840
<v Matthias>Yeah, thanks for the overview. But one thing I always wondered, as somewhat

00:04:04.840 --> 00:04:12.000
<v Matthias>of a bystander, is how standardized the communication protocols are. Do

00:04:12.000 --> 00:04:17.480
<v Matthias>we keep using the same protocols as in the 60s, or does every satellite have

00:04:17.480 --> 00:04:19.640
<v Matthias>its own protocol, or is it something in between?

00:04:20.339 --> 00:04:26.320
<v Vegard>It's everything and nothing, unfortunately. So there is a standardization body

00:04:26.320 --> 00:04:33.140
<v Vegard>called CCSDS that a lot of the government agencies contribute to, from the

00:04:33.140 --> 00:04:37.320
<v Vegard>early days, the 80s-ish, if I remember correctly.

00:04:37.620 --> 00:04:45.320
<v Vegard>So for a lot of the hardware-related radio frequency protocols, and how to handle

00:04:45.320 --> 00:04:50.600
<v Vegard>data on the physical link, there are a lot of different standards.

00:04:50.980 --> 00:04:56.760
<v Vegard>And in order to push data over the air, you also need some error correction,

00:04:56.760 --> 00:05:02.380
<v Vegard>and you need to be able to sequence your data. Just like TCP/IP,

00:05:02.820 --> 00:05:07.440
<v Vegard>there's an equivalent standard in the space industry.

00:05:09.330 --> 00:05:12.130
<v Vegard>Coming into the new space era, there are a lot of

00:05:12.130 --> 00:05:15.510
<v Vegard>new contenders on the market that are software

00:05:15.510 --> 00:05:18.450
<v Vegard>companies using spacecraft, and

00:05:18.450 --> 00:05:22.270
<v Vegard>not spacecraft companies using software. They're

00:05:22.270 --> 00:05:25.610
<v Vegard>also not really following some

00:05:25.610 --> 00:05:31.550
<v Vegard>of these standards from the agency era, so you get a lot of compatibility issues

00:05:31.550 --> 00:05:35.910
<v Vegard>where you basically have to custom-fit: okay, how do we talk to this spacecraft?

00:05:35.910 --> 00:05:42.330
<v Vegard>Because this is a new software company that has just looked at the standard and said,

00:05:42.430 --> 00:05:45.190
<v Vegard>ah, we don't really need this. We'll do it our way. And it works for them.

00:05:45.510 --> 00:05:51.890
<v Vegard>But at some level, you have a minimum viable product that you can share on a radio frequency level.

00:05:51.910 --> 00:05:56.170
<v Vegard>And most people are compatible with that. But after that, all bets are off.

00:05:56.970 --> 00:06:02.710
<v Matthias>Sounds like that approach would generate a ton of legacy code in a very short time.

00:06:03.910 --> 00:06:04.390
<v Vegard>Yeah.

00:06:05.070 --> 00:06:08.930
<v Matthias>Now, let's talk about the size of operations at KSAT.

00:06:09.230 --> 00:06:15.089
<v Vegard>KSAT started off 25 years ago as a company, and we started off with one antenna

00:06:15.089 --> 00:06:18.870
<v Vegard>and one customer, and that's about it.

00:06:19.029 --> 00:06:23.610
<v Vegard>And as KSAT grew as a company, this market shift into new space, with all these

00:06:23.610 --> 00:06:28.830
<v Vegard>new software actors, really exploded the number of satellites launched into space.

00:06:28.830 --> 00:06:34.490
<v Vegard>And KSAT followed suit and built up both its antenna park, meaning how many antennas

00:06:34.490 --> 00:06:39.010
<v Vegard>we have, and how many employees and engineers we have to deal with this.

00:06:39.290 --> 00:06:47.090
<v Vegard>And at this point in time, we're roughly, ballpark, between 100 and 300 active antennas.

00:06:47.350 --> 00:06:50.990
<v Vegard>KSAT is one of the biggest providers of commercial

00:06:51.790 --> 00:06:52.710
<v Vegard>ground station services.

00:06:53.150 --> 00:07:00.370
<v Matthias>I think the official website mentions 23 sites worldwide, which sounds crazy to me.

00:07:00.750 --> 00:07:05.130
<v Matthias>What is a site, specifically, and what goes into maintaining one?

00:07:05.270 --> 00:07:09.650
<v Vegard>An antenna site for us is mostly a place where we need a lot of power and,

00:07:09.650 --> 00:07:12.430
<v Vegard>hopefully, fiber optic cable.

00:07:12.710 --> 00:07:17.130
<v Vegard>We don't have that at every site, but what qualifies as a good site is that

00:07:17.130 --> 00:07:22.990
<v Vegard>it's far enough apart from any other site we have, and that it covers a lot

00:07:22.990 --> 00:07:26.430
<v Vegard>of ground we don't actually get from other sites in the vicinity.

00:07:26.830 --> 00:07:31.470
<v Vegard>And the placement of the sites usually depends a bit on what orbit the satellites are in.

00:07:31.750 --> 00:07:35.370
<v Vegard>So there are usually two orbits that are relevant for satellites.

00:07:35.590 --> 00:07:39.110
<v Vegard>There's polar orbit, where they go from pole to pole, and then there's the other

00:07:39.110 --> 00:07:40.710
<v Vegard>one, where they just follow the equator.

00:07:41.330 --> 00:07:45.670
<v Vegard>And if you only have a ground station at the equator and you have a polar-orbiting

00:07:45.670 --> 00:07:48.690
<v Vegard>satellite, you only get visibility twice a day.

00:07:49.150 --> 00:07:53.990
<v Vegard>But if you have a ground station near the poles, you get 10,

00:07:54.210 --> 00:07:58.770
<v Vegard>12, 14 contacts a day. So it really depends.

00:07:59.270 --> 00:08:05.250
<v Vegard>Each contact has a duration of anywhere between 5 and 15 minutes, really.

00:08:05.490 --> 00:08:10.270
<v Vegard>And that can generate anything from a few gigabytes to 100 gigabytes per contact.

00:08:10.790 --> 00:08:15.610
<v Matthias>So data processing can come later, but the data exchange happens during that time frame?

00:08:16.110 --> 00:08:18.890
<v Vegard>The data exchange between the satellite and the ground station, yes.

00:08:19.370 --> 00:08:24.350
<v Vegard>And because of the volume of data increasing so much, our main concern going

00:08:24.350 --> 00:08:28.010
<v Vegard>forward is not really building enough antennas, it's actually just building

00:08:28.010 --> 00:08:31.210
<v Vegard>enough infrastructure to handle all this data.

00:08:31.210 --> 00:08:35.590
<v Vegard>Because there's so much data and you need to push it around and you need to

00:08:35.590 --> 00:08:39.690
<v Vegard>provide it to the customer in a reliable fashion.

00:08:40.090 --> 00:08:48.110
<v Vegard>And the networking can be quite unreliable between a remote site in Canada or

00:08:48.110 --> 00:08:53.710
<v Vegard>in New Zealand and a customer on the West Coast of the US.

00:08:54.410 --> 00:08:57.970
<v Vegard>That's a challenge really going forward.

00:08:57.970 --> 00:09:01.270
<v Matthias>Okay, so to summarize, the setup is a bit like this.

00:09:01.330 --> 00:09:06.730
<v Matthias>You have a ton of satellites circling the Earth on a regular basis.

00:09:07.210 --> 00:09:13.770
<v Matthias>They go around the Earth 10 to 15 times a day or so, roughly like that.

00:09:13.929 --> 00:09:18.170
<v Matthias>And then on the ground, you have antennas on ground stations.

00:09:18.510 --> 00:09:23.150
<v Matthias>And then these antennas, they connect with the satellites, do the data exchange,

00:09:23.150 --> 00:09:28.750
<v Matthias>and then you need to send the data over, say, fiber to a central place.

00:09:30.130 --> 00:09:36.190
<v Vegard>Usually it's delivered straight to the customer, but due to our volume of data we

00:09:36.190 --> 00:09:44.450
<v Vegard>also have to temporarily store it on the site itself, without losing data in the process. So, yeah.

00:09:45.250 --> 00:09:48.670
<v Matthias>Two things come to mind. First, it needs

00:09:48.670 --> 00:09:55.210
<v Matthias>to be extremely reliable, because if you lose the data, that is a big outage and

00:09:55.210 --> 00:10:00.549
<v Matthias>probably a loss to the customer as well. And the second part is: how often can

00:10:00.549 --> 00:10:06.090
<v Matthias>you make changes to that code? How often can you modify code that also needs to be reliable?

00:10:07.030 --> 00:10:11.450
<v Matthias>I'm guessing you probably even have limitations as to how often you can access

00:10:11.450 --> 00:10:13.429
<v Matthias>these ground stations and make changes.

00:10:13.429 --> 00:10:16.410
<v Vegard>Yes, that is correct. That

00:10:16.410 --> 00:10:19.070
<v Vegard>has to be really reliable. But you're actually right on

00:10:19.070 --> 00:10:22.210
<v Vegard>point about how we update the code, because it's not

00:10:22.210 --> 00:10:28.070
<v Vegard>that we're using the antenna 100% of the time, but the ecosystem around the antenna,

00:10:28.070 --> 00:10:33.510
<v Vegard>with our software running on different hardware close to the antenna, is not

00:10:33.510 --> 00:10:37.650
<v Vegard>easy to access. I don't have access to it, for instance, so I just have to

00:10:37.650 --> 00:10:41.309
<v Vegard>push code and hope that someone else deploys it. Worst case, it can take

00:10:42.250 --> 00:10:46.429
<v Vegard>weeks before something is deployed worldwide. That

00:10:46.429 --> 00:10:49.290
<v Vegard>is a process we're obviously trying to optimize and get better at,

00:10:49.290 --> 00:10:52.370
<v Vegard>but it is a pain point, because it's also in

00:10:52.370 --> 00:10:56.790
<v Vegard>inaccessible sites. And the most inaccessible site we have is probably our

00:10:56.790 --> 00:11:02.010
<v Vegard>Troll station in Antarctica, which also doesn't have fiber optic cable, so anything

00:11:02.010 --> 00:11:06.070
<v Vegard>that you put down there we also have to beam up to a geostationary satellite

00:11:06.070 --> 00:11:09.750
<v Vegard>so we can beam it down to Earth again, to a place where we have fiber.

00:11:10.530 --> 00:11:14.630
<v Matthias>I guess the huge advantage here is that for code that is written in Rust,

00:11:14.690 --> 00:11:20.710
<v Matthias>you could just deploy a static binary and people would just be able to run it on the deploy target.

00:11:21.450 --> 00:11:27.570
<v Vegard>It's generally that easy. I mean, everything we do nowadays is dockerized.

00:11:27.750 --> 00:11:31.230
<v Vegard>On all our ground stations, we're running some variant of Kubernetes,

00:11:31.230 --> 00:11:34.470
<v Vegard>either on top of OpenStack or directly.

00:11:34.770 --> 00:11:40.929
<v Matthias>One could think that since you operate in the ground station and you probably

00:11:40.929 --> 00:11:45.530
<v Matthias>have access to a rack or so, you're not resource-constrained.

00:11:45.750 --> 00:11:50.450
<v Matthias>But one thing that people might forget is that you don't do constant updates

00:11:50.450 --> 00:11:51.730
<v Matthias>to the hardware over there.

00:11:52.049 --> 00:11:55.549
<v Vegard>We're definitely resource-constrained on a lot of our sites.

00:11:55.770 --> 00:12:00.090
<v Vegard>Not all of them, but a lot of them. It can take us eight months to get a new

00:12:00.090 --> 00:12:05.610
<v Vegard>computer delivered after ordering it from our vendor, and then we have to ship it anywhere

00:12:05.610 --> 00:12:09.030
<v Vegard>in the world, and we have to get people on-site to install it.

00:12:09.030 --> 00:12:13.110
<v Vegard>So we are resource-constrained in the sense that we don't want to over-provision

00:12:13.110 --> 00:12:19.190
<v Vegard>every data center around the world near all our antennas on our ground station sites.

00:12:19.390 --> 00:12:23.590
<v Vegard>Because, first of all, we don't necessarily have the resources to do that,

00:12:23.610 --> 00:12:26.670
<v Vegard>and we don't have the ability to do it at some point.

00:12:26.830 --> 00:12:30.429
<v Vegard>So it's nice to use something that doesn't hog all the resources.

00:12:31.110 --> 00:12:32.790
<v Matthias>Wouldn't it then be super easy

00:12:32.790 --> 00:12:38.110
<v Matthias>to fall into a trap of being extremely conservative about tech decisions?

00:12:38.110 --> 00:12:44.390
<v Matthias>People might associate space technology with a lot of very old conservative

00:12:44.390 --> 00:12:49.530
<v Matthias>technology, and maybe for a good reason, because it's tried and tested.

00:12:50.490 --> 00:12:54.250
<v Vegard>I think the satellite industry or space industry is definitely very conservative.

00:12:54.470 --> 00:13:00.050
<v Vegard>It takes a lot of effort to qualify something to run in space. I know that

00:13:01.160 --> 00:13:05.100
<v Vegard>at RustConf last year, one of the sponsors was K2 Space.

00:13:05.420 --> 00:13:11.000
<v Vegard>They're actually a space company with a lot of recruits from AWS

00:13:11.000 --> 00:13:16.800
<v Vegard>and SpaceX, and they wanted to do everything in Rust. They wanted to build the satellite.

00:13:17.040 --> 00:13:20.100
<v Vegard>They wanted to build the firmware. They wanted to build all the ground resources.

00:13:20.660 --> 00:13:25.020
<v Vegard>100% in Rust. They had a lightning talk at RustConf. It's probably out on YouTube.

00:13:25.380 --> 00:13:29.560
<v Vegard>So there are definitely contenders out there that want to not be so conservative.

00:13:29.560 --> 00:13:34.060
<v Vegard>But old space is very conservative. I wouldn't say that's

00:13:34.060 --> 00:13:39.120
<v Vegard>necessarily true on the ground. The ground is a bit more like: we can touch this,

00:13:39.120 --> 00:13:42.780
<v Vegard>we can fix it. It's not the same in space.

00:13:42.780 --> 00:13:45.740
<v Matthias>Earlier, you said there was a shift in the industry, so

00:13:45.740 --> 00:13:48.640
<v Matthias>we moved from space companies using software to

00:13:48.640 --> 00:13:52.660
<v Matthias>software companies doing space things. Mostly two

00:13:52.660 --> 00:13:57.080
<v Matthias>companies come to mind right away. One would be SpaceX, and the other one would

00:13:57.080 --> 00:14:03.840
<v Matthias>be Blue Origin. But I'm assuming that's just a tiny little slice of the picture,

00:14:03.840 --> 00:14:08.300
<v Matthias>and maybe there are other software companies that I might have heard of that

00:14:08.300 --> 00:14:12.260
<v Matthias>pushed into the space.

00:14:12.260 --> 00:14:15.460
<v Vegard>Into the space, yes. You also have a few other providers that

00:14:15.460 --> 00:14:18.560
<v Vegard>are up there trying to, and successfully doing

00:14:18.560 --> 00:14:21.820
<v Vegard>so, like Rocket Lab. But these

00:14:21.820 --> 00:14:25.220
<v Vegard>are launch providers; they're helping the software

00:14:25.220 --> 00:14:28.180
<v Vegard>companies launch something into space. But otherwise,

00:14:28.180 --> 00:14:31.180
<v Vegard>AWS is actually going right

00:14:31.180 --> 00:14:35.260
<v Vegard>at it, and they're going after the data primarily. They want the data, because

00:14:35.260 --> 00:14:41.040
<v Vegard>AWS's business model is data, and there's a lot of data in space. A couple

00:14:41.040 --> 00:14:46.060
<v Vegard>of years back, or three or four years back, they launched a ground station service,

00:14:46.060 --> 00:14:51.140
<v Vegard>which I wouldn't say is a direct competitor to us, but they are definitely a competitor.

00:14:51.380 --> 00:14:55.920
<v Vegard>And we have made a strategic partnership with AWS to be

00:14:56.840 --> 00:15:02.020
<v Vegard>a ground network of network providers. So people can come to us and they can

00:15:02.020 --> 00:15:08.820
<v Vegard>use the resources in AWS, their antennas, their setup, but they can do it through us.

00:15:08.900 --> 00:15:13.160
<v Vegard>But the business model is a bit different because AWS, as I said, they're a data company.

00:15:13.260 --> 00:15:16.480
<v Vegard>They really just care about getting their data into the AWS data center.

00:15:16.680 --> 00:15:18.720
<v Vegard>So you can do whatever you want with it there.

00:15:19.300 --> 00:15:22.880
<v Vegard>So the space part is just a means to an end, really.

00:15:22.880 --> 00:15:28.420
<v Matthias>So we moved from space exploration to data exploration.

00:15:28.860 --> 00:15:33.400
<v Matthias>What has changed on the language side? How did the story go at KSAT?

00:15:34.620 --> 00:15:40.920
<v Vegard>Initially, everything was engineers writing Perl scripts and just making it work.

00:15:41.080 --> 00:15:44.460
<v Vegard>And that has scaled very well, but it's still written in Perl,

00:15:44.540 --> 00:15:45.860
<v Vegard>and it's not the newest version.

00:15:46.060 --> 00:15:50.660
<v Vegard>And at some point, we needed to have a bit more control of whatever is running

00:15:50.660 --> 00:15:56.760
<v Vegard>on our antennas. And that was developed in Java in the mid-2000s with an Oracle

00:15:56.760 --> 00:15:58.820
<v Vegard>database. And that has scaled well.

00:15:59.320 --> 00:16:03.680
<v Vegard>We're very thankful for the legacy that was provided to us so that we can even

00:16:03.680 --> 00:16:06.120
<v Vegard>be here today to do something else at a bigger scale.

00:16:06.340 --> 00:16:10.100
<v Vegard>Because that would not be possible without the humble beginnings.

00:16:12.160 --> 00:16:18.680
<v Matthias>The 2000s were definitely the time of Java. It has some really nice traits,

00:16:18.680 --> 00:16:23.760
<v Matthias>and I think it resonated well with the challenges of its time.

00:16:24.100 --> 00:16:28.440
<v Matthias>But then what happened in the 2010s at KSAT?

00:16:28.900 --> 00:16:34.300
<v Vegard>Yeah, so at some point, we kind of scaled up with a few more developers, and

00:16:34.300 --> 00:16:38.320
<v Vegard>with more modern scripting, Python kind of took over.

00:16:38.520 --> 00:16:42.300
<v Vegard>We have multiple Python applications still in production today from that era.

00:16:42.300 --> 00:16:47.940
<v Vegard>But yeah, we started to see that the way that Java application was built,

00:16:48.240 --> 00:16:53.740
<v Vegard>and not necessarily Java in itself, but just the database and all the Perl integrations

00:16:53.740 --> 00:16:57.560
<v Vegard>that unfortunately had direct database access,

00:16:57.920 --> 00:17:02.400
<v Vegard>meant that we had a distributed network all over the world with scripts being

00:17:02.400 --> 00:17:06.000
<v Vegard>able to access the raw contents of our database.

00:17:06.000 --> 00:17:09.720
<v Vegard>And that was not very scalable. We launched

00:17:09.720 --> 00:17:12.820
<v Vegard>an initiative to move away from this world into

00:17:12.820 --> 00:17:15.800
<v Vegard>a more modern world where we can have more

00:17:15.800 --> 00:17:18.740
<v Vegard>control over the life cycle of the data that

00:17:18.740 --> 00:17:25.380
<v Vegard>we put in the database. 20 to 25 years ago, everything was FTP and XML drop boxes.

00:17:25.380 --> 00:17:29.340
<v Vegard>You can call that an API as well, but we decided that we could try to offload

00:17:29.340 --> 00:17:35.099
<v Vegard>responsibility into a segregated new Postgres database, where access to the

00:17:35.099 --> 00:17:38.240
<v Vegard>data is tightly controlled through an HTTP API.

00:17:38.740 --> 00:17:42.560
<v Vegard>Yeah, so we're employing a sort of strangler pattern on that, and just trying

00:17:42.560 --> 00:17:49.300
<v Vegard>to grab any responsibilities and kind of rewrite and repurpose them, and we

00:17:49.300 --> 00:17:51.700
<v Vegard>have successfully launched a

00:17:52.430 --> 00:17:58.490
<v Vegard>competing solution in-house now, where half of the antennas are on the old system.

00:17:58.490 --> 00:18:07.270
<v Vegard>The old API was written in Perl, and it was strangled at the HTTP layer into a Kotlin application,

00:18:07.869 --> 00:18:13.070
<v Vegard>and then we nibbled away at it and moved different responsibilities and endpoints around. And now,

00:18:13.710 --> 00:18:19.450
<v Vegard>I would say, from a responsibility point of view, it's like 40/60 in Rust

00:18:19.450 --> 00:18:25.330
<v Vegard>right now. But a lot of the boring parts are in the Kotlin application, and we're

00:18:25.330 --> 00:18:32.190
<v Vegard>actively working to migrate the remaining Kotlin portions over to Rust as well.

00:18:32.190 --> 00:18:35.130
<v Matthias>Earlier, you mentioned the strangler pattern. How does it work?

00:18:35.130 --> 00:18:38.090
<v Vegard>So the strangler pattern is very

00:18:38.090 --> 00:18:41.470
<v Vegard>convenient when you have a code base

00:18:41.470 --> 00:18:44.670
<v Vegard>or an interface layer where you

00:18:44.670 --> 00:18:47.950
<v Vegard>know very well what's

00:18:47.950 --> 00:18:50.710
<v Vegard>going in and what's going out, and you know that everything below

00:18:50.710 --> 00:18:53.470
<v Vegard>this is just a complete mess and you don't

00:18:53.470 --> 00:18:56.970
<v Vegard>understand anything, but you understand the interfaces, or

00:18:56.970 --> 00:19:00.550
<v Vegard>the boundaries, and you can replace the implementation

00:19:00.550 --> 00:19:03.930
<v Vegard>under each boundary with

00:19:03.930 --> 00:19:09.470
<v Vegard>great control and see the differences in implementation and behavior, and you

00:19:09.470 --> 00:19:14.470
<v Vegard>make it entirely invisible to all consumers that you have actually done anything,

00:19:14.470 --> 00:19:19.010
<v Vegard>which is very nice. But it requires that you have some sort of abstractions

00:19:19.010 --> 00:19:20.790
<v Vegard>that actually make this feasible.

00:19:21.250 --> 00:19:28.250
<v Vegard>And at the HTTP API layer, it's very easy, because the contract is in how the API

00:19:28.250 --> 00:19:30.310
<v Vegard>responds and what parameters it takes.

00:19:30.690 --> 00:19:33.150
<v Vegard>And you can replace that in any language.

00:19:33.570 --> 00:19:38.109
<v Vegard>It's not really that hard. It's just a lot of verification that you've actually

00:19:38.109 --> 00:19:40.330
<v Vegard>replicated all the behavior.
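
NOTE
A minimal sketch of the strangler pattern described above, in Rust, under stated assumptions: the HTTP boundary is reduced to a hypothetical path-in, response-out `Handler` trait, and `Legacy`, `Rewritten`, `Strangler`, and the example paths are illustrative names, not KSAT's code. Migrated paths are served by the rewrite, shadowed paths still return the legacy answer while the rewrite runs alongside so behavioral differences get recorded, and every other path stays on the legacy side. Paths are matched exactly for simplicity.
```rust
use std::collections::HashSet;
// The contract at the boundary: consumers only ever see a status and a body.
#[derive(Debug, Clone, PartialEq)]
struct Response {
    status: u16,
    body: String,
}
// Hypothetical trait standing in for "something that answers a request path".
trait Handler {
    fn handle(&self, path: &str) -> Response;
}
// Stand-in for the old implementation behind the boundary.
struct Legacy;
impl Handler for Legacy {
    fn handle(&self, path: &str) -> Response {
        Response { status: 200, body: format!("contacts for {path}") }
    }
}
// Stand-in for the rewrite; it must honour exactly the same contract.
struct Rewritten;
impl Handler for Rewritten {
    fn handle(&self, path: &str) -> Response {
        Response { status: 200, body: format!("contacts for {path}") }
    }
}
// The facade every consumer talks to. Paths move from `legacy` to `new`
// one at a time, so consumers never see which side answered.
struct Strangler<L: Handler, N: Handler> {
    legacy: L,
    new: N,
    migrated: HashSet<String>, // fully served by the rewrite
    shadowed: HashSet<String>, // served by legacy, rewrite checked alongside
    mismatches: Vec<String>,   // paths where the two sides disagreed
}
impl<L: Handler, N: Handler> Strangler<L, N> {
    fn route(&mut self, path: &str) -> Response {
        if self.migrated.contains(path) {
            return self.new.handle(path);
        }
        let old = self.legacy.handle(path);
        if self.shadowed.contains(path) && self.new.handle(path) != old {
            // Record the difference, but keep returning the legacy answer.
            self.mismatches.push(path.to_string());
        }
        old
    }
}
fn main() {
    let mut facade = Strangler {
        legacy: Legacy,
        new: Rewritten,
        migrated: HashSet::from(["/passes".to_string()]),
        shadowed: HashSet::from(["/antennas".to_string()]),
        mismatches: Vec::new(),
    };
    for path in ["/passes", "/antennas", "/customers"] {
        let resp = facade.route(path);
        println!("{path}: {} {}", resp.status, resp.body);
    }
    println!("mismatches: {:?}", facade.mismatches);
}
```
Because both stand-ins return identical responses, this run reports no mismatches; in a real migration, the shadowed set is where an incomplete rewrite would surface before any consumer is switched over, which is the "lot of verification" step mentioned above.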

00:19:41.619 --> 00:19:44.740
<v Matthias>Now, let's focus on the API for a second.

00:19:45.440 --> 00:19:49.560
<v Matthias>You mentioned that it's a split between Kotlin and Rust at the moment.

00:19:50.060 --> 00:19:51.780
<v Matthias>Where do you draw the line?

00:19:52.099 --> 00:19:57.660
<v Vegard>I don't think there's a natural split now, other than whatever developer or

00:19:57.660 --> 00:20:00.880
<v Vegard>team took that responsibility and what they were comfortable with.

00:20:01.260 --> 00:20:06.460
<v Vegard>So we've had a very open policy at KSAT on what languages we would use to solve

00:20:06.460 --> 00:20:12.340
<v Vegard>whatever problem. And there has definitely been pushback against introducing Rust, in

00:20:12.340 --> 00:20:16.160
<v Vegard>some capacity, from some team members in different teams.

00:20:16.359 --> 00:20:21.359
<v Vegard>And I'm not necessarily sure all of their concerns are what I would call

00:20:21.460 --> 00:20:23.080
<v Vegard>valid, but there are definitely concerns.

00:20:23.640 --> 00:20:29.140
<v Vegard>And some of the pushback I've heard is usually that it's not mature enough, or that the ecosystem is not there.

00:20:29.140 --> 00:20:36.180
<v Vegard>And I feel that is a sentiment that is often held about Rust that I'm not necessarily

00:20:36.180 --> 00:20:40.400
<v Vegard>sure is true anymore, because I feel the ecosystem is very much present.

00:20:40.619 --> 00:20:43.160
<v Vegard>I can do everything I want in the ecosystem in Rust today.

00:20:43.520 --> 00:20:48.580
<v Vegard>And the other part is maybe just a lack of knowledge of how to use such

00:20:48.580 --> 00:20:51.720
<v Vegard>complex concepts, because it comes from a systems background.

00:20:52.040 --> 00:20:56.840
<v Vegard>And a lot of it, regarding borrows and lifetimes and stuff like that,

00:20:56.880 --> 00:20:58.760
<v Vegard>can seem a bit intimidating

00:20:59.140 --> 00:21:05.340
<v Vegard>to someone who's usually just very happy in their Java or .NET environment,

00:21:05.340 --> 00:21:11.240
<v Vegard>where that is not necessarily a concern for 99% of what they're doing.

00:21:11.400 --> 00:21:13.940
<v Vegard>There are also positive receptions of Rust.

00:21:14.280 --> 00:21:20.240
<v Vegard>And I have personally been able to, I don't know, convert a couple of teams to use Rust.

00:21:20.540 --> 00:21:24.460
<v Vegard>So yeah, we're approximately three or four teams now using Rust in production

00:21:24.460 --> 00:21:30.320
<v Vegard>at KSAT, with maybe four-ish people in each team who are actively writing Rust.

00:21:30.700 --> 00:21:35.960
<v Matthias>How does that usually go for you when you approach a team and they are curious

00:21:35.960 --> 00:21:38.380
<v Matthias>about Rust, but they are not entirely convinced yet?

00:21:39.930 --> 00:21:44.670
<v Vegard>The conversation often goes in the direction of: this is what is very good about

00:21:44.670 --> 00:21:47.330
<v Vegard>Rust. And that's what I start with.

00:21:47.510 --> 00:21:52.450
<v Vegard>And you have to make some concessions. And the concessions are obviously just, is it a good team fit?

00:21:52.810 --> 00:21:58.650
<v Vegard>Because I don't think Rust is hard to use once you've gotten over that initial,

00:21:58.930 --> 00:22:01.270
<v Vegard>whoa, what happened here? It's a shock.

00:22:01.790 --> 00:22:06.869
<v Vegard>But a lot of teams have their experience and their toolboxes in other languages

00:22:06.869 --> 00:22:07.950
<v Vegard>and know how to solve problems with them.

00:22:07.950 --> 00:22:10.950
<v Vegard>And if you don't really have a champion on that team itself,

00:22:11.290 --> 00:22:16.330
<v Vegard>I don't think it's possible to really introduce Rust into a team because the

00:22:16.330 --> 00:22:17.710
<v Vegard>team has to embrace it themselves.

00:22:18.010 --> 00:22:22.390
<v Vegard>It's a no-go if the team doesn't have a champion from within, really.

00:22:22.390 --> 00:22:30.130
<v Vegard>So my job is more just like I try to do some good mentoring and try to have

00:22:30.130 --> 00:22:35.070
<v Vegard>some common guidelines and try to curate some crates and make some internal

00:22:35.070 --> 00:22:39.790
<v Vegard>crates that help the process along internally with the tooling and the way we do things.

00:22:40.010 --> 00:22:45.070
<v Vegard>But ultimately, you need that champion on the team as well.

00:22:46.050 --> 00:22:50.230
<v Matthias>What's your success rate here? Have you lost some of these battles?

00:22:51.410 --> 00:22:55.550
<v Vegard>Not on a team level, but maybe on an individual level, yes.

00:22:56.690 --> 00:23:02.710
<v Vegard>But the general vibe is that it's going more and more into Rust for a lot of our distributed systems.

00:23:02.950 --> 00:23:07.170
<v Vegard>Just because it's so nice to use once you actually get to know it.

00:23:07.730 --> 00:23:11.730
<v Vegard>So it's just that hurdle of inviting people in that haven't used it before.

00:23:12.960 --> 00:23:18.140
<v Matthias>I'm almost too afraid to ask it, but has Go ever come up in that conversation?

00:23:19.000 --> 00:23:23.099
<v Vegard>Go has come up multiple times, and we have production code in Go as well.

00:23:23.340 --> 00:23:28.520
<v Vegard>I'm a bit annoyed at that sentiment, because Go is... Maybe annoyed is

00:23:28.520 --> 00:23:33.859
<v Vegard>not the right word. I'm a bit intrigued by the "why don't we just do it in Go?"

00:23:34.040 --> 00:23:38.220
<v Vegard>Because Go 1.0 was released in March 2012.

00:23:38.840 --> 00:23:43.520
<v Vegard>It's three years older than Rust. At this point, Rust is 10 and Go is 13.

00:23:43.820 --> 00:23:45.200
<v Vegard>It's not that big of a difference.

00:23:45.460 --> 00:23:51.760
<v Matthias>In terms of age, but in terms of functionality and in terms of developer ergonomics, maybe?

00:23:52.740 --> 00:23:58.440
<v Vegard>Yeah, but Go was a very simple language to begin with. So it was very easy to get going with Go.

00:23:58.820 --> 00:24:01.920
<v Vegard>But I also think that while there is an ecosystem in Go,

00:24:02.119 --> 00:24:07.160
<v Vegard>that ecosystem is harder to engage with than the Rust ecosystem, because

00:24:07.160 --> 00:24:14.460
<v Vegard>the tooling, with Cargo and the like, is just miles above any tooling you have in Go.

00:24:15.020 --> 00:24:19.060
<v Vegard>So that makes it, for me, a no-brainer, because,

00:24:19.660 --> 00:24:23.500
<v Vegard>disregarding the language itself and the features and ergonomics of the

00:24:23.500 --> 00:24:29.440
<v Vegard>language, the tooling and the ecosystem around the language are what

00:24:29.440 --> 00:24:33.520
<v Vegard>make Rust the number one contender on the market.

00:24:34.460 --> 00:24:42.180
<v Matthias>Go is very much a day-one language. Starting a project and getting to your

00:24:42.180 --> 00:24:47.880
<v Matthias>first production version is usually very ergonomic, very quick, very elegant.

00:24:48.720 --> 00:24:52.840
<v Matthias>The problems start to arise on day two. Not exactly day two,

00:24:53.000 --> 00:25:00.780
<v Matthias>but when you have a larger code base, you feel the limitations of the language and of the ecosystem.

00:25:00.780 --> 00:25:06.940
<v Matthias>It's trying to constrain you somehow. It almost feels like it's strangling you.

00:25:08.130 --> 00:25:09.390
<v Vegard>And you're not strangling it.

00:25:10.290 --> 00:25:14.710
<v Matthias>I probably would have made the same decision in your position, of course.

00:25:15.530 --> 00:25:20.270
<v Matthias>Obviously, I'm biased, but you have to maintain this software for a very long time.

00:25:21.190 --> 00:25:26.110
<v Vegard>Yeah, definitely. From my experience, just being able to model

00:25:26.110 --> 00:25:29.810
<v Vegard>your code in a way that just feels right,

00:25:30.270 --> 00:25:34.950
<v Vegard>you just know where the boundaries are of what you've modeled in the domain. And it's

00:25:34.950 --> 00:25:39.310
<v Vegard>very easy to move that along and refactor it.

00:25:39.750 --> 00:25:47.990
<v Vegard>So eons ago, I was a C and C++ developer, and I did a bit of both.

00:25:48.490 --> 00:25:53.510
<v Vegard>And just trying to refactor a C++ code base and having confidence that you've

00:25:53.510 --> 00:25:56.350
<v Vegard>actually done it correctly, I have never had that.

00:25:56.570 --> 00:26:01.490
<v Vegard>But with Rust, "if it compiles, it works" basically holds.

00:26:01.490 --> 00:26:04.350
<v Vegard>And that sentiment is overused, I think,

00:26:04.350 --> 00:26:07.990
<v Vegard>but it still feels very true, because the

00:26:07.990 --> 00:26:12.850
<v Vegard>compiler is so powerful. Whenever it compiles, I'm confident. I also have

00:26:12.850 --> 00:26:18.970
<v Vegard>a few tests here and there, and when the tests pass as well, which they do 99

00:26:18.970 --> 00:26:25.230
<v Vegard>percent of the time after I've done a major refactor, I'm confident I can push, no problem.

00:26:25.230 --> 00:26:29.370
<v Matthias>Funny that you say you have a few tests here and there. Does that mean you lean into

00:26:29.370 --> 00:26:32.490
<v Matthias>Rust's strong type system a lot as well,

00:26:32.610 --> 00:26:36.370
<v Matthias>and maybe you don't have to write as many tests as you would have to write

00:26:36.370 --> 00:26:39.230
<v Matthias>in other, more dynamic languages like Python?

00:26:39.650 --> 00:26:44.770
<v Vegard>Oh, definitely. Our tests are... I think there's a concept called diamond-shaped

00:26:44.770 --> 00:26:48.710
<v Vegard>testing or something, where you basically have very few unit tests,

00:26:48.830 --> 00:26:51.590
<v Vegard>you have very few system tests, but you have a lot of integration tests.

00:26:51.790 --> 00:26:57.810
<v Vegard>And those integration tests are placed on the boundaries of the network layer, so HTTP.

00:26:57.810 --> 00:27:00.750
<v Vegard>And all my

00:27:00.750 --> 00:27:03.950
<v Vegard>tests are basically just HTTP-related API tests,

00:27:03.950 --> 00:27:08.410
<v Vegard>because I don't really care how the structs

00:27:08.410 --> 00:27:13.510
<v Vegard>or functionality within the Rust code base behave. What's important

00:27:13.510 --> 00:27:20.430
<v Vegard>is just the contract at the HTTP boundaries. So we have a few tests down

00:27:20.430 --> 00:27:27.550
<v Vegard>to the database over the HTTP layer, but from a unit test point of view, almost nothing.

00:27:28.210 --> 00:27:34.870
<v Matthias>But in order for that to work, you would have to lean very heavily into the

00:27:34.870 --> 00:27:40.910
<v Matthias>Rust mechanics, into the type system, and you would have to rely on it.

00:27:41.410 --> 00:27:47.470
<v Matthias>Are there patterns that you commonly use to fully embrace that part of Rust?

00:27:48.030 --> 00:27:51.430
<v Vegard>Yeah, so I use quite a lot of newtypes.

00:27:51.730 --> 00:27:57.650
<v Vegard>For instance, a UUID: I will newtype it into a type that represents this resource.

00:27:58.210 --> 00:28:02.470
<v Vegard>That means the API layer is very communicative about what it's actually expecting,

00:28:03.150 --> 00:28:06.030
<v Vegard>or not the API layer, but the code base itself that serves it.

00:28:06.510 --> 00:28:12.330
<v Vegard>So it's very easy to modularize different components that work in some form

00:28:12.330 --> 00:28:17.090
<v Vegard>of hierarchy, because the types are so strong that you can convey so much with

00:28:17.090 --> 00:28:21.010
<v Vegard>both the primitive types themselves and sum types in the form of enums.

00:28:21.210 --> 00:28:25.830
<v Vegard>The one thing I miss every time I go to any other language is the enum.

00:28:25.830 --> 00:28:32.110
<v Vegard>I think, "I could model this very well in an enum," and I don't have that capability. And it saddens me.
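
NOTE
A minimal sketch of the newtype pattern described here, std-only. The resource names (AntennaId, ContactId, describe_contact) are hypothetical, and a u128 stands in for a UUID so the example needs no external crates. Because each ID is its own type, passing one where the other is expected fails to compile.
```rust
// Hypothetical resource IDs; u128 stands in for a UUID to keep this std-only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct AntennaId(u128);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ContactId(u128);
impl AntennaId {
    pub fn new(raw: u128) -> Self {
        AntennaId(raw)
    }
}
impl ContactId {
    pub fn new(raw: u128) -> Self {
        ContactId(raw)
    }
}
// The signature states which ID each parameter expects; swapping the
// arguments is a compile-time type error instead of a runtime bug.
pub fn describe_contact(antenna: AntennaId, contact: ContactId) -> String {
    format!("contact {:x} on antenna {:x}", contact.0, antenna.0)
}
fn main() {
    let antenna = AntennaId::new(0xa1);
    let contact = ContactId::new(0xc7);
    println!("{}", describe_contact(antenna, contact));
    // describe_contact(contact, antenna); // would not compile: mismatched types
}
```
With serde, wrappers like these are commonly marked #[serde(transparent)] so they serialize exactly like the inner value.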

00:28:33.150 --> 00:28:38.470
<v Matthias>Do you have an example of an enum that comes to mind where modeling

00:28:38.470 --> 00:28:41.970
<v Matthias>some business logic was very ergonomic?

00:28:43.090 --> 00:28:46.310
<v Vegard>So I'm a big fan of the oneOf pattern.

00:28:46.590 --> 00:28:49.810
<v Vegard>It is represented in, for instance, OpenAPI definitions.

00:28:50.050 --> 00:28:52.950
<v Vegard>There is a oneOf you can represent there. Doing code

00:28:52.950 --> 00:28:55.670
<v Vegard>gen for oneOfs in OpenAPI to any other

00:28:55.670 --> 00:28:59.070
<v Vegard>language is horrible, but code

00:28:59.070 --> 00:29:02.250
<v Vegard>gen to Rust was very easy to use. And being

00:29:02.250 --> 00:29:06.710
<v Vegard>able to represent the fact that this resource has different

00:29:06.710 --> 00:29:13.230
<v Vegard>properties depending on which kind it is, is very powerful, because even though

00:29:13.230 --> 00:29:17.730
<v Vegard>at some level you're talking about this resource with one resource ID,

00:29:17.730 --> 00:29:23.450
<v Vegard>it can manifest itself in different forms, different versions, or represent

00:29:23.450 --> 00:29:26.090
<v Vegard>different physical attributes on the network.

00:29:26.750 --> 00:29:31.930
<v Vegard>And on some abstractions, you don't really care about those properties, but on others you do.

00:29:32.050 --> 00:29:36.710
<v Vegard>It's very nice to be able to represent just the exact properties that are present,

00:29:36.710 --> 00:29:42.990
<v Vegard>and not a load of optionals that are present only if this is true and that is true.

00:29:43.190 --> 00:29:47.970
<v Vegard>Otherwise you have to carry that logic throughout the code, which makes it harder to refactor as well.

00:29:48.170 --> 00:29:52.810
<v Vegard>You know that this can only be set if this other value is set,

00:29:53.050 --> 00:29:57.190
<v Vegard>and that's an invariant in your code that you encode with enums instead.
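
NOTE
A sketch of the trade-off described above, using hypothetical types. The first struct has Option fields that are only meaningful for some kinds of resource. The enum version gives each variant exactly the properties that exist for that kind, while the resource keeps a single ID. The antenna kinds and fields are illustrative assumptions, not KSAT's data model.
```rust
// Optional-heavy version (the shape to avoid): the fields are only valid for
// some kinds, and every consumer has to re-check that invariant by hand.
#[allow(dead_code)]
struct AntennaLoose {
    id: u64,
    is_phased_array: bool,
    dish_diameter_m: Option<f64>, // only set when !is_phased_array
    element_count: Option<u32>,   // only set when is_phased_array
}
// Enum version: each variant carries exactly the properties that exist for it.
#[derive(Debug, Clone, PartialEq)]
enum AntennaKind {
    Dish { diameter_m: f64 },
    PhasedArray { element_count: u32 },
}
#[derive(Debug, Clone, PartialEq)]
struct Antenna {
    id: u64, // one resource ID, regardless of kind
    kind: AntennaKind,
}
fn summary(antenna: &Antenna) -> String {
    // The match is exhaustive: adding a variant later forces every match to handle it.
    match &antenna.kind {
        AntennaKind::Dish { diameter_m } => format!("#{}: {} m dish", antenna.id, diameter_m),
        AntennaKind::PhasedArray { element_count } => {
            format!("#{}: phased array, {} elements", antenna.id, element_count)
        }
    }
}
fn main() {
    let a = Antenna { id: 1, kind: AntennaKind::Dish { diameter_m: 7.3 } };
    let b = Antenna { id: 2, kind: AntennaKind::PhasedArray { element_count: 256 } };
    println!("{}", summary(&a));
    println!("{}", summary(&b));
}
```
When the data comes from an OpenAPI oneOf, serde can map a discriminator field onto such an enum with an internally tagged representation, #[serde(tag = "kind")]; that mapping is left out here to keep the sketch std-only.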

00:29:57.710 --> 00:30:02.910
<v Matthias>I'm not too familiar with it, but I know that in a schema you can say this is

00:30:02.910 --> 00:30:08.190
<v Matthias>one of these variants, one of these kinds, and I guess it maps really well to enums.

00:30:08.610 --> 00:30:14.070
<v Matthias>If you go one step further, you're probably also using the serde ecosystem

00:30:14.070 --> 00:30:18.910
<v Matthias>and say, this is my input type, and convert it from the schema.

00:30:20.450 --> 00:30:24.730
<v Vegard>So we're leaning heavily into serde. It's an excellent library.

00:30:25.490 --> 00:30:29.090
<v Matthias>Any other crates that you personally like for that sort of work?

00:30:29.750 --> 00:30:34.930
<v Vegard>You usually have to do some customizations on top of serde, with serde_with or

00:30:34.930 --> 00:30:38.950
<v Vegard>stuff like that, to actually do the proper transformations.

00:30:39.750 --> 00:30:44.950
<v Vegard>I've been also experimenting now with Utopia to generate OpenAPI specifications.

00:30:45.110 --> 00:30:49.970
<v Matthias>It's called utoipa, U-T-O-I-P-A. It's a very common misspelling, unfortunately.

00:30:50.450 --> 00:30:53.470
<v Matthias>I made it a dozen times until someone pointed it out.

00:30:53.690 --> 00:30:56.150
<v Vegard>Yeah, I will probably continue to misspell it.

00:30:56.330 --> 00:31:02.830
<v Matthias>The reason why it's called utoipa, by the way, is that IPA is API backwards.

00:31:03.130 --> 00:31:06.250
<v Matthias>And it's also a good beer. That's from the README.

00:31:06.410 --> 00:31:10.110
<v Vegard>Of course. Sorry. Yeah.

00:31:10.730 --> 00:31:14.950
<v Vegard>One slight issue I have with serde is that while it's very versatile,

00:31:14.950 --> 00:31:20.590
<v Vegard>it doesn't really give you a great structured way of accessing errors.

00:31:21.150 --> 00:31:27.950
<v Vegard>And that bugs me a bit, because I really want to give good structured feedback in our API surfaces.

00:31:28.530 --> 00:31:33.530
<v Vegard>And I don't want to fork serde just to fix that, because then I'm incompatible with everything.

00:31:33.790 --> 00:31:37.550
<v Vegard>I'm not entirely sure how to solve that on an ecosystem level.

00:31:37.750 --> 00:31:42.450
<v Vegard>But right now, I've just wrapped the outputs and parsed the strings to extract

00:31:42.450 --> 00:31:44.850
<v Vegard>the vital information that I want.

00:31:44.950 --> 00:31:50.850
<v Vegard>But I would definitely like to see a bit more structured error responses on

00:31:50.850 --> 00:31:53.250
<v Vegard>what went wrong in the deserialization process.

00:31:53.650 --> 00:31:56.710
<v Matthias>I personally see serde more as a contract.

00:31:57.270 --> 00:32:01.130
<v Matthias>You have the value types, you have Serialize and Deserialize, you have these traits;

00:32:01.350 --> 00:32:02.310
<v Matthias>those are your building blocks.

00:32:02.490 --> 00:32:09.050
<v Matthias>So what keeps you from building structured error messages from these smaller building blocks?

00:32:10.260 --> 00:32:17.220
<v Vegard>Because the serde error type doesn't give you... Well, the serde error type,

00:32:17.720 --> 00:32:22.080
<v Vegard>I think it's possible there, but we're using JSON, serde_json,

00:32:22.400 --> 00:32:28.940
<v Vegard>because it's what we communicate over, and the serde_json error type erases

00:32:28.940 --> 00:32:34.200
<v Vegard>any reference to, for instance, which field the error was at.

00:32:34.200 --> 00:32:39.780
<v Vegard>So you have to parse the stringified message to extract that it was at this

00:32:39.780 --> 00:32:44.060
<v Vegard>field, or you have to fork serde_json and fix it there.

00:32:44.220 --> 00:32:49.360
<v Vegard>I could probably do that as well, but I've seen it in multiple JSON parsing libraries

00:32:49.360 --> 00:32:56.640
<v Vegard>that the level of programmatic access to the error variants is not that great.
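
NOTE
A sketch of the string-parsing workaround mentioned above: turning a serde_json error's Display text into a structured value for an API response. It assumes messages shaped like "missing field `x` at line 1 column 42"; that wording is an implementation detail of serde and serde_json and may change, so treat this as best effort. serde_json::Error does expose line(), column() and classify() directly, and the serde_path_to_error crate can report the field path without parsing strings. FieldError and parse_error_message are hypothetical names.
```rust
/// Structured view of a deserialization error message (hypothetical type).
#[derive(Debug, PartialEq)]
struct FieldError<'a> {
    field: Option<&'a str>,
    line: Option<usize>,
    column: Option<usize>,
}
/// Returns the backtick-quoted name that directly follows `prefix`, if any.
fn quoted_after<'a>(msg: &'a str, prefix: &str) -> Option<&'a str> {
    let rest = &msg[msg.find(prefix)? + prefix.len()..];
    let rest = rest.strip_prefix('`')?;
    let end = rest.find('`')?;
    Some(&rest[..end])
}
/// Parses a trailing " at line L column C" suffix, if present.
fn position(msg: &str) -> (Option<usize>, Option<usize>) {
    let Some(idx) = msg.rfind(" at line ") else {
        return (None, None);
    };
    let mut parts = msg[idx + " at line ".len()..].split(" column ");
    let line: Option<usize> = parts.next().and_then(|s| s.trim().parse().ok());
    let column: Option<usize> = parts.next().and_then(|s| s.trim().parse().ok());
    (line, column)
}
fn parse_error_message(msg: &str) -> FieldError<'_> {
    // Field names appear in "missing field" and "unknown field" messages.
    let field = quoted_after(msg, "missing field ").or_else(|| quoted_after(msg, "unknown field "));
    let (line, column) = position(msg);
    FieldError { field, line, column }
}
fn main() {
    let err = parse_error_message("missing field `elevation_deg` at line 1 column 42");
    println!("{:?}", err);
}
```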

00:32:57.080 --> 00:33:01.240
<v Vegard>But other than that, the serde ecosystem is amazing. You can do a lot of stuff with it.

00:33:01.240 --> 00:33:07.140
<v Vegard>You just have to be a bit more forgiving in how you output the errors to the end

00:33:07.140 --> 00:33:10.660
<v Vegard>user, because that's kind of what matters here. I mean, for me as a programmer,

00:33:10.660 --> 00:33:15.040
<v Vegard>I don't really care, but it's the consumer of the API that cares.

00:33:15.040 --> 00:33:20.740
<v Matthias>From what I can tell from our conversation so far, stability is the main focus.

00:33:20.740 --> 00:33:24.000
<v Vegard>From listening to a lot of your other guests

00:33:24.000 --> 00:33:27.180
<v Vegard>on this podcast doing a lot of cool shit, and

00:33:27.180 --> 00:33:33.880
<v Vegard>it's very fun to listen to, I get the feeling that our Rust usage

00:33:33.880 --> 00:33:41.100
<v Vegard>is boring. We're just using the top-level web frameworks, SQLx and Axum

00:33:41.100 --> 00:33:47.020
<v Vegard>and serde, and just putting it all together and making it work. I have a good example of that, because

00:33:47.700 --> 00:33:52.280
<v Vegard>a couple of months back we needed to make some changes in a few of our running services,

00:33:52.280 --> 00:33:56.520
<v Vegard>and I went into the repository for that service to actually fix it,

00:33:56.640 --> 00:34:01.260
<v Vegard>and I saw the last commit was one and a half years ago, and it's just been running. One and a half years.

00:34:01.480 --> 00:34:05.060
<v Vegard>I haven't touched it, and I have never had that experience in my professional career.

00:34:05.580 --> 00:34:11.520
<v Vegard>That service was the main authentication and authorization service that authenticated

00:34:11.520 --> 00:34:16.280
<v Vegard>and managed every API key and principal, so it was used on every request.

00:34:16.860 --> 00:34:22.380
<v Vegard>It's really chugging along, so it's amazing. I've had only good experiences on that front.

00:34:23.860 --> 00:34:27.080
<v Matthias>Did you also have any bad experiences with Rust?

00:34:28.520 --> 00:34:32.180
<v Vegard>You can call it a bad experience, but I would camouflage it as a good experience.

00:34:32.500 --> 00:34:38.520
<v Vegard>So we've been running an on-prem cluster for many years, and that on-prem cluster

00:34:38.520 --> 00:34:40.960
<v Vegard>hasn't really gotten that much love and attention.

00:34:41.320 --> 00:34:45.960
<v Vegard>So it's just chugging along with the resources it had six years ago when it was installed.

00:34:46.620 --> 00:34:50.980
<v Vegard>We also do a lot of calculations regarding satellite trajectories and visibilities

00:34:50.980 --> 00:34:52.580
<v Vegard>to our ground stations and stuff like that.

00:34:52.580 --> 00:34:55.600
<v Vegard>So one of the things I wanted to calculate was just,

00:34:55.600 --> 00:34:58.860
<v Vegard>okay, when is a satellite visible over

00:34:58.860 --> 00:35:02.820
<v Vegard>our ground stations? And we support quite

00:35:02.820 --> 00:35:05.739
<v Vegard>a lot of satellites and we have a lot of ground stations, so there's a

00:35:05.739 --> 00:35:08.680
<v Vegard>lot of maths to figure out: when are you where,

00:35:08.680 --> 00:35:12.300
<v Vegard>and when can I talk to you? And I naively

00:35:12.300 --> 00:35:15.000
<v Vegard>just put everything in a loop, and then

00:35:15.000 --> 00:35:18.040
<v Vegard>I slammed Rayon on it and I pushed

00:35:18.040 --> 00:35:21.140
<v Vegard>it to production. And a couple of days later, someone from

00:35:21.140 --> 00:35:25.800
<v Vegard>my DevOps team came and said, our production cluster is running at

00:35:25.800 --> 00:35:33.800
<v Vegard>80% CPU, it's struggling a bit, and the majority is from the service you just

00:35:33.800 --> 00:35:38.440
<v Vegard>updated. And yeah, the computations worked fine, but it had a wider impact on our

00:35:38.440 --> 00:35:39.560
<v Vegard>other production services.

00:35:40.570 --> 00:35:42.060
<v Matthias>So it's too performant?

00:35:42.070 --> 00:35:45.840
<v Vegard>Too good. I had to dial that back.
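
NOTE
One way to "dial back" a CPU-heavy batch job on a shared cluster is to cap how many threads it may use. With Rayon, that would typically mean building a dedicated pool with rayon::ThreadPoolBuilder::new().num_threads(n) and running the work inside ThreadPool::install, or setting the RAYON_NUM_THREADS environment variable. The std-only sketch below shows the same idea with scoped threads. visible_passes is a hypothetical placeholder, not real orbital mechanics, and the episode does not say which knob was actually used.
```rust
use std::thread;
/// Hypothetical stand-in for the per-(satellite, station) visibility computation.
fn visible_passes(satellite: u32, station: u32) -> u32 {
    (satellite * 31 + station * 17) % 5
}
/// Evaluates every (satellite, station) pair on at most `max_threads` worker
/// threads, so the job cannot claim every core of a shared node.
fn total_passes(satellites: &[u32], stations: &[u32], max_threads: usize) -> u32 {
    let pairs: Vec<(u32, u32)> = satellites
        .iter()
        .flat_map(|&sat| stations.iter().map(move |&st| (sat, st)))
        .collect();
    if pairs.is_empty() {
        return 0;
    }
    let workers = max_threads.max(1).min(pairs.len());
    let chunk_size = pairs.len().div_ceil(workers); // at most `workers` chunks
    thread::scope(|scope| {
        // Spawn all workers first, then join, so the chunks run concurrently.
        let handles: Vec<_> = pairs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|&(a, b)| visible_passes(a, b)).sum::<u32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u32>()
    })
}
fn main() {
    let satellites: Vec<u32> = (0..100).collect();
    let stations: Vec<u32> = (0..25).collect();
    println!("{}", total_passes(&satellites, &stations, 2));
}
```
The cap should roughly match the CPU share the service is given on the cluster rather than the host's core count.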

00:35:48.420 --> 00:35:55.739
<v Matthias>Okay, I can see how that might also be a benefit, or how I could see it as a win,

00:35:55.739 --> 00:36:01.800
<v Matthias>but are there any other issues with the wider Rust ecosystem that come to mind?

00:36:01.800 --> 00:36:06.680
<v Vegard>Yeah, I mean, we're a big user of async, because we're using Axum and everything

00:36:06.680 --> 00:36:13.620
<v Vegard>is just on a Tokio runtime, and it works very well for basic futures

00:36:13.620 --> 00:36:18.699
<v Vegard>that handle HTTP requests and futures that send database queries and get responses.

00:36:18.960 --> 00:36:22.960
<v Vegard>And that just works very well. But when you're trying to combine that with a

00:36:22.960 --> 00:36:29.980
<v Vegard>future in the HTTP layer that also does some computation, we ran into some issues.

00:36:30.320 --> 00:36:34.940
<v Vegard>So a few months back, someone used our API in a way that we hadn't anticipated.

00:36:35.040 --> 00:36:40.600
<v Vegard>And there was too much traffic on something that blocked, and just everything,

00:36:40.600 --> 00:36:46.040
<v Vegard>everything just stagnates, response times spike, and it affects everything.

00:36:46.320 --> 00:36:52.340
<v Vegard>And just trying to hunt down where we actually block, or do computation for so

00:36:52.340 --> 00:36:56.800
<v Vegard>long that we're starving the Tokio runtime, that was very challenging.

00:36:57.100 --> 00:37:04.000
<v Matthias>What I see a lot is teams using their development laptop to start a larger Tokio

00:37:04.000 --> 00:37:08.020
<v Matthias>application with, say, 16 or 32 cores.

00:37:08.020 --> 00:37:12.940
<v Matthias>And then when they deploy the same service to production it ends up running

00:37:12.940 --> 00:37:19.540
<v Matthias>on a two core node and obviously that's a completely different environment.

00:37:19.940 --> 00:37:21.640
<v Matthias>Was it one of these cases where,

00:37:22.660 --> 00:37:28.540
<v Matthias>the production system was very resource-constrained, and when you tested it in development, it was not?

00:37:29.140 --> 00:37:32.260
<v Vegard>The problem manifested itself when the traffic increased

00:37:32.260 --> 00:37:35.420
<v Vegard>enough to actually trigger it. So we

00:37:35.420 --> 00:37:39.780
<v Vegard>didn't really trigger it ourselves. We could reproduce it locally at some point, when we

00:37:39.780 --> 00:37:44.040
<v Vegard>actually knew what traffic to induce. So we had some inklings when stuff

00:37:44.040 --> 00:37:50.580
<v Vegard>went wrong, but it was quite a wild goose chase down this set of futures: where

00:37:50.580 --> 00:37:53.120
<v Vegard>do you actually, how do you measure what blocks?

00:37:53.380 --> 00:37:57.000
<v Vegard>And trying to use tooling like tokio-console, it's a great project,

00:37:57.140 --> 00:38:01.500
<v Vegard>but it's just not insightful enough at that level yet.

00:38:01.840 --> 00:38:06.020
<v Vegard>So I would say the tooling is probably not quite there yet for the abstractions we need

00:38:06.020 --> 00:38:11.020
<v Vegard>to be able to efficiently bisect where the issue is and how to solve it.

00:38:11.380 --> 00:38:16.340
<v Vegard>Solving it is very easy in Tokio. You just spawn it on the blocking thread pool,

00:38:16.340 --> 00:38:19.780
<v Vegard>and it's fine. But it's definitely something to be aware of. It's a pitfall

00:38:19.780 --> 00:38:22.780
<v Vegard>for newer developers, and it got us as well.

00:38:22.780 --> 00:38:25.640
<v Matthias>The typical pattern is that you see a spike

00:38:25.640 --> 00:38:28.780
<v Matthias>on the CPU, and there's really

00:38:28.780 --> 00:38:36.840
<v Matthias>not much traffic coming in anymore. It looks blocked at the API layer, but in reality

00:38:36.840 --> 00:38:42.800
<v Matthias>your CPU is super busy with some computation. But that still doesn't tell

00:38:42.800 --> 00:38:46.860
<v Matthias>you where that computation happens. You just need to dig deeper and understand

00:38:46.860 --> 00:38:48.360
<v Matthias>the business logic of it all.

00:38:49.460 --> 00:38:54.280
<v Vegard>So that's also where the distributed tracing you have in an application

00:38:54.280 --> 00:39:00.180
<v Vegard>and the insight it gives you come to mind. I also like the tracing

00:39:00.180 --> 00:39:06.820
<v Vegard>ecosystem very much, love it, but figuring out how you actually use tracing in

00:39:06.820 --> 00:39:08.199
<v Vegard>a distributed sense,

00:39:08.739 --> 00:39:12.800
<v Vegard>it's a learning curve where you basically have to puzzle the pieces together

00:39:12.800 --> 00:39:16.800
<v Vegard>yourself to figure out how you actually get the correct level of tracing in

00:39:16.800 --> 00:39:19.860
<v Vegard>the applications and across applications.

00:39:20.100 --> 00:39:26.199
<v Vegard>That's also probably an area that would be a good fit for some higher-level

00:39:26.199 --> 00:39:30.560
<v Vegard>abstraction crates for server applications that just need good

00:39:30.560 --> 00:39:31.600
<v Vegard>defaults on everything.

00:39:32.199 --> 00:39:37.360
<v Matthias>Do you use tracing across language boundaries or just within the Rust context?

00:39:37.360 --> 00:39:46.239
<v Vegard>We use the W3C Trace Context standard to send traceparent headers to correlate

00:39:46.239 --> 00:39:50.560
<v Vegard>tracing information across applications, and that works fine.
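
NOTE
For reference, a W3C Trace Context traceparent header has the form version-traceid-parentid-flags, for example 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01. The sketch below parses only version 00 and rejects uppercase hex and all-zero IDs, which the specification treats as invalid. In practice the OpenTelemetry Rust crates ship a propagator for this header; TraceParent and parse_traceparent are hypothetical names for illustration.
```rust
/// Parsed W3C traceparent header (version 00 only in this sketch).
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: u128,
    parent_id: u64,
    sampled: bool,
}
/// True if `s` consists of exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    // Exactly four fields: version, trace-id, parent-id, trace-flags.
    let [version, trace, parent, flags] = parts.as_slice() else {
        return None;
    };
    if *version != "00" || !is_lower_hex(trace, 32) || !is_lower_hex(parent, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace, 16).ok()?;
    let parent_id = u64::from_str_radix(parent, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    // All-zero trace and parent IDs are invalid.
    if trace_id == 0 || parent_id == 0 {
        return None;
    }
    // Bit 0 of trace-flags is the "sampled" flag.
    Some(TraceParent { trace_id, parent_id, sampled: (flags & 0x01) == 0x01 })
}
fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    println!("{:?}", parse_traceparent(header));
}
```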

00:39:51.320 --> 00:39:56.940
<v Vegard>We set up our own tracing infrastructure using the tracing crate with a custom

00:39:56.940 --> 00:40:00.340
<v Vegard>subscriber for Azure App Insights.

00:40:00.640 --> 00:40:04.199
<v Vegard>App Insights is a good service, but it's also quite expensive. But

00:40:04.199 --> 00:40:06.980
<v Vegard>just knowing where to wire up what you need

00:40:06.980 --> 00:40:10.360
<v Vegard>to call, when and how, in

00:40:10.360 --> 00:40:13.380
<v Vegard>tracing, and how you model that into whatever

00:40:13.380 --> 00:40:16.320
<v Vegard>subscriber you have... So using, for

00:40:16.320 --> 00:40:19.640
<v Vegard>instance, OpenTelemetry versus Honeycomb versus App

00:40:19.640 --> 00:40:23.800
<v Vegard>Insights, they all behave differently in how you open spans, when you

00:40:23.800 --> 00:40:27.920
<v Vegard>close them, how you annotate them, and when you actually send the event. It's

00:40:27.920 --> 00:40:32.920
<v Vegard>a learning curve; employing correct tracing in your application is not

00:40:32.920 --> 00:40:36.360
<v Vegard>something that's extremely easy to understand.

00:40:36.600 --> 00:40:38.580
<v Vegard>So you usually spend a few months on it.

00:40:38.920 --> 00:40:45.500
<v Matthias>From our conversation so far, it feels like a lot of services run on Azure or,

00:40:46.680 --> 00:40:55.580
<v Matthias>in more general terms, in the cloud. But how does that relate to whatever you do on the ground stations?

00:40:56.260 --> 00:41:00.980
<v Vegard>The API layers we've developed over the years have been primarily,

00:41:01.180 --> 00:41:05.600
<v Vegard>as you say, in a cloud setting, but due to the widespread nature of our antennas

00:41:05.600 --> 00:41:09.100
<v Vegard>and where they are, we're also resource constrained, as we touched on earlier,

00:41:09.300 --> 00:41:11.199
<v Vegard>on the resources we have on each antenna.

00:41:11.199 --> 00:41:16.300
<v Vegard>And our challenges are often related to having code running there that can run

00:41:16.300 --> 00:41:18.780
<v Vegard>forever and not have any downtime, really.

00:41:19.120 --> 00:41:25.440
<v Vegard>Many years ago, we deployed at least one service on every antenna stack,

00:41:25.820 --> 00:41:26.680
<v Vegard>which is written in Rust.

00:41:26.880 --> 00:41:32.340
<v Vegard>And it's just responsible for ping-ponging back whatever is in the cloud.

00:41:32.620 --> 00:41:36.120
<v Vegard>What should I do on this antenna? So it's what we call our scheduler.

00:41:36.280 --> 00:41:40.900
<v Vegard>We schedule anything and synchronize what we have there. And that has been running

00:41:40.900 --> 00:41:46.120
<v Vegard>flawlessly on 120 antennas or something for three years now.

00:41:46.320 --> 00:41:49.800
<v Vegard>I think I've had two bugs on it, and they were purely logic bugs.

00:41:50.140 --> 00:41:52.860
<v Vegard>The problem with bugs in that is that when there's a bug there,

00:41:53.000 --> 00:41:57.760
<v Vegard>it affects everything. Because, yeah, nothing is happening on the antenna if the scheduler is down.

00:41:58.340 --> 00:42:02.900
<v Vegard>Other than that, we also have data distribution and pushing metrics from

00:42:02.900 --> 00:42:07.120
<v Vegard>our baseband equipment at the antennas. And everyone wants to consume those:

00:42:07.239 --> 00:42:09.340
<v Vegard>our customers, system engineers.

00:42:10.610 --> 00:42:15.090
<v Vegard>A big part of our infrastructure is also just having the correct tooling on each

00:42:15.090 --> 00:42:20.090
<v Vegard>antenna to be able to send out this information, and we're using Rust for that as well.

00:42:20.090 --> 00:42:26.110
<v Matthias>It's incredible how far you are in your Rust journey already. I had no idea, really. About

00:42:26.110 --> 00:42:30.489
<v Matthias>the scheduler: what inputs does it take and what outputs does it generate?

00:42:30.489 --> 00:42:34.430
<v Vegard>So it's running an in-house

00:42:34.430 --> 00:42:38.489
<v Vegard>protocol to synchronize whatever schedule is available

00:42:38.489 --> 00:42:41.670
<v Vegard>in the cloud, and the cloud database is

00:42:41.670 --> 00:42:45.290
<v Vegard>the source of truth. The schedulers

00:42:45.290 --> 00:42:48.270
<v Vegard>on each antenna site just figure out what to

00:42:48.270 --> 00:42:52.469
<v Vegard>synchronize from the cloud, so they can operate autonomously in

00:42:52.469 --> 00:42:55.830
<v Vegard>case of network failure. Without network connectivity,

00:42:55.830 --> 00:42:59.390
<v Vegard>we can still operate and take your contacts. And yeah,

00:42:59.390 --> 00:43:02.370
<v Vegard>so it just synchronizes whatever it does

00:43:02.370 --> 00:43:06.050
<v Vegard>there over a custom HTTP protocol, and

00:43:06.050 --> 00:43:14.130
<v Vegard>as the contact is about to begin, it kicks off an event to another service, which

00:43:14.130 --> 00:43:17.790
<v Vegard>we call the controller. It's the controller of the entire contact; it controls

00:43:17.790 --> 00:43:22.370
<v Vegard>all the baseband equipment and whatever firewalls and whatnot need to be opened

00:43:22.370 --> 00:43:25.030
<v Vegard>and controlled. It's a just-in-time scheduler.

00:43:25.930 --> 00:43:30.370
<v Matthias>And the reason why it doesn't pull everything is resource constraints again?

00:43:30.370 --> 00:43:35.210
<v Vegard>Yeah, and it also doesn't need to have the full state of the entire database,

00:43:35.210 --> 00:43:40.410
<v Vegard>because from the cloud it only needs to know what it has to do. It would be very

00:43:40.410 --> 00:43:45.690
<v Vegard>inefficient to synchronize the full remote state from the cloud to every antenna;

00:43:45.690 --> 00:43:47.810
<v Vegard>that would not be feasible.

00:43:47.239 --> 00:43:47.820
<v Matthias>But

00:43:47.810 --> 00:43:56.350
<v Matthias>the calculation for knowing what it needs, is that CPU-bound, or is the focus

00:43:56.350 --> 00:43:58.250
<v Matthias>again on reliability here?

00:43:58.250 --> 00:44:02.930
<v Vegard>Solely on reliability. So for scheduling or synchronizing,

00:44:03.290 --> 00:44:08.449
<v Vegard>it's configurable per scheduler, but usually it's set to sync one to three

00:44:08.449 --> 00:44:12.090
<v Vegard>days ahead, so it can keep running for a while if we're

00:44:12.760 --> 00:44:18.120
<v Vegard>losing network connectivity, and we can still salvage a lot of data even if we

00:44:18.120 --> 00:44:20.260
<v Vegard>don't have connectivity to the cloud.
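
NOTE
A sketch of the look-ahead idea: instead of mirroring the full cloud state, an antenna only caches its own contacts that have not yet ended and that start within a configurable window (one to three days, per the discussion). Contact, contacts_to_sync and the plain epoch-second timestamps are hypothetical; the in-house synchronization protocol itself is not described in the episode.
```rust
use std::time::Duration;
/// Hypothetical scheduled contact; times are seconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
struct Contact {
    id: u64,
    antenna: &'static str,
    start: u64,
    end: u64,
}
/// Selects the contacts this antenna should cache locally: its own contacts
/// that have not ended yet and that start within the look-ahead window.
fn contacts_to_sync(schedule: &[Contact], antenna: &str, now: u64, horizon: Duration) -> Vec<Contact> {
    let until = now.saturating_add(horizon.as_secs());
    let mut selected: Vec<Contact> = schedule
        .iter()
        .filter(|c| c.antenna == antenna && c.end > now && c.start <= until)
        .cloned()
        .collect();
    selected.sort_by_key(|c| c.start); // execution order for the local scheduler
    selected
}
fn main() {
    let schedule = vec![
        Contact { id: 1, antenna: "antenna-a", start: 100, end: 700 },
        Contact { id: 2, antenna: "antenna-b", start: 200, end: 800 },
    ];
    let horizon = Duration::from_secs(3 * 86_400);
    for contact in contacts_to_sync(&schedule, "antenna-a", 60, horizon) {
        println!("{:?}", contact);
    }
}
```
Only the local cache is bounded this way; the cloud database remains the source of truth.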

00:44:20.260 --> 00:44:27.600
<v Matthias>Very impressive. But that means the entire chain, from the satellite all the way

00:44:27.600 --> 00:44:31.219
<v Matthias>to the customer, is at least

00:44:31.219 --> 00:44:36.360
<v Matthias>partly written in Rust nowadays. What is your message to the Rust community?

00:44:38.320 --> 00:44:44.020
<v Vegard>I think my primary message to the Rust community is just polish up async.

00:44:44.420 --> 00:44:48.500
<v Vegard>Get it to be the best experience it can ever be.

00:44:48.719 --> 00:44:55.500
<v Vegard>There are some pitfalls now, even though Rust 1.85, the release that shipped the 2024 edition, stabilized async closures.

00:44:55.820 --> 00:45:00.219
<v Vegard>Very happy about that. But there are still some questions around observability

00:45:00.219 --> 00:45:05.120
<v Vegard>of what is happening within an async context and how do you navigate that?

00:45:05.380 --> 00:45:10.760
<v Vegard>And just, yeah, getting to the bottom of issues related to,

00:45:10.880 --> 00:45:16.640
<v Vegard>as I said, the blocking issues we had, and cancellation safety, drop

00:45:16.640 --> 00:45:22.040
<v Vegard>safety, async drop, and all these paper cuts that are not completely answered yet.

00:45:22.820 --> 00:45:29.440
<v Vegard>That would be my message: really polish that up. That would make selling Rust to others much easier.

00:45:30.020 --> 00:45:36.820
<v Matthias>Yes, I could get behind this. Vegard, thanks so much for taking the time and for being a guest today.

00:45:37.600 --> 00:45:39.540
<v Vegard>It's my pleasure. Thank you for having me.

00:45:40.219 --> 00:45:43.900
<v Matthias>Rust in Production is a podcast by corrode. It is hosted by me,

00:45:44.180 --> 00:45:46.960
<v Matthias>Matthias Endler, and produced by Simon Brüggen.

00:45:47.140 --> 00:45:51.420
<v Matthias>For show notes, transcripts, and to learn more about how we can help your company

00:45:51.420 --> 00:45:54.300
<v Matthias>make the most of Rust, visit corrode.dev.

00:45:54.500 --> 00:45:56.880
<v Matthias>Thanks for listening to Rust in Production.