WEBVTT

00:00:01.450 --> 00:00:06.090
<v Matthias>It's Rust in Production, a podcast about companies who use Rust to shape the

00:00:06.090 --> 00:00:07.030
<v Matthias>future of infrastructure.

00:00:07.310 --> 00:00:11.910
<v Matthias>I'm Matthias Endler from corrode, and today's guest is Tom Hacohen from Svix.

00:00:12.110 --> 00:00:15.410
<v Matthias>We talk about reliable webhooks at scale with Rust.

00:00:19.090 --> 00:00:23.270
<v Matthias>Tom, thanks a lot for being a part of this podcast.

00:00:23.730 --> 00:00:27.950
<v Matthias>Can you introduce yourself and Svix, the company you founded?

00:00:27.950 --> 00:00:30.790
<v Tom>Yeah, for sure. So I'm Tom,

00:00:30.790 --> 00:00:33.510
<v Tom>the founder and CEO of Svix. What we

00:00:33.510 --> 00:00:36.250
<v Tom>do is webhook sending as a service, so we

00:00:36.250 --> 00:00:39.130
<v Tom>help companies send webhooks. People use

00:00:39.130 --> 00:00:42.030
<v Tom>our simple APIs and SDKs, and with just a

00:00:42.030 --> 00:00:45.050
<v Tom>few lines of code they get a state-of-the-art webhook sending system. And

00:00:45.050 --> 00:00:47.950
<v Tom>actually, now that I think about it, we also finally launched publicly

00:00:47.950 --> 00:00:52.410
<v Tom>the receiving side as well, so we also help you with receiving webhooks. So

00:00:52.410 --> 00:00:57.990
<v Tom>if you use a vendor that has not-so-great webhooks, then preferably

00:00:57.990 --> 00:01:01.170
<v Tom>you just tell them to use Svix, and they can improve

00:01:01.170 --> 00:01:05.390
<v Tom>everything for everyone. But if not, you can use Svix for receiving

00:01:05.390 --> 00:01:07.850
<v Tom>and get a great experience in front of that.

00:01:07.850 --> 00:01:12.810
<v Matthias>What's so special about webhooks? Isn't it just a fancy link? I mean, I know how

00:01:12.810 --> 00:01:19.130
<v Matthias>to serve an API, and I can probably take a call from someone. But why is it so

00:01:19.130 --> 00:01:20.810
<v Matthias>special? Why should I use a service for that?

00:01:20.810 --> 00:01:26.230
<v Tom>Yeah, so first of all, yes, it's really just a simple HTTP POST request.

00:01:26.490 --> 00:01:30.610
<v Tom>You know, that's all you need to do. But I think like everything in product

00:01:30.610 --> 00:01:33.330
<v Tom>and engineering, the devil is in the details.

00:01:33.490 --> 00:01:36.290
<v Tom>And when you start moving to production, all of a sudden,

00:01:36.530 --> 00:01:37.670
<v Tom>you know, it needs to scale.

00:01:37.930 --> 00:01:41.670
<v Tom>And when you have multiple customers, you need to make sure that one customer

00:01:41.670 --> 00:01:43.790
<v Tom>having issues is not affecting the other customers.

00:01:43.970 --> 00:01:46.950
<v Tom>So you have noisy-neighbor problems, and you have security, because

00:01:46.950 --> 00:01:48.830
<v Tom>with webhooks... You mentioned APIs.

00:01:49.730 --> 00:01:52.750
<v Tom>Those are well understood. You have, you know, a request

00:01:52.750 --> 00:01:55.450
<v Tom>that comes in, you do some auth token checks, and

00:01:55.450 --> 00:01:58.290
<v Tom>then you respond. But with webhooks, you actually make

00:01:58.290 --> 00:02:01.930
<v Tom>calls from your own server to an address controlled by

00:02:01.930 --> 00:02:06.570
<v Tom>your customer, or maybe an attacker, and they can point it to an internal system

00:02:06.570 --> 00:02:10.050
<v Tom>and trick you. This is called server-side request forgery. So essentially,

00:02:10.050 --> 00:02:13.690
<v Tom>without going too much into the details, there are just many more things

00:02:13.690 --> 00:02:17.250
<v Tom>that you need to worry about than people are used to.

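NOTE
Sketch (Rust, not Svix's code): one piece of the server-side request forgery defense Tom describes is refusing to deliver to internal addresses. The helper below, with the hypothetical name `is_forbidden_destination`, classifies an already-resolved IP using only `std::net`. The range list is illustrative rather than exhaustive, and a real sender would also have to connect to the exact IP it checked, so that a DNS rebinding answer cannot swap in a different address afterwards.
```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};
// Hypothetical helper: returns true if a webhook must NOT be sent to this IPv4 address.
fn is_forbidden_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback()            // 127.0.0.0/8
        || ip.is_private()      // 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
        || ip.is_link_local()   // 169.254.0.0/16, includes cloud metadata at 169.254.169.254
        || ip.is_unspecified()  // 0.0.0.0
        || ip.is_broadcast()    // 255.255.255.255
}
fn is_forbidden_v6(ip: Ipv6Addr) -> bool {
    // IPv4-mapped addresses (::ffff:a.b.c.d) get the IPv4 rules.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_forbidden_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()                   // ::1
        || ip.is_unspecified()         // ::
        || (first & 0xfe00) == 0xfc00  // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80  // link-local, fe80::/10
}
fn is_forbidden_destination(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_forbidden_v4(v4),
        IpAddr::V6(v6) => is_forbidden_v6(v6),
    }
}
```
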
00:02:17.250 --> 00:02:18.510
<v Tom>But additionally, you know,

00:02:19.170 --> 00:02:23.250
<v Tom>when it comes to product infrastructure, which is where I place us,

00:02:24.760 --> 00:02:28.300
<v Tom>the best you can do... Let's assume you spend a lot of time on it,

00:02:28.560 --> 00:02:30.020
<v Tom>and we spend a lot of time on it, right?

00:02:30.120 --> 00:02:34.660
<v Tom>We're a big team just working on webhooks. If you spend even a fraction of the

00:02:34.660 --> 00:02:37.840
<v Tom>time we spend on it, and you do a great job, you're still using a lot of

00:02:37.840 --> 00:02:42.660
<v Tom>your team's time instead of advancing the product roadmap. So even if everything just works,

00:02:43.100 --> 00:02:48.260
<v Tom>your customers aren't happy with you, because you haven't built the features that they wanted.

00:02:48.700 --> 00:02:51.640
<v Tom>And if you do anything

00:02:51.640 --> 00:02:54.820
<v Tom>less than a stellar job, that means

00:02:54.820 --> 00:02:57.940
<v Tom>downtime, and you're going to have issues. So really,

00:02:57.940 --> 00:03:00.780
<v Tom>I think when it comes to infrastructure, the best you

00:03:00.780 --> 00:03:04.780
<v Tom>can do, even if you do everything perfectly, is just have a neutral effect on your

00:03:04.780 --> 00:03:07.580
<v Tom>customers. Your customers won't notice that you spent all of this time on it.

00:03:07.580 --> 00:03:11.720
<v Tom>And this is part of what we help you do: we give your

00:03:11.720 --> 00:03:16.080
<v Tom>customers a state-of-the-art experience. They don't think about it, they don't worry

00:03:16.080 --> 00:03:20.100
<v Tom>about it, it just works. And you don't have to maintain it or worry about any of that.

00:03:20.100 --> 00:03:28.160
<v Matthias>What do I get when I use Svix, other than an endpoint? Does it also provide some

00:03:28.160 --> 00:03:33.380
<v Matthias>logs, monitoring, metrics, and uptime statistics? What do I get from that?

00:03:33.380 --> 00:03:36.680
<v Tom>Yeah, so that's a great question, because, you

00:03:36.680 --> 00:03:39.440
<v Tom>know, when it comes to webhooks, again, going back

00:03:39.440 --> 00:03:42.460
<v Tom>to the API request: if a customer of yours tries to

00:03:42.460 --> 00:03:45.340
<v Tom>make an API request and that API request

00:03:45.340 --> 00:03:48.540
<v Tom>fails, right, they would know it failed.

00:03:48.540 --> 00:03:51.840
<v Tom>They got an error code back. With webhooks, they

00:03:51.840 --> 00:03:54.720
<v Tom>don't know that you were just sending a webhook just now

00:03:54.720 --> 00:03:57.660
<v Tom>and it failed for whatever reason. They have zero visibility, because it's

00:03:57.660 --> 00:04:00.940
<v Tom>generated by the sender. So logs

00:04:00.940 --> 00:04:05.500
<v Tom>and observability are essential there, and yes, we offer that out of the box. And

00:04:05.500 --> 00:04:09.120
<v Tom>we actually have a pre-built UI that our customers can embed, fully white-labeled,

00:04:09.120 --> 00:04:14.580
<v Tom>in their dashboards for their customers, and

00:04:14.580 --> 00:04:18.780
<v Tom>then their customers can just go and use that to manage the webhooks, see the observability.

00:04:19.500 --> 00:04:23.220
<v Tom>We have automatic retries, but they can also manually retry once they fix an

00:04:23.220 --> 00:04:28.340
<v Tom>issue, and really have full visibility into and full control over the whole webhook sending system.

00:04:29.460 --> 00:04:32.200
<v Matthias>Before you built Svix, there must have been a moment when you thought,

00:04:32.380 --> 00:04:36.740
<v Matthias>wow, this is really challenging to pull off and there is no existing service

00:04:36.740 --> 00:04:39.800
<v Matthias>that solves that problem. Take us back to that time.

00:04:41.080 --> 00:04:43.840
<v Tom>Yeah, so that's exactly how it started,

00:04:43.840 --> 00:04:46.740
<v Tom>and it was really just by luck. At my previous company,

00:04:46.740 --> 00:04:49.460
<v Tom>people were asking us for webhooks, so we

00:04:49.460 --> 00:04:52.320
<v Tom>looked into it and started designing the system. You know,

00:04:52.320 --> 00:04:55.180
<v Tom>I did everything to plan for

00:04:55.180 --> 00:04:57.960
<v Tom>the build, and then realized it's actually so much work, and we

00:04:57.960 --> 00:05:00.660
<v Tom>just don't have the capacity to build it at the moment. So we

00:05:00.660 --> 00:05:03.380
<v Tom>said no. And then people kept on asking us, and we went

00:05:03.380 --> 00:05:06.060
<v Tom>back to the drawing board, like, okay, we can find the time to

00:05:06.060 --> 00:05:08.660
<v Tom>do the initial build. But then we realized there's going

00:05:08.660 --> 00:05:11.460
<v Tom>to be a lot of ongoing maintenance, and

00:05:11.460 --> 00:05:14.080
<v Tom>there are going to be random requests from customers that we're going

00:05:14.080 --> 00:05:17.420
<v Tom>to have to build, otherwise they're going to be unhappy. So we said no again. And

00:05:17.420 --> 00:05:20.280
<v Tom>then people still kept on asking us, and we were like, okay,

00:05:20.280 --> 00:05:23.320
<v Tom>we're going to find the time to build it, we're going to allocate the resources to maintain it,

00:05:23.320 --> 00:05:26.080
<v Tom>but we have this beautiful API, a great product. We're not

00:05:26.080 --> 00:05:29.140
<v Tom>going to have a terrible webhook experience, right? So we multiplied the

00:05:29.140 --> 00:05:33.060
<v Tom>first two estimates by 3x or 4x, and again said no. And

00:05:33.060 --> 00:05:35.820
<v Tom>I wasn't smart enough to realize at that point that there was a business

00:05:35.820 --> 00:05:38.560
<v Tom>there. But it was a few months later that a

00:05:38.560 --> 00:05:41.200
<v Tom>friend of mine asked me a question about webhooks in a

00:05:41.200 --> 00:05:43.920
<v Tom>Slack community, and I started explaining to her, you should do X and you

00:05:43.920 --> 00:05:46.940
<v Tom>should do Y. And I realized she couldn't care less about anything I

00:05:46.940 --> 00:05:49.600
<v Tom>was saying. She only wanted to send webhooks without having

00:05:49.600 --> 00:05:52.280
<v Tom>to think about it. And that's when

00:05:52.280 --> 00:05:55.020
<v Tom>the apple dropped on my head, and I realized that,

00:05:55.020 --> 00:05:58.260
<v Tom>you know what, I had the same problem. And I

00:05:58.260 --> 00:06:02.820
<v Tom>asked her, hey, if I build a service that does this, will you pay me? And she's

00:06:02.820 --> 00:06:06.600
<v Tom>like, hell yeah. And that's how it all started. And by the way, spoiler

00:06:06.600 --> 00:06:11.480
<v Tom>alert: she never paid, but other people from that Slack did become our first

00:06:11.480 --> 00:06:15.740
<v Tom>customers. So that was a win, and I'm eternally grateful to her for asking that question.

00:06:16.380 --> 00:06:21.580
<v Matthias>I really like that story, because it has a few ingredients which I look for in

00:06:21.580 --> 00:06:29.600
<v Matthias>companies. They solve a very narrowly focused problem, and no one's going to eat your lunch, because

00:06:30.240 --> 00:06:33.280
<v Matthias>other people don't want to care about that problem. It's

00:06:33.280 --> 00:06:38.180
<v Matthias>essentially, quote-unquote, boring for them. Someone needs to solve it, but then

00:06:38.180 --> 00:06:43.160
<v Matthias>if you look behind it, there's a ton of complexity, and you need to do it right

00:06:43.160 --> 00:06:48.340
<v Matthias>in order to provide a good service. But can you talk about those customers a

00:06:48.340 --> 00:06:51.880
<v Matthias>little bit? Who were the people that asked for webhooks all along?

00:06:51.880 --> 00:06:54.980
<v Tom>Yeah, so those initial few

00:06:54.980 --> 00:06:57.700
<v Tom>were small, you know, friends of

00:06:57.700 --> 00:07:01.160
<v Tom>mine from a startup Slack, small startups. One

00:07:01.160 --> 00:07:05.700
<v Tom>was a Jira competitor, I guess you would say, and they needed

00:07:05.700 --> 00:07:09.220
<v Tom>webhooks because Jira has webhooks, and customers rely on them to build

00:07:09.220 --> 00:07:12.660
<v Tom>integrations. You know, when you close a ticket, they want to run some workflow.

00:07:12.660 --> 00:07:16.660
<v Tom>But since then, we've been working with everyone from

00:07:16.660 --> 00:07:19.900
<v Tom>very small startups and indie developers all the way up to the Fortune 50,

00:07:21.160 --> 00:07:25.120
<v Tom>and the use cases are so varied that I actually,

00:07:25.140 --> 00:07:29.640
<v Tom>I kind of look at us and draw parallels to Twilio for SMS or

00:07:29.640 --> 00:07:32.360
<v Tom>SendGrid for email, in the sense that,

00:07:33.070 --> 00:07:36.290
<v Tom>you know, people just send emails. SendGrid customers just send

00:07:36.290 --> 00:07:39.850
<v Tom>emails, and SendGrid doesn't care what's in the email. Similarly, we just

00:07:39.850 --> 00:07:46.550
<v Tom>see so many different use cases; people use us for really whatever kind of thing.

00:07:46.550 --> 00:07:52.830
<v Matthias>Makes me wonder if you were not really scared of being a bottleneck or a

00:07:52.830 --> 00:07:55.390
<v Matthias>single point of failure for all of these customers.

00:07:56.170 --> 00:07:59.110
<v Tom>Yeah, man. This was, at least in the beginning,

00:07:59.110 --> 00:08:01.830
<v Tom>a dreadful thought, and

00:08:01.830 --> 00:08:04.550
<v Tom>one that literally kept us up at night, with PagerDuty and

00:08:04.550 --> 00:08:07.210
<v Tom>everything. But, you know,

00:08:07.210 --> 00:08:10.150
<v Tom>a lot of startups, it's kind of move fast and break things,

00:08:10.150 --> 00:08:13.570
<v Tom>right? They focus on speed as

00:08:13.570 --> 00:08:17.490
<v Tom>the main feature. And like other startups, speed is extremely important

00:08:17.490 --> 00:08:23.330
<v Tom>for us, but for us, stability has been the feature that we offer. We

00:08:23.330 --> 00:08:28.450
<v Tom>have a stellar track record, and it's because we spend a lot of time, effort,

00:08:28.450 --> 00:08:31.810
<v Tom>and money on redundant infrastructure to make sure that,

00:08:32.170 --> 00:08:36.810
<v Tom>you know, since we are that potential point of failure, we don't

00:08:36.810 --> 00:08:40.750
<v Tom>ever go down, knock on wood, obviously, but we spend a lot of time and effort

00:08:40.750 --> 00:08:42.450
<v Tom>there. And to the point that,

00:08:43.250 --> 00:08:46.470
<v Tom>we are more stable than homegrown webhook systems.

00:08:46.770 --> 00:08:49.630
<v Tom>Like we are, you know, we work with some of the best companies out there,

00:08:49.750 --> 00:08:51.550
<v Tom>whether it's Brex, Benchling, or Lob.

00:08:52.610 --> 00:08:55.930
<v Tom>And again, as I said, we have Fortune 50 customers, so we can't mention their names, unfortunately.

00:08:56.570 --> 00:09:01.290
<v Tom>And we are, you know, they trust us and use us because we are stable. We are reliable.

00:09:01.730 --> 00:09:04.610
<v Tom>And again, and we focus all our time on it. So it makes sense.

00:09:05.210 --> 00:09:10.090
<v Matthias>Okay, you said speed matters, especially for startups. And I fully agree with that.

00:09:10.630 --> 00:09:17.230
<v Matthias>But then immediately a lot of people might think: is Rust

00:09:17.230 --> 00:09:21.210
<v Matthias>the best language to build such a service, then, if you want to

00:09:21.210 --> 00:09:24.610
<v Matthias>iterate quickly? Can't you build a quicker prototype in another language?

00:09:24.610 --> 00:09:29.370
<v Tom>So first of all, we did. The initial version of Svix was actually written in

00:09:29.370 --> 00:09:32.850
<v Tom>Python, so I guess there are legs to that statement.

00:09:33.550 --> 00:09:37.030
<v Tom>And I think Rust is problematic for

00:09:37.030 --> 00:09:39.770
<v Tom>fast iteration at the beginning of

00:09:39.770 --> 00:09:43.190
<v Tom>the company. So the way I think about it is, if

00:09:43.190 --> 00:09:46.130
<v Tom>you're very proficient in Rust

00:09:46.130 --> 00:09:49.090
<v Tom>and you move very quickly in Rust, just do

00:09:49.090 --> 00:09:51.890
<v Tom>it. Just use Rust, you don't have to worry about it. Just

00:09:51.890 --> 00:09:55.250
<v Tom>use the tool that you know. But at

00:09:55.250 --> 00:09:57.930
<v Tom>the very beginning of the company, when you don't even know what you're building and you're

00:09:57.930 --> 00:10:00.750
<v Tom>trying to figure it out, and really what matters is

00:10:00.750 --> 00:10:04.150
<v Tom>just getting something clunky out there working, I

00:10:04.150 --> 00:10:07.590
<v Tom>think Rust, with its emphasis on correctness, can

00:10:07.590 --> 00:10:10.530
<v Tom>hold you back. But I think the moment you have even

00:10:10.530 --> 00:10:13.790
<v Tom>a hint of traction, like even a few months in, and

00:10:13.790 --> 00:10:17.130
<v Tom>you realize what you're building, generally, you

00:10:17.130 --> 00:10:19.950
<v Tom>know, not precisely, but

00:10:19.950 --> 00:10:22.930
<v Tom>generally what you're going to build, I think then Rust is

00:10:22.930 --> 00:10:26.070
<v Tom>extremely powerful. Because, you

00:10:26.070 --> 00:10:28.790
<v Tom>know, when we were using Python, we used

00:10:28.790 --> 00:10:31.750
<v Tom>to write a lot of tests. We still write a lot of tests, but far

00:10:31.750 --> 00:10:35.190
<v Tom>fewer. And we would also have issues

00:10:35.190 --> 00:10:38.270
<v Tom>in staging, and sometimes, very rarely,

00:10:38.270 --> 00:10:42.730
<v Tom>issues in production as well. With Rust, you spend

00:10:42.730 --> 00:10:48.070
<v Tom>all of that time in the beginning, but the likelihood of errors is significantly

00:10:48.070 --> 00:10:52.290
<v Tom>diminished later on. So it's almost like there's a conservation

00:10:52.290 --> 00:10:55.970
<v Tom>of energy, a conservation of development time. With Python,

00:10:56.150 --> 00:11:00.770
<v Tom>you don't spend a lot of energy in the beginning, but you do spend it once it

00:11:00.770 --> 00:11:03.110
<v Tom>hits production, once you need to debug, once you need to refactor.

00:11:03.570 --> 00:11:07.810
<v Tom>And with Rust, you spend more in the beginning, but actually then later on,

00:11:07.810 --> 00:11:11.490
<v Tom>you don't have to spend as much energy. And I think now we move fast.

00:11:12.540 --> 00:11:14.620
<v Matthias>Wasn't that transition painful?

00:11:14.620 --> 00:11:18.620
<v Tom>Oh, extremely. Yeah, I

00:11:18.620 --> 00:11:21.980
<v Tom>think, as I said, being reliable for

00:11:21.980 --> 00:11:24.820
<v Tom>our customers is the most important thing for us,

00:11:24.820 --> 00:11:28.040
<v Tom>so we couldn't do what some

00:11:28.040 --> 00:11:31.020
<v Tom>companies do, even some of our competitors, which is

00:11:31.020 --> 00:11:34.020
<v Tom>say, oh yeah, we're going to be down for a few hours on Saturday,

00:11:34.020 --> 00:11:36.800
<v Tom>we have a maintenance period. We can't do anything like

00:11:36.800 --> 00:11:40.200
<v Tom>that, so it was all live transitions and

00:11:40.200 --> 00:11:42.960
<v Tom>making sure we don't break anything for

00:11:42.960 --> 00:11:47.580
<v Tom>anyone, a lot of testing. And what's, I

00:11:47.580 --> 00:11:51.560
<v Tom>guess, challenging about what we do, or

00:11:51.560 --> 00:11:55.140
<v Tom>one of the things, is that we send HTTP requests

00:11:55.140 --> 00:11:58.680
<v Tom>to a variety of servers, right? I mean, webhooks. So that

00:11:58.680 --> 00:12:02.020
<v Tom>means that we interact with,

00:12:02.020 --> 00:12:08.480
<v Tom>well, what's the name of that Microsoft one? IIS. Yeah,

00:12:08.480 --> 00:12:13.340
<v Tom>so IIS, Apache, nginx, the whole wide variety of servers, and some

00:12:13.340 --> 00:12:17.820
<v Tom>of them have very specific requirements for how a client should interact with

00:12:17.820 --> 00:12:21.000
<v Tom>them. The moment we switched to another client,

00:12:21.560 --> 00:12:27.500
<v Tom>for example in Rust, we were facing issues. And actually, a small

00:12:27.500 --> 00:12:30.260
<v Tom>tangent: one of the best things and one of the worst things about

00:12:30.260 --> 00:12:34.860
<v Tom>the Rust ecosystem is the love for perfection, and unfortunately the world out

00:12:34.860 --> 00:12:37.760
<v Tom>there is ugly and dirty, and when you interact with that world,

00:12:39.140 --> 00:12:42.700
<v Tom>sometimes you can't be perfect. Yes, you can adhere to the spec perfectly,

00:12:42.700 --> 00:12:47.880
<v Tom>but in our case, if the servers that we send the webhooks to expect something

00:12:47.880 --> 00:12:52.480
<v Tom>that is not required by the spec, like, I'm sorry, that's what I expect,

00:12:52.480 --> 00:12:55.360
<v Tom>we need to follow that. We don't care that they're wrong. We have to,

00:12:56.350 --> 00:13:00.970
<v Tom>We have to deliver that message. And so that, yeah, there were just like a variety of challenges.

00:13:01.150 --> 00:13:04.510
<v Tom>It just kind of triggered a bit of PTSD, I guess, from that time.

00:13:04.730 --> 00:13:10.890
<v Matthias>If we go back to the first Python prototype that you built, it somehow also worked, right?

00:13:11.070 --> 00:13:19.910
<v Matthias>It was running in production, but what was the general sentiment like at this point in time?

00:13:20.150 --> 00:13:26.730
<v Matthias>Did you trust the application? What was the deployment process, the CI/CD, the testing?

00:13:26.730 --> 00:13:30.130
<v Matthias>How many issues did you find in production versus development, and so on?

00:13:30.130 --> 00:13:33.410
<v Tom>Yes, so actually,

00:13:33.410 --> 00:13:36.130
<v Tom>a bit of a tangent again first. When I

00:13:36.130 --> 00:13:38.830
<v Tom>started coding, it was Perl in the beginning, I guess,

00:13:38.830 --> 00:13:42.210
<v Tom>or whatever, but my first professional job was

00:13:42.210 --> 00:13:46.010
<v Tom>as an embedded C engineer. And again,

00:13:46.010 --> 00:13:49.350
<v Tom>I have the scars to show for living as

00:13:49.350 --> 00:13:53.070
<v Tom>a C engineer, where everything can break, everything is scary,

00:13:53.070 --> 00:13:56.370
<v Tom>and there's a very primitive type system as

00:13:56.370 --> 00:13:59.030
<v Tom>well. So one of the things that I was doing,

00:13:59.030 --> 00:14:03.670
<v Tom>I remember, as a very junior engineer, was adding as much typing as

00:14:03.670 --> 00:14:06.790
<v Tom>I could to C. There are a few tricks you can use, like

00:14:06.790 --> 00:14:10.870
<v Tom>opaque structs, or opaque typedefs wrapping opaque structs, whatever,

00:14:10.870 --> 00:14:17.450
<v Tom>a few ways, a few tricks to do it. But that experience just made me obsessed with

00:14:17.970 --> 00:14:22.930
<v Tom>telling the compiler what I really mean and having the compiler complain at

00:14:22.930 --> 00:14:26.910
<v Tom>build time about all the issues so I don't have to think about runtime.

00:14:27.070 --> 00:14:30.710
<v Tom>I know that if it compiles, it works. And by the way, I think that's partially

00:14:30.710 --> 00:14:32.090
<v Tom>why I'm such a big fan of Rust.

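NOTE
Sketch (Rust, hypothetical names): the C trick Tom mentions, wrapping values in opaque types so the compiler can tell them apart, corresponds to the newtype pattern in Rust. `AppId`, `EndpointId` and `describe` are made up for illustration and are not Svix types. Both IDs are plain strings underneath, yet passing them to `describe` in the wrong order is rejected at compile time instead of surfacing as a bug at runtime.
```rust
// Two distinct types that both wrap a String.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AppId(String);
#[derive(Debug, Clone, PartialEq, Eq)]
struct EndpointId(String);
// The signature states which ID goes where; describe(&endpoint, &app) would not compile.
fn describe(app: &AppId, endpoint: &EndpointId) -> String {
    format!("endpoint {} of app {}", endpoint.0, app.0)
}
```
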
00:14:32.750 --> 00:14:37.210
<v Tom>So the reason why I told you all of this is that when we were writing Python

00:14:37.210 --> 00:14:39.830
<v Tom>code, it was already very good with typing.

00:14:40.050 --> 00:14:47.150
<v Tom>We had hackery around SQLAlchemy to make it more type-safe.

00:14:47.450 --> 00:14:54.050
<v Tom>And we even had scaffolding for FastAPI and Pydantic to make that even better.

00:14:54.470 --> 00:14:58.990
<v Tom>We had a few places where we were doing, you know,

00:14:58.990 --> 00:15:04.250
<v Tom>what we could to give Python a very rich type system.

00:15:04.730 --> 00:15:07.830
<v Tom>So I think we were already doing like a fairly good job in terms of like catching

00:15:07.830 --> 00:15:10.730
<v Tom>a lot of issues at compile time.

00:15:10.830 --> 00:15:15.570
<v Tom>But the problem with Python is that those types, all the hacks that I mentioned,

00:15:15.570 --> 00:15:17.350
<v Tom>were not real, right? It's not really a type.

00:15:17.450 --> 00:15:20.190
<v Tom>It's kind of like, you know, an annotation that we added.

00:15:20.350 --> 00:15:22.570
<v Tom>So we still had issues where we got the annotation wrong.

00:15:23.590 --> 00:15:27.850
<v Tom>And to account for that, what we did is we had a lot of unit tests,

00:15:28.070 --> 00:15:29.850
<v Tom>but we also had a lot of end-to-end tests.

00:15:30.070 --> 00:15:33.350
<v Tom>That was really the big thing that we did. Like we had a lot of end-to-end tests.

00:15:33.550 --> 00:15:37.670
<v Tom>And the end-to-end test suite runs on staging, on real production,

00:15:37.810 --> 00:15:41.690
<v Tom>against real load balancers, everything, just to make sure that whatever

00:15:41.690 --> 00:15:45.990
<v Tom>it is that we do safely goes into production.

00:15:46.890 --> 00:15:50.030
<v Tom>So we were pretty good. I don't think we had a lot of production issues.

00:15:51.220 --> 00:15:55.180
<v Tom>I mean, definitely no serious ones, but it could be that we added a new API

00:15:55.180 --> 00:16:00.080
<v Tom>and that API maybe had an error when it was just introduced in some scenarios.

00:16:01.140 --> 00:16:07.060
<v Tom>So it was fairly good in that regard. But the problem was when you want to refactor,

00:16:07.200 --> 00:16:08.120
<v Tom>when you change something.

00:16:08.700 --> 00:16:13.500
<v Tom>That is the scariest experience you can have as a Python developer because you

00:16:13.500 --> 00:16:16.620
<v Tom>don't have that assurance of if it compiles, it just works.

00:16:17.320 --> 00:16:21.840
<v Tom>And man, finding those... One of the most annoying things,

00:16:21.840 --> 00:16:24.800
<v Tom>and this used to be a common occurrence, it doesn't happen for us anymore,

00:16:25.040 --> 00:16:29.220
<v Tom>is that staging would be broken because someone merged something that

00:16:29.220 --> 00:16:30.520
<v Tom>for whatever reason broke something.

00:16:30.820 --> 00:16:34.040
<v Tom>And then you're like, oh, I have another fix. And then someone else

00:16:34.040 --> 00:16:36.460
<v Tom>brings a fix, and someone else brings another change. And all of a sudden

00:16:36.460 --> 00:16:38.600
<v Tom>unwinding those in staging is like a big pain.

00:16:40.280 --> 00:16:46.080
<v Tom>And that just doesn't happen anymore. So I don't think customers fully noticed

00:16:46.080 --> 00:16:50.900
<v Tom>the experience from a stability standpoint, but we definitely did in terms of

00:16:50.900 --> 00:16:54.520
<v Tom>just like how healthy our staging environment was and how big of a pain it was.

00:16:54.720 --> 00:16:59.360
<v Matthias>The one thing that you triggered in me was also this PTSD that I had when refactoring

00:16:59.360 --> 00:17:04.000
<v Matthias>larger Python codebases, where you end up with a...

00:17:04.740 --> 00:17:08.120
<v Matthias>an exception somewhere really deep down the

00:17:08.120 --> 00:17:10.920
<v Matthias>call stack in production, and you have no idea

00:17:10.920 --> 00:17:14.060
<v Matthias>what's happening. You have no idea of the state that the application was

00:17:14.060 --> 00:17:17.380
<v Matthias>in, and you trigger that maybe once in a

00:17:17.380 --> 00:17:20.660
<v Matthias>million requests or so, because it's a very weird combination

00:17:20.660 --> 00:17:28.360
<v Matthias>of conditions. So yeah, I can totally relate to that. Although I was wondering,

00:17:28.360 --> 00:17:34.540
<v Matthias>shouldn't the type system, or the types that you added to the Python code, have

00:17:34.540 --> 00:17:37.520
<v Matthias>saved you from that? Wasn't that the idea?

00:17:37.520 --> 00:17:40.420
<v Tom>I think the problem is that the Python type system is

00:17:40.420 --> 00:17:43.320
<v Tom>just not expressive enough. I mean, maybe

00:17:43.320 --> 00:17:46.080
<v Tom>nowadays it's slightly better, but enums, for

00:17:46.080 --> 00:17:48.860
<v Tom>example: maybe they exist now, I don't think so,

00:17:48.860 --> 00:17:51.660
<v Tom>and it's still not at the same level. But it's enums, and being able to

00:17:51.660 --> 00:17:55.320
<v Tom>catch all of the

00:17:55.320 --> 00:17:58.560
<v Tom>variants. Yeah, thank you. And another thing

00:17:58.560 --> 00:18:01.720
<v Tom>that you said that triggered me is exceptions. Oh my

00:18:01.720 --> 00:18:04.760
<v Tom>god, I think that is the billion-dollar programming mistake. I

00:18:04.760 --> 00:18:07.880
<v Tom>think Rust really got it right in how to

00:18:07.880 --> 00:18:10.640
<v Tom>do error propagation, which is:

00:18:10.640 --> 00:18:13.720
<v Tom>you can't just willingly throw random exceptions all

00:18:13.720 --> 00:18:16.540
<v Tom>the way up the stack. You have to handle them at every

00:18:16.540 --> 00:18:20.140
<v Tom>step and properly define

00:18:20.140 --> 00:18:22.840
<v Tom>the signature, the

00:18:22.840 --> 00:18:26.660
<v Tom>contract that you have with the caller. I think that's extremely important. So

00:18:26.660 --> 00:18:30.740
<v Tom>yes, Python helped a lot, but again,

00:18:30.740 --> 00:18:34.480
<v Tom>as I said, we missed some places. We didn't get the typing exactly correct

00:18:34.480 --> 00:18:38.480
<v Tom>in some areas, and in others Python wasn't expressive enough to really get everything

00:18:38.480 --> 00:18:43.920
<v Tom>that we wanted. It was scary; it wasn't just a walk in the park. With

00:18:43.920 --> 00:18:50.480
<v Tom>Rust, and we can cover that later, we really utilized the type system to

00:18:50.920 --> 00:18:54.640
<v Tom>just make many runtime errors impossible.

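NOTE
Sketch (Rust, hypothetical types and policy): the two features Tom contrasts with Python, exhaustive enums and explicit error propagation, in a few lines. `DeliveryError`, `handle_response` and the retry rule (retry on timeouts, 429 and 5xx) are illustrative assumptions, not Svix's actual design. Errors travel up through `Result` and the `?` operator, so every function signature states what can fail, and the `match` in `should_retry` stops compiling if someone adds a variant without handling it.
```rust
#[derive(Debug, PartialEq)]
enum DeliveryError {
    Timeout,
    BadStatus(u16),
    MalformedStatus(String),
}
// Parsing can fail; the failure is part of the return type, not a hidden exception.
fn parse_status(raw: &str) -> Result<u16, DeliveryError> {
    raw.trim()
        .parse::<u16>()
        .map_err(|_| DeliveryError::MalformedStatus(raw.to_string()))
}
fn check_status(status: u16) -> Result<u16, DeliveryError> {
    if (200..300).contains(&status) {
        Ok(status)
    } else {
        Err(DeliveryError::BadStatus(status))
    }
}
// `?` returns early with the error, handing it to the caller explicitly.
fn handle_response(raw: &str) -> Result<u16, DeliveryError> {
    let status = parse_status(raw)?;
    check_status(status)
}
// Exhaustive match: a new DeliveryError variant is a compile error here until it is handled.
fn should_retry(err: &DeliveryError) -> bool {
    match err {
        DeliveryError::Timeout => true,
        DeliveryError::BadStatus(code) => *code == 429 || *code >= 500,
        DeliveryError::MalformedStatus(_) => false,
    }
}
```
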
00:18:56.940 --> 00:19:01.560
<v Matthias>I don't want to put words in your mouth, but this other thing that I really

00:19:01.560 --> 00:19:06.420
<v Matthias>don't like about Python deployments is that they are large. You have large containers.

00:19:06.420 --> 00:19:07.980
<v Tom>Yeah.

00:19:07.980 --> 00:19:11.560
<v Matthias>In Rust, you have small binaries in comparison. Did you run into that problem?

00:19:11.560 --> 00:19:18.400
<v Tom>Okay, so we were using AWS Lambda for some things back

00:19:18.400 --> 00:19:24.140
<v Tom>in the day, and I'm pretty sure we hit Lambda size limits,

00:19:24.640 --> 00:19:27.520
<v Tom>maybe I'm misremembering. No, I'm pretty sure we hit Lambda size

00:19:27.520 --> 00:19:31.860
<v Tom>limits. It's something like 150 megabytes. Those

00:19:31.860 --> 00:19:35.360
<v Tom>were hard limits, but even if they weren't, I think

00:19:35.360 --> 00:19:39.880
<v Tom>you'd have performance issues, maybe, or cost. I don't remember. It was annoying,

00:19:39.880 --> 00:19:44.760
<v Tom>though. It was really annoying. Yeah, I mean, nothing more to add there. It was really

00:19:44.760 --> 00:19:46.920
<v Tom>annoying. And also, because there's a runtime

00:19:48.070 --> 00:19:53.090
<v Tom>part to it, you would have to wait for AWS Lambda to support Python

00:19:53.090 --> 00:19:57.970
<v Tom>3.10 before you could upgrade, so we were a few versions behind.

00:19:57.970 --> 00:20:03.850
<v Tom>It was a really big pain having the runtime shipped by someone that's not us.

00:20:03.850 --> 00:20:08.730
<v Matthias>Okay, so we agree Python might not be the best choice for what you wanted to build

00:20:08.730 --> 00:20:13.170
<v Matthias>at Svix, a production-grade service for webhooks,

00:20:14.310 --> 00:20:18.590
<v Matthias>but Rust might not have been the only choice that you considered.

00:20:18.590 --> 00:20:23.430
<v Matthias>For example, what about other languages like Elixir or Go? Wouldn't that be an option too?

00:20:23.430 --> 00:20:29.030
<v Tom>Yeah, so I can only speak to how it was when we made the decision. Maybe things...

00:20:29.030 --> 00:20:33.350
<v Tom>I mean, I know some things have changed in Elixir land and maybe a bit in Go.

00:20:33.350 --> 00:20:37.710
<v Tom>So Elixir was even more esoteric than Rust back then, and

00:20:37.750 --> 00:20:41.670
<v Tom>choosing an esoteric language like Rust was already a bit of a gamble.

00:20:41.910 --> 00:20:45.430
<v Tom>So I had to, you know, I had to hedge my bets to an extent.

00:20:45.550 --> 00:20:48.930
<v Tom>Like I couldn't go, and it's not that I wanted to choose Elixir and that's why

00:20:48.930 --> 00:20:52.170
<v Tom>I didn't choose it, but it was just like off the table just because of that.

00:20:52.730 --> 00:20:57.290
<v Tom>And also I had experience with Rust before and I didn't have any experience with Elixir.

00:20:57.450 --> 00:21:00.570
<v Tom>So, you know, the amount of unknowns was much higher, lower.

00:21:00.790 --> 00:21:02.690
<v Tom>I'd built production systems in Rust before that.

00:21:03.390 --> 00:21:06.430
<v Tom>Not web, by the way; web was not ready before Svix. Like,

00:21:06.430 --> 00:21:09.450
<v Tom>we were really at the cusp of Rust being web

00:21:09.450 --> 00:21:12.310
<v Tom>ready. But yeah, I'd built production stuff before

00:21:12.310 --> 00:21:15.850
<v Tom>in Rust. And then Go, it just...

00:21:15.850 --> 00:21:18.850
<v Tom>and again, maybe things have improved, but

00:21:18.850 --> 00:21:24.810
<v Tom>the type system is so bare, and as I said, I'm all in. It kind of felt like

00:21:24.810 --> 00:21:30.050
<v Tom>going almost backwards from Python. And then, really, I mean, yeah, compilation

00:21:30.050 --> 00:21:33.430
<v Tom>speed: amazing. I wish we had it in Rust. We can probably talk

00:21:33.430 --> 00:21:36.330
<v Tom>about it later; I will complain for hours on end

00:21:36.470 --> 00:21:39.010
<v Tom>about Rust compilation time. But.

00:21:40.190 --> 00:21:43.490
<v Tom>The Go type system was just not expressive enough to really capture everything

00:21:43.490 --> 00:21:44.770
<v Tom>that I thought should be captured.

00:21:45.250 --> 00:21:50.530
<v Tom>And even without that, we actually wrote the first version of the rewrite of

00:21:50.530 --> 00:21:52.130
<v Tom>Svix in both Go and Rust.

00:21:52.230 --> 00:21:56.790
<v Tom>And when I say first version, I mean we spent a few days on the Rust one, a few days on the Go one.

00:21:57.030 --> 00:22:00.010
<v Tom>I wrote the Rust one, and someone else from the team who

00:22:00.010 --> 00:22:01.750
<v Tom>knew Go very well wrote the Go one.

00:22:01.930 --> 00:22:07.870
<v Tom>And then as a team, we just looked at both and we just decided like Rust just makes much more sense.

00:22:07.870 --> 00:22:10.690
<v Tom>In terms of, again, all the areas that

00:22:10.690 --> 00:22:13.690
<v Tom>we cared about, like type safety, but also

00:22:13.690 --> 00:22:16.350
<v Tom>just in terms of what it's like to use the

00:22:16.350 --> 00:22:19.530
<v Tom>language. And there are a few things that are done in Go that I know Go

00:22:19.530 --> 00:22:24.010
<v Tom>folks love, but I just can't understand. Like, the implicit imports

00:22:24.010 --> 00:22:29.630
<v Tom>drive me mad. Well, not implicit, just non-descriptive imports. Like,

00:22:29.630 --> 00:22:35.210
<v Tom>the capitalization for visibility is also a bit annoying; it reminds me of Hungarian notation,

00:22:36.750 --> 00:22:41.930
<v Tom>but I think also the way to do JSON parsing, if I remember correctly,

00:22:41.930 --> 00:22:45.510
<v Tom>was... I think it's done in comments or something. I don't remember. I remember

00:22:45.510 --> 00:22:51.750
<v Tom>it was just... or not comments, a Go-specific comment thing, annotation or something, tags, labels, or whatever.

00:22:51.750 --> 00:22:53.450
<v Matthias>I guess tags, yeah.

00:22:53.450 --> 00:22:57.130
<v Tom>Yeah, again, I don't remember, it's been a while. But essentially, we kind

00:22:57.130 --> 00:23:00.930
<v Tom>of looked at the resulting code and said, okay, there is no comparison

00:23:00.930 --> 00:23:05.390
<v Tom>here. We can build something on top of Rust that I don't think we can build

00:23:05.390 --> 00:23:06.430
<v Tom>the same thing on top of Go.
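
NOTE An illustrative Rust sketch of the kind of expressiveness Tom finds missing in Go. The `AttemptOutcome` type is hypothetical and does not come from Svix's codebase. Each enum variant carries its own data, and the compiler checks that `match` covers every variant. Go has no sum types, so the usual equivalent is an interface, or a status code plus optional fields, and neither gets an exhaustiveness check.
```rust
// Hypothetical outcome of one webhook delivery attempt.
#[derive(Debug)]
enum AttemptOutcome {
    Delivered { status: u16 },
    Failed { status: u16, retry_in_secs: u64 },
    Unreachable { error: String },
}
fn describe(outcome: &AttemptOutcome) -> String {
    // Adding a variant without handling it here is a compile error.
    match outcome {
        AttemptOutcome::Delivered { status } => format!("delivered ({status})"),
        AttemptOutcome::Failed { status, retry_in_secs } => {
            format!("failed ({status}), retrying in {retry_in_secs}s")
        }
        AttemptOutcome::Unreachable { error } => format!("unreachable: {error}"),
    }
}
fn main() {
    let attempts = [
        AttemptOutcome::Delivered { status: 200 },
        AttemptOutcome::Failed { status: 503, retry_in_secs: 30 },
        AttemptOutcome::Unreachable { error: "connection refused".to_string() },
    ];
    for a in &attempts {
        println!("{}", describe(a));
    }
}
```
A retry delay can only exist on a failed attempt, so code cannot accidentally read one from a successful delivery.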

00:23:06.690 --> 00:23:11.630
<v Matthias>Did that person that wrote the Go prototype agree, or did they challenge the decision?

00:23:12.030 --> 00:23:16.070
<v Tom>You know, so I will admit that I'm the CEO of the company, so there's obviously

00:23:16.070 --> 00:23:19.470
<v Tom>like a bit of a power imbalance there, but from what I remember,

00:23:19.610 --> 00:23:22.510
<v Tom>there was buy-in from everyone.

00:23:22.710 --> 00:23:26.710
<v Tom>The only concern was like finding Rust talent, which I shared.

00:23:28.210 --> 00:23:28.690
<v Matthias>And...

00:23:29.700 --> 00:23:35.540
<v Matthias>In this prototype, was there already some sort of concurrency? Did you use, you

00:23:35.540 --> 00:23:38.500
<v Matthias>know, Go functions or so, or

00:23:38.500 --> 00:23:43.440
<v Matthias>goroutines actually, to maybe make some things concurrent, or was it linear?

00:23:43.440 --> 00:23:49.020
<v Tom>No, no, no. So we, I mean, we built a tiny part of the product, but we built it the way

00:23:49.020 --> 00:23:53.100
<v Tom>we thought it would look in the end. So even, you know, including some

00:23:53.100 --> 00:23:56.660
<v Tom>weird scaffolding that, you know, normally you wouldn't want in

00:23:56.660 --> 00:23:59.500
<v Tom>a first version, but it's kind of like, okay, how would it look if we

00:23:59.500 --> 00:24:00.520
<v Tom>actually did this scaffolding?

00:24:00.520 --> 00:24:05.640
<v Matthias>When you built this initial prototype, you probably compared Go concurrency with

00:24:05.640 --> 00:24:08.440
<v Matthias>Rust concurrency. What was the verdict there?

00:24:08.440 --> 00:24:11.440
<v Tom>Yeah, I don't think there was,

00:24:11.440 --> 00:24:14.840
<v Tom>you know, a strong comparison there

00:24:14.840 --> 00:24:17.480
<v Tom>in the sense of, you know, benchmarking or

00:24:17.480 --> 00:24:20.460
<v Tom>stuff like that. We really cared a lot about, you know, the typing

00:24:20.460 --> 00:24:23.320
<v Tom>and developer experience; that's really what swayed us. But we did

00:24:23.320 --> 00:24:26.320
<v Tom>use both, right? I mean, Go concurrency is kind of nice in

00:24:26.320 --> 00:24:29.040
<v Tom>the sense that it just happens. But I

00:24:29.040 --> 00:24:32.100
<v Tom>think, personally, and this is a subjective thing,

00:24:32.100 --> 00:24:34.880
<v Tom>I like the fact that there's an await syntax. I like

00:24:34.880 --> 00:24:38.280
<v Tom>the fact that we color functions by saying, you know what, this

00:24:38.280 --> 00:24:41.600
<v Tom>one is... you know, this one

00:24:41.600 --> 00:24:44.780
<v Tom>can run, you know, in an async context, like

00:24:44.780 --> 00:24:49.480
<v Tom>this one has I/O, this one has whatever. I think that's a nice thing, but this is

00:24:49.480 --> 00:24:54.440
<v Tom>really just a syntax thing. I don't think we had, you know,

00:24:54.440 --> 00:24:58.840
<v Tom>anything deep there. And I will say, though, and this is a bit of

00:24:58.840 --> 00:25:03.900
<v Tom>a tangent, there was a Rust library that was doing magical

00:25:04.700 --> 00:25:07.420
<v Tom>I/O in the background. So it was like, you

00:25:07.420 --> 00:25:10.460
<v Tom>wouldn't... the function wouldn't await, but in

00:25:10.460 --> 00:25:13.940
<v Tom>the background it would trigger something that would make changes on the server,

00:25:13.940 --> 00:25:18.180
<v Tom>and it's kind of just, again, in a background thread or something. Like, again,

00:25:18.180 --> 00:25:22.520
<v Tom>magical, even though you didn't call await on the function.

00:25:22.520 --> 00:25:27.820
<v Tom>And I remember that experience just reminded me how much I love the fact that

00:25:28.490 --> 00:25:33.470
<v Tom>I/O functions, you know, have to be awaited, because, you know,

00:25:34.130 --> 00:25:37.910
<v Tom>you can easily look at the signature and know whether something affects

00:25:37.910 --> 00:25:39.050
<v Tom>the server or doesn't

00:25:39.050 --> 00:25:41.670
<v Tom>affect the server. Is this a local configuration or is this

00:25:42.110 --> 00:25:45.610
<v Tom>a remote configuration? I know this is an esoteric example, but really,

00:25:45.610 --> 00:25:49.190
<v Tom>we spent so much time debugging this because we were so confused by the

00:25:49.190 --> 00:25:54.730
<v Tom>fact that this non-async call was actually, you know, touching the server.
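
NOTE A std-only Rust sketch of the point about signatures. It uses a hypothetical `Client`, not the library Tom mentions. The method that stands in for a server call is an `async fn`, so that fact is visible in its signature, and the call does nothing until the future is polled. Here the future is driven by a tiny hand-written `block_on` executor rather than a real runtime such as Tokio.
```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
// Minimal executor: re-polls the future each time its waker unparks this thread.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
// Hypothetical client: which methods touch the "server" is visible in the signatures.
struct Client {
    timeout_secs: u64,
    remote_calls: u32,
}
impl Client {
    // Plain fn: local state only.
    fn set_timeout(&mut self, secs: u64) {
        self.timeout_secs = secs;
    }
    // async fn: stands in for a network call; nothing runs until it is polled.
    async fn update_remote_config(&mut self) -> Result<(), String> {
        self.remote_calls += 1;
        Ok(())
    }
}
fn main() {
    let mut client = Client { timeout_secs: 30, remote_calls: 0 };
    client.set_timeout(10);
    // Creating the future without awaiting it performs no "I/O".
    let pending = client.update_remote_config();
    drop(pending);
    assert_eq!(client.remote_calls, 0);
    block_on(client.update_remote_config()).expect("remote update");
    println!("timeout={} remote_calls={}", client.timeout_secs, client.remote_calls);
}
```
In real code the `block_on` would come from the async runtime. The part that carries over is that a reader can tell `set_timeout` from `update_remote_config` by signature alone.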

00:25:54.730 --> 00:25:57.450
<v Matthias>I can relate to that because a lot of people

00:25:57.450 --> 00:26:01.190
<v Matthias>say function coloring is an issue, but it

00:26:01.190 --> 00:26:03.850
<v Matthias>feels like what you allude to

00:26:03.850 --> 00:26:06.530
<v Matthias>is more the opposite, where you say

00:26:06.530 --> 00:26:10.790
<v Matthias>you want to know if something is async; you want to be explicit about it. And

00:26:10.790 --> 00:26:16.350
<v Matthias>if you structure your code differently, maybe it's not going to be a huge problem,

00:26:16.350 --> 00:26:20.030
<v Matthias>because you know that this part of the application has side effects and does

00:26:20.030 --> 00:26:24.250
<v Matthias>I/O in an asynchronous way, and the other part does not.

00:26:24.250 --> 00:26:27.370
<v Tom>It's interesting. You used the word, you know, function coloring,

00:26:27.370 --> 00:26:30.090
<v Tom>right? And I think that's a great example. Like, you

00:26:30.090 --> 00:26:34.610
<v Tom>know, we color... we have syntax highlighting in our code. In our code we want

00:26:34.610 --> 00:26:39.370
<v Tom>to highlight things that behave differently than other things, and

00:26:39.370 --> 00:26:44.590
<v Tom>especially things as important as, you know... I mean, potentially, if we don't even

00:26:44.590 --> 00:26:47.690
<v Tom>have async/await, just blocking operations, which, I don't know

00:26:47.690 --> 00:26:49.250
<v Tom>how anyone can advocate for

00:26:49.930 --> 00:26:55.450
<v Tom>not having a way to know whether a function is blocking. That's just beyond

00:26:55.450 --> 00:26:58.330
<v Tom>me. Again, for blocking operations. Yeah.

00:26:58.330 --> 00:27:02.310
<v Matthias>Okay, so other

00:27:02.310 --> 00:27:05.170
<v Matthias>programming languages were ruled out. We talked about

00:27:05.170 --> 00:27:08.170
<v Matthias>Golang a little bit, and Elixir was just

00:27:08.170 --> 00:27:10.930
<v Matthias>too cutting-edge back in the day. So it does

00:27:10.930 --> 00:27:14.010
<v Matthias>make sense to take a closer look at Rust. But

00:27:14.010 --> 00:27:17.490
<v Matthias>the other option that I had in mind, at least, and maybe you discussed it internally,

00:27:17.490 --> 00:27:25.250
<v Matthias>was: maybe we just rewrite parts of the application in Rust, maybe by using language

00:27:25.250 --> 00:27:31.170
<v Matthias>bindings with PyO3, and then you rewrite the critical parts and you

00:27:31.170 --> 00:27:33.810
<v Matthias>keep the rest. Did you consider that option?

00:27:33.810 --> 00:27:40.470
<v Tom>Yeah, I mean, I think for us the two parts that cause latency

00:27:40.470 --> 00:27:44.210
<v Tom>or, you know, memory usage, all of that... the performance either lies in the

00:27:44.210 --> 00:27:48.130
<v Tom>database, the queue, or other background systems that we use,

00:27:48.810 --> 00:27:52.430
<v Tom>or in the JSON serialization aspect and

00:27:52.430 --> 00:27:57.770
<v Tom>all of that. So there isn't really a lot of code for us that we could move out. So

00:27:57.770 --> 00:28:01.950
<v Tom>there's not, like, an AI operation where you can just say, okay, I'm gonna...

00:28:01.950 --> 00:28:06.510
<v Tom>or, you know, a NumPy thing, some heavy scientific operation

00:28:06.510 --> 00:28:10.130
<v Tom>you can just rewrite in C or in Rust, and it'll just be faster, and everything

00:28:10.130 --> 00:28:11.730
<v Tom>else can just stay in Python. For us,

00:28:12.430 --> 00:28:15.330
<v Tom>the wrapping is the code, like

00:28:15.330 --> 00:28:18.490
<v Tom>you know, how quickly we can respond, you know,

00:28:18.490 --> 00:28:21.390
<v Tom>the latency when we respond to HTTP calls, that is something that

00:28:21.390 --> 00:28:25.370
<v Tom>matters to us. Like, how much memory we use when we respond to an

00:28:25.370 --> 00:28:29.790
<v Tom>HTTP call, that's what matters. So the shell was as important as the code itself,

00:28:29.790 --> 00:28:33.950
<v Tom>so we didn't have anywhere obvious to make that

00:28:33.950 --> 00:28:37.410
<v Tom>change. Especially since, by the way, all the serialization costs

00:28:37.410 --> 00:28:38.790
<v Tom>are just as significant, so

00:28:39.420 --> 00:28:42.900
<v Tom>it just didn't make sense. And, you know, actually another thing that comes

00:28:42.900 --> 00:28:45.220
<v Tom>to mind, we talked about, you know, the typing as well.

00:28:45.780 --> 00:28:48.340
<v Tom>That's another place to lose types. Really, there was no,

00:28:48.560 --> 00:28:51.560
<v Tom>we didn't even check that option. Like we just decided to commit.

00:28:52.020 --> 00:28:57.080
<v Matthias>Yeah, because really what you build is a platform. It's a sort of runtime for

00:28:57.080 --> 00:29:00.020
<v Matthias>web things like webhooks, for example.

00:29:00.840 --> 00:29:05.480
<v Matthias>And yeah, I see that you also need to handle a lot of JSON. And then if you

00:29:05.480 --> 00:29:09.920
<v Matthias>transform a lot of JSON objects from Python to Rust and back,

00:29:10.060 --> 00:29:12.160
<v Matthias>that also incurs memory overhead.

00:29:13.300 --> 00:29:18.920
<v Matthias>Okay, that means we finally arrived at the decision to rewrite the application

00:29:18.920 --> 00:29:23.180
<v Matthias>in Rust, but it probably wasn't an easy path.

00:29:23.460 --> 00:29:27.740
<v Matthias>It wasn't an easy migration. Can you talk a little bit about the learnings from

00:29:27.740 --> 00:29:30.080
<v Matthias>migrating from Python to Rust?

00:29:30.500 --> 00:29:33.260
<v Tom>Yeah, I think, you know, making

00:29:33.260 --> 00:29:36.440
<v Tom>sure that the database types are the

00:29:36.440 --> 00:29:39.340
<v Tom>same wasn't that hard, but it still wasn't

00:29:39.340 --> 00:29:41.960
<v Tom>super easy. And then just, you know, making sure that the

00:29:41.960 --> 00:29:45.140
<v Tom>Redis serialization is identical, because this one did it

00:29:45.140 --> 00:29:48.020
<v Tom>slightly differently than the other one, or maybe we, you know,

00:29:48.020 --> 00:29:51.620
<v Tom>messed up something in Python; it was actually named, you

00:29:51.620 --> 00:29:54.360
<v Tom>know, in a slightly weird manner that's inconsistent, whereas in Rust we just

00:29:54.360 --> 00:29:57.100
<v Tom>kind of, you know, camel-cased everything. And then in Python we

00:29:57.100 --> 00:30:00.420
<v Tom>had to, you know, whatever, we had to create weird aliases

00:30:00.420 --> 00:30:03.140
<v Tom>for some of the cache stuff. But I

00:30:03.140 --> 00:30:06.200
<v Tom>think other than that, and other than the difficulties of just a rewrite

00:30:06.200 --> 00:30:10.240
<v Tom>with zero downtime, I think it went fairly smoothly. And we're

00:30:10.240 --> 00:30:13.320
<v Tom>super... okay, first of all, I'm a big fan of Rust. I'll

00:30:13.320 --> 00:30:16.840
<v Tom>stand on the roof and shout it for anyone to hear. But we

00:30:16.840 --> 00:30:19.620
<v Tom>did have some areas that were

00:30:19.620 --> 00:30:23.940
<v Tom>unexpected, where Rust actually wasn't perfect. And there were the areas

00:30:23.940 --> 00:30:26.760
<v Tom>that we expected, like compilation. I mean, compilation actually was worse

00:30:26.760 --> 00:30:30.500
<v Tom>than what we expected. And there were other areas, like finding engineers; that

00:30:30.500 --> 00:30:34.680
<v Tom>was a bit of a challenge back then. Nowadays, you know, there are a lot of

00:30:34.680 --> 00:30:36.980
<v Tom>Rust engineers. But there were actual

00:30:37.760 --> 00:30:42.380
<v Tom>areas where Python, I think... I don't know if "better" is the right word,

00:30:42.380 --> 00:30:46.980
<v Tom>but there were just fewer footguns, I guess. I'm happy to elaborate on those if you want.
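
NOTE Earlier in this answer, Tom describes keeping the Redis serialization identical between the Python and Rust services. In serde this kind of compatibility is usually handled with `#[serde(rename_all = "camelCase")]` on a struct and `#[serde(alias = "...")]` on a field. The std-only sketch below spells out what those two attributes do, using hypothetical field names rather than Svix's schema. Keys are always written in camelCase, and on read a legacy key is accepted as well.
```rust
// snake_case to camelCase, the renaming serde's rename_all = "camelCase" applies to field names.
fn to_camel_case(snake: &str) -> String {
    let mut out = String::with_capacity(snake.len());
    let mut upper_next = false;
    for c in snake.chars() {
        if c == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(c.to_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}
// Reading side: map an incoming key to a struct field, accepting the canonical
// camelCase name plus a hypothetical legacy alias written by the old Python service.
fn field_for_key(key: &str) -> Option<&'static str> {
    match key {
        "msgId" | "msg_id" => Some("msg_id"),
        "eventType" | "event_type" => Some("event_type"),
        "createdAt" | "created" => Some("created_at"),
        _ => None,
    }
}
fn main() {
    for field in ["msg_id", "event_type", "created_at"] {
        let key = to_camel_case(field);
        // Every key we write can be read back into the same field.
        assert_eq!(field_for_key(&key), Some(field));
        println!("{field} is written as {key}");
    }
}
```
As with serde's `alias`, the legacy names are only accepted when reading and are never written, so old cache entries stay readable while new ones converge on one naming scheme.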

00:30:48.320 --> 00:30:54.060
<v Matthias>Yeah, please. In your blog post you mentioned a few things, like heap fragmentation

00:30:54.810 --> 00:30:59.870
<v Matthias>and memory efficiency. Maybe these would be a few things that...

00:30:59.870 --> 00:31:05.210
<v Tom>Yeah, I think if you summarize all of those in

00:31:05.210 --> 00:31:08.430
<v Tom>one sentence: with great power comes great responsibility.

00:31:08.430 --> 00:31:12.050
<v Tom>You have more control now, and that means you have to be more

00:31:12.050 --> 00:31:15.590
<v Tom>careful about things. One example that you mentioned was the heap fragmentation.

00:31:16.070 --> 00:31:19.210
<v Tom>So Rust gives you more control there, and

00:31:19.210 --> 00:31:23.030
<v Tom>I guess, for whatever reason, it just uses the system allocator

00:31:23.030 --> 00:31:26.290
<v Tom>by default, which just didn't work

00:31:26.290 --> 00:31:29.410
<v Tom>very well when it comes to fragmentation in a long-running process.

00:31:29.410 --> 00:31:32.790
<v Tom>So heap fragmentation, for those who don't know, is essentially

00:31:32.790 --> 00:31:35.610
<v Tom>when you allocate big chunks or smaller chunks, and then

00:31:35.610 --> 00:31:38.530
<v Tom>every allocation just adds more to the memory

00:31:38.530 --> 00:31:41.730
<v Tom>instead of reusing those chunks, because they don't fit. You know, let's assume

00:31:41.730 --> 00:31:46.150
<v Tom>you allocated a chunk of 90 bytes, then you freed it, and then you want

00:31:46.150 --> 00:31:50.150
<v Tom>to allocate another chunk of 95 bytes. You can't reuse that one, so you're

00:31:50.150 --> 00:31:53.650
<v Tom>going to get another chunk of 95 bytes. Then maybe you're going to reuse half of

00:31:53.650 --> 00:31:57.910
<v Tom>that smaller chunk, but then you're not going to be able to fit another 70-

00:31:58.410 --> 00:32:01.230
<v Tom>byte chunk; it would again have to go on top. And essentially what happens is

00:32:01.230 --> 00:32:04.470
<v Tom>that your memory just grows and grows and grows, even though

00:32:04.470 --> 00:32:07.210
<v Tom>in practice you're actually not using that much

00:32:07.210 --> 00:32:09.910
<v Tom>memory, you're using the same amount of memory, but just

00:32:09.910 --> 00:32:12.670
<v Tom>because you allocate and deallocate in different sizes all the time, it

00:32:12.670 --> 00:32:15.690
<v Tom>just causes fragmentation. This was

00:32:15.690 --> 00:32:18.910
<v Tom>actually fixed by just switching to jemalloc,

00:32:18.910 --> 00:32:24.290
<v Tom>just changing the allocator to one that, you know, is very good, and it really

00:32:24.290 --> 00:32:28.710
<v Tom>did wonders for us. But the fact that we had to think about memory layouts and

00:32:28.710 --> 00:32:32.530
<v Tom>investigate memory usage, you know, it just gives an example

00:32:32.530 --> 00:32:38.270
<v Tom>of some of the stuff that you now have to worry about when you start using Rust.
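
NOTE To make the 90-byte/95-byte example concrete, here is a deliberately simplified Rust simulation. It is a toy first-fit free list that never merges neighbouring holes. Real allocators such as glibc malloc and jemalloc are far more sophisticated, so this is not a model of either of them. It only shows how a heap's footprint can keep growing while the amount of live data stays small.
```rust
// Toy heap: a bump pointer plus a list of freed holes, reused first-fit and never merged.
struct ToyHeap {
    top: usize,                 // current heap size (bytes obtained from the "OS")
    holes: Vec<(usize, usize)>, // (offset, size) of freed regions
}
impl ToyHeap {
    fn new() -> Self {
        ToyHeap { top: 0, holes: Vec::new() }
    }
    fn alloc(&mut self, size: usize) -> usize {
        // Reuse the first hole that is large enough, keeping any remainder as a smaller hole.
        if let Some(i) = self.holes.iter().position(|&(_, s)| s >= size) {
            let (off, s) = self.holes[i];
            if s == size {
                self.holes.remove(i);
            } else {
                self.holes[i] = (off + size, s - size);
            }
            return off;
        }
        // Nothing fits: grow the heap.
        let off = self.top;
        self.top += size;
        off
    }
    fn dealloc(&mut self, off: usize, size: usize) {
        self.holes.push((off, size));
    }
}
// Allocate and free one block per round, each slightly larger than the last.
// Returns (heap footprint, largest amount live at once).
fn simulate_growing_requests(rounds: usize) -> (usize, usize) {
    let mut heap = ToyHeap::new();
    let mut peak_live: usize = 0;
    for i in 0..rounds {
        let size = 90 + 5 * i;
        let off = heap.alloc(size);
        peak_live = peak_live.max(size);
        heap.dealloc(off, size);
    }
    (heap.top, peak_live)
}
fn main() {
    let (footprint, peak_live) = simulate_growing_requests(10);
    println!("footprint: {footprint} bytes, peak live: {peak_live} bytes");
}
```
After ten rounds the toy heap's footprint is 1125 bytes, yet no more than 135 bytes are ever live at once. Each freed hole is a little too small for the next request.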

00:32:38.270 --> 00:32:44.550
<v Matthias>It's funny, because the Rust compiler itself, or the Rust ecosystem, moved away from

00:32:44.550 --> 00:32:50.610
<v Matthias>jemalloc to the system allocator, but apparently for some use cases that isn't

00:32:50.610 --> 00:32:52.710
<v Matthias>the best choice, the system allocator.

00:32:53.810 --> 00:32:55.970
<v Tom>Yeah, for us it definitely wasn't, yeah.
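
NOTE In Rust, the switch Tom describes amounts to changing the `#[global_allocator]` static. With the `tikv-jemallocator` crate, for example, the usual line is `static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;`, though the episode does not say which jemalloc binding Svix uses. To stay dependency-free, the sketch below puts a hypothetical byte-counting wrapper around `std::alloc::System` in the same slot. A wrapper like this is also a cheap way to watch live heap bytes while investigating memory growth like the kind described here.
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
// Hypothetical wrapper: delegates to the system allocator and tracks live bytes.
struct CountingAlloc;
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
// This static selects the process-wide allocator; swapping in jemalloc means changing it.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
fn main() {
    let before = live_bytes();
    let buf: Vec<u8> = Vec::with_capacity(1 << 20);
    println!("live bytes: {} before, {} with a 1 MiB buffer", before, live_bytes());
    drop(buf);
}
```
The counter measures what the program asked for, not what the allocator holds from the OS. The gap between the two is where fragmentation shows up.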

00:32:56.850 --> 00:32:57.150
<v Matthias>But,

00:32:58.110 --> 00:33:04.770
<v Matthias>where would you notice heap fragmentation? Was it a recurring

00:33:04.770 --> 00:33:09.070
<v Matthias>pattern that you could have solved with, for example, a bump allocator or an arena

00:33:09.070 --> 00:33:11.790
<v Matthias>allocator? Was it an expected

00:33:13.010 --> 00:33:18.150
<v Matthias>load of small objects that you allocated at once and could deallocate at once?

00:33:18.150 --> 00:33:22.770
<v Tom>Yeah, so I think for us... you know, it was also a year or something

00:33:22.770 --> 00:33:27.670
<v Tom>ago, but I think for us it was a lot of parsing JSON, and the JSON would

00:33:27.670 --> 00:33:32.230
<v Tom>come from customers and we would send their JSON for them. So that means, you

00:33:32.230 --> 00:33:35.190
<v Tom>know, because it's a webhook, that JSON

00:33:35.610 --> 00:33:40.610
<v Tom>can be of varying sizes and would normally be very large as well.

00:33:41.230 --> 00:33:44.170
<v Tom>I don't know, we probably could have, you

00:33:44.170 --> 00:33:46.930
<v Tom>know, used a different allocator or something like that,

00:33:46.930 --> 00:33:49.670
<v Tom>but just changing the default allocator would just,

00:33:49.670 --> 00:33:52.310
<v Tom>you know, solve everything. I think that was probably the main

00:33:52.310 --> 00:33:55.710
<v Tom>thing. Another thing is, you know, we use some AWS libraries, and they,

00:33:55.710 --> 00:34:01.270
<v Tom>I believe, allocate. We make HTTP calls, and that, I believe, allocates when

00:34:01.270 --> 00:34:05.050
<v Tom>it generates the body that we send. And so there were just

00:34:05.050 --> 00:34:12.330
<v Tom>a few areas where we were allocating large, arbitrarily sized structures all the time. Yeah.

00:34:12.330 --> 00:34:15.770
<v Matthias>Did you notice a difference in memory usage when you switched the allocator?

00:34:15.770 --> 00:34:20.050
<v Tom>Oh yeah. So before that it was... I guess you can't see me, it's audio only,

00:34:20.050 --> 00:34:24.010
<v Tom>but you'd see a bump, I guess because of some

00:34:24.010 --> 00:34:28.190
<v Tom>high loads that would cause a lot of fragmentation, and then that

00:34:28.190 --> 00:34:31.570
<v Tom>bump would just stay flat, and then there would be another bump, and then

00:34:31.570 --> 00:34:32.750
<v Tom>that would stay flat as well.

00:34:33.550 --> 00:34:36.030
<v Tom>With jemalloc, you immediately see...

00:34:36.730 --> 00:34:40.450
<v Tom>First of all, it becomes more jagged instead of flat. You'd see the graph of

00:34:40.450 --> 00:34:41.850
<v Tom>memory usage becomes way more jagged.

00:34:42.350 --> 00:34:48.570
<v Tom>And the low watermark was much lower.

00:34:49.750 --> 00:34:52.310
<v Tom>The baseline was significantly lower.

00:34:53.670 --> 00:34:57.230
<v Tom>We can post the link to that blog post as well in the show notes.

00:34:57.390 --> 00:35:00.450
<v Tom>You'll see there's a few graphs there. You can really see it. It's very obvious.

00:35:01.810 --> 00:35:08.090
<v Matthias>Was memory usage even a concern for you? Even if you used more memory, would you run out of it?

00:35:08.090 --> 00:35:11.990
<v Tom>We did. That's how we found out. Okay, and

00:35:11.990 --> 00:35:14.770
<v Tom>again, not immediately, but every now and then you

00:35:14.770 --> 00:35:17.330
<v Tom>get an OOM and you're like, what the hell is going on? You know,

00:35:17.330 --> 00:35:21.150
<v Tom>we see our usage, we have, like, 95 buffer,

00:35:21.150 --> 00:35:24.410
<v Tom>like, what's going on? But then we look at some of those and we

00:35:24.410 --> 00:35:27.350
<v Tom>see what looks like a memory leak, but nothing is leaking, right?

00:35:27.350 --> 00:35:30.530
<v Tom>We ran Valgrind, we ran everything; nothing was leaking. And then

00:35:30.530 --> 00:35:35.310
<v Tom>that's kind of how we discovered a few more things about memory

00:35:35.310 --> 00:35:39.810
<v Tom>usage that were surprising. Again, you'd think Rust is much more memory-efficient

00:35:39.810 --> 00:35:43.350
<v Tom>than Python, right? I mean, if you ask anyone on Reddit or Twitter or whatever,

00:35:43.350 --> 00:35:44.650
<v Tom>they would tell you exactly that.

00:35:45.450 --> 00:35:51.230
<v Tom>But one thing was with serde_json: when you were parsing generic JSON values,

00:35:51.230 --> 00:35:56.390
<v Tom>so, you know, essentially a dict, but a JSON dict, so

00:35:56.390 --> 00:36:00.250
<v Tom>not a string-to-string map, but more just generic, again, generic JSON,

00:36:00.530 --> 00:36:05.770
<v Tom>that would, for whatever reason, in serde, cause an explosion in memory. And we didn't...

00:36:06.370 --> 00:36:11.590
<v Tom>We had no idea. And I think, you know, in a language like Python,

00:36:12.390 --> 00:36:16.070
<v Tom>you would think that just the generic case, or, you know, you think,

00:36:16.110 --> 00:36:19.190
<v Tom>I think the generic case just kind of works, because they optimize

00:36:19.190 --> 00:36:21.430
<v Tom>for that, they think about that. Whereas in Rust,

00:36:22.110 --> 00:36:27.410
<v Tom>you know, using an untyped structure is such an odd case.

00:36:27.570 --> 00:36:30.930
<v Tom>And we were just surprised by how bad that was. And maybe this was,

00:36:30.970 --> 00:36:32.410
<v Tom>by the way, the reason for the...

00:36:33.350 --> 00:36:36.390
<v Tom>well, one of the reasons for the, you know,

00:36:36.390 --> 00:36:39.630
<v Tom>fragmentation that we're talking about. I'm not sure. And there

00:36:39.630 --> 00:36:42.310
<v Tom>are solutions. You had to know that you're not supposed to

00:36:42.310 --> 00:36:45.110
<v Tom>use serde_json's Value; you need to use RawValue, which

00:36:45.110 --> 00:36:47.970
<v Tom>just treats it as a string; it doesn't parse it. But again, in

00:36:47.970 --> 00:36:52.070
<v Tom>a world like Python, stuff like that would just happen magically

00:36:52.070 --> 00:36:55.890
<v Tom>in the background, you know, copy-on-write and parse-on-access and

00:36:55.890 --> 00:36:59.550
<v Tom>all of those kinds of things. And I think with Rust everything is much more rigid:

00:36:59.550 --> 00:37:06.870
<v Tom>what you ask for is what you get. It was just a funny gotcha that we ran into.
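
NOTE serde_json's `RawValue` (behind its `raw_value` feature) is the tool Tom refers to. A field typed as `Box<RawValue>` keeps the original JSON text instead of building a tree of `Value`s. The std-only sketch below uses a hypothetical `OutgoingMessage` type to show the same idea without the crate: typed metadata sits next to an untouched payload string, which is spliced into the outgoing body as-is. It assumes the payload was already validated as JSON once, on ingest.
```rust
// Hypothetical message record: typed metadata, raw customer payload.
struct OutgoingMessage {
    event_type: String,
    // One contiguous allocation, never parsed into a tree of values.
    raw_payload: Box<str>,
}
impl OutgoingMessage {
    // Accepts only simple event type names so they can be embedded without JSON escaping.
    // Assumes raw_payload was already validated as JSON elsewhere.
    fn new(event_type: &str, raw_payload: &str) -> Option<Self> {
        let simple = !event_type.is_empty()
            && event_type
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '_' || c == '-');
        simple.then(|| OutgoingMessage {
            event_type: event_type.to_string(),
            raw_payload: raw_payload.into(),
        })
    }
    // Builds the HTTP body by splicing the payload text in verbatim, with no re-serialization.
    fn http_body(&self) -> String {
        format!("{{\"type\":\"{}\",\"data\":{}}}", self.event_type, self.raw_payload)
    }
}
fn main() {
    let msg = OutgoingMessage::new("invoice.paid", r#"{"amount": 10, "items": [1, 2]}"#)
        .expect("simple event type");
    println!("{}", msg.http_body());
}
```
The payload's memory cost is one string the size of the input, no matter how deeply nested the customer's JSON is.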

00:37:06.870 --> 00:37:09.070
<v Matthias>Does Python deserialize lazily?

00:37:10.410 --> 00:37:15.570
<v Tom>Probably not, but there are things that I'm sure it does to be efficient.

00:37:15.570 --> 00:37:18.690
<v Tom>But maybe it does, by the way; I really have no idea.

00:37:18.690 --> 00:37:25.790
<v Matthias>Well, especially if you share those large JSON objects, then I can see where the efficiency comes from, because

00:37:25.790 --> 00:37:26.990
<v Matthias>in Rust

00:37:26.990 --> 00:37:33.070
<v Matthias>you would generally clone unless you put it behind an Arc, but in Python everything

00:37:33.070 --> 00:37:36.930
<v Matthias>is ref-counted, so you get cheap copies for free.

00:37:37.750 --> 00:37:40.830
<v Tom>That's another example of where, you know,

00:37:40.830 --> 00:37:43.610
<v Tom>memory use can just go out the window, because

00:37:43.610 --> 00:37:46.270
<v Tom>as you said, in Python you don't really care about, you know,

00:37:46.270 --> 00:37:50.150
<v Tom>what was it, fearless concurrency? Is that the... yeah.

00:37:50.150 --> 00:37:53.010
<v Tom>You don't really care about that. Like, yeah, of course, race conditions, that's

00:37:53.010 --> 00:37:55.830
<v Tom>just a fact of life in Python. So, well, I guess maybe

00:37:55.830 --> 00:37:59.830
<v Tom>people just don't use threading as much, and so

00:37:59.830 --> 00:38:02.930
<v Tom>they just, as you said, pass ref counts all over.

00:38:02.930 --> 00:38:05.750
<v Tom>In Rust, you would clone so much data

00:38:05.750 --> 00:38:08.970
<v Tom>if you're not careful. You would accidentally clone that JSON structure,

00:38:08.970 --> 00:38:12.850
<v Tom>that heavy one, when you pass it to another function, and you can accidentally

00:38:12.850 --> 00:38:18.750
<v Tom>clone something else. And it's very easy to accidentally use much more memory

00:38:18.750 --> 00:38:22.870
<v Tom>than you would use in Python, just because of the, you know, implicit

00:38:22.870 --> 00:38:27.030
<v Tom>ref counting and passing of references instead of cloning. Yeah.
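
NOTE A small Rust illustration of the cloning trap, using a hypothetical `Payload` type. Handing an `Arc<Payload>` to another consumer copies a pointer and bumps a reference count, which is roughly what Python does implicitly for every object. Calling `.clone()` on the payload itself duplicates the whole body.
```rust
use std::sync::Arc;
// Hypothetical stand-in for a large parsed webhook payload.
#[derive(Clone)]
struct Payload {
    body: String,
}
// Shares the payload: pushes another pointer to the same allocation.
fn enqueue_shared(p: &Arc<Payload>, queue: &mut Vec<Arc<Payload>>) {
    queue.push(Arc::clone(p));
}
// Deep copy: every call allocates a new body of the same size.
fn enqueue_cloned(p: &Payload, queue: &mut Vec<Payload>) {
    queue.push(p.clone());
}
fn main() {
    let payload = Arc::new(Payload { body: "x".repeat(1_000_000) });
    let mut shared = Vec::new();
    let mut cloned = Vec::new();
    for _ in 0..3 {
        enqueue_shared(&payload, &mut shared);
        enqueue_cloned(&payload, &mut cloned);
    }
    // One 1 MB body shared by four handles...
    println!("Arc handles: {}", Arc::strong_count(&payload));
    // ...versus three extra 1 MB copies.
    let copied: usize = cloned.iter().map(|p| p.body.len()).sum();
    println!("bytes duplicated by clone: {copied}");
}
```
Both functions compile without complaint, which is why the expensive one is easy to reach for by accident. `Rc` is the single-threaded equivalent of `Arc`.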

00:38:27.030 --> 00:38:32.970
<v Matthias>I always wanted a lazy JSON library in Rust, and it probably exists, but I couldn't

00:38:32.970 --> 00:38:38.210
<v Matthias>be bothered to look it up. But it would be so nice to have a system that does not

00:38:38.210 --> 00:38:43.070
<v Matthias>create serde_json Values all the time, but is more or less a view into the data,

00:38:43.070 --> 00:38:46.310
<v Matthias>and then it only deserializes what you actually want to look at.

00:38:46.310 --> 00:38:52.070
<v Tom>Yeah, I think serde_json's RawValue kind of semi-lets you do that, by the

00:38:52.070 --> 00:38:56.230
<v Tom>fact that it doesn't deserialize that part, and then you can maybe explicitly deserialize that part.

00:38:56.870 --> 00:39:01.530
<v Tom>I mean, again, you have to build the scaffolding yourself, but it is semi-possible.

00:39:02.090 --> 00:39:03.530
<v Tom>Or at least that's what we're doing now.

00:39:04.620 --> 00:39:09.160
<v Matthias>True. And for you, the serde interface is probably, I'm assuming,

00:39:09.420 --> 00:39:14.640
<v Matthias>one of the most important parts of what you use in the Rust ecosystem.

00:39:15.000 --> 00:39:20.340
<v Matthias>What else do you use very heavily? What do you depend on heavily? Which crates?

00:39:20.880 --> 00:39:26.480
<v Tom>So serde would be one. Axum, we use SeaORM as our ORM.

00:39:26.600 --> 00:39:29.140
<v Tom>I mean, mainly for the query building, less for ORM stuff.

00:39:29.960 --> 00:39:33.340
<v Tom>We use redis rust we use

00:39:33.340 --> 00:39:36.980
<v Tom>aid for open telemetry generation i think

00:39:36.980 --> 00:39:40.340
<v Tom>those would be the main ones and actually i mean you

00:39:40.340 --> 00:39:43.520
<v Tom>know like this is a good segue for like ecosystem maturity

00:39:43.520 --> 00:39:46.680
<v Tom>because we you know

00:39:46.680 --> 00:39:49.900
<v Tom>we have one of the maintainers of Axum on the team we have

00:39:49.900 --> 00:39:53.060
<v Tom>one of the maintainers of redis-rs on the team we have one of the maintainers

00:39:53.060 --> 00:39:59.260
<v Tom>of Aide on the team and it's not you know i wish we didn't right i mean i'm happy

00:39:59.260 --> 00:40:02.280
<v Tom>to have those people they're great and like they're very you know you know they're

00:40:02.280 --> 00:40:05.400
<v Tom>smart and capable but like we we didn't actually have to

00:40:05.820 --> 00:40:11.060
<v Tom>we have to spend time and effort on these you know like core libraries that

00:40:11.060 --> 00:40:14.180
<v Tom>we depend on like we have to spend time and effort on the ecosystem to make

00:40:14.180 --> 00:40:15.920
<v Tom>sure that it's where we need it to be,

00:40:16.400 --> 00:40:20.460
<v Tom>which is a bit i mean i think it's less so nowadays and i guess if you're

00:40:20.460 --> 00:40:24.340
<v Tom>starting a Svix competitor you have us to maintain this for you so like you

00:40:24.340 --> 00:40:31.160
<v Tom>don't have to incur the same cost but it was a challenge and it definitely still is but.

00:40:31.160 --> 00:40:33.480
<v Matthias>On the other side you need to have some skin in the game.

00:40:33.480 --> 00:40:38.540
<v Tom>Yeah again no no not complaining really not complaining but you know if we had

00:40:38.540 --> 00:40:43.780
<v Tom>the alternative which would be like having the perfect redis library just existing

00:40:43.780 --> 00:40:47.140
<v Tom>and we didn't have to you know spend so much time on it that would have been

00:40:47.140 --> 00:40:50.820
<v Tom>better but again no complaints we're happy that it's well maintained by people

00:40:50.820 --> 00:40:52.700
<v Tom>that are well meaning and are capable so.

00:40:53.400 --> 00:40:58.420
<v Matthias>What were some of the missing features or issues that you had with the existing redis libraries.

00:40:59.800 --> 00:41:04.700
<v Tom>So when James from my team took over the maintainership it was essentially,

00:41:05.920 --> 00:41:08.680
<v Tom>abandoned i mean i don't know if like abandoned is the right word but it wasn't like

00:41:08.680 --> 00:41:11.620
<v Tom>being released or maintained or i guess for

00:41:11.620 --> 00:41:14.860
<v Tom>all intents and purposes abandoned it had no

00:41:14.860 --> 00:41:17.900
<v Tom>support for redis cluster just like didn't support that

00:41:17.900 --> 00:41:20.620
<v Tom>async the async version of

00:41:20.620 --> 00:41:23.480
<v Tom>the lib was fairly broken if

00:41:23.480 --> 00:41:27.320
<v Tom>i remember correctly oh sorry it had support for for blocking

00:41:27.320 --> 00:41:30.900
<v Tom>redis cluster and blocking redis and that was fairly okay in the lib not perfect

00:41:30.900 --> 00:41:36.200
<v Tom>but as for the async variants, the async variant for the

00:41:36.200 --> 00:41:42.340
<v Tom>non-cluster version was limited and the async cluster version was like an external

00:41:42.340 --> 00:41:44.260
<v Tom>library by someone else that's also unmaintained.

00:41:45.020 --> 00:41:50.220
<v Tom>And James merged them together and really streamlined the library and made everything better.

00:41:50.680 --> 00:41:55.240
<v Matthias>Well, that's nice to hear because it benefits a wider

00:41:55.720 --> 00:42:02.560
<v Matthias>array of potential users um any similar story for any other dependency

00:42:02.560 --> 00:42:07.800
<v Matthias>you mentioned that you have your own open api client i guess but there's Dropshot

00:42:07.800 --> 00:42:11.360
<v Matthias>from Oxide Computer i wonder

00:42:11.360 --> 00:42:15.280
<v Matthias>if if you looked at that or what was the reason for building your own.

00:42:15.280 --> 00:42:18.320
<v Tom>Yeah so i don't remember what Dropshot is

00:42:18.320 --> 00:42:22.360
<v Tom>i'll check it out again after this but Aide essentially

00:42:22.360 --> 00:42:25.400
<v Tom>is a tool to automatically generate an open api spec

00:42:25.400 --> 00:42:28.260
<v Tom>from your Axum code so you

00:42:28.260 --> 00:42:31.000
<v Tom>define you know like what you define the the

00:42:31.000 --> 00:42:33.740
<v Tom>api routes what you accept what you send and we kind

00:42:33.740 --> 00:42:37.180
<v Tom>of like it automatically generates the open api spec

00:42:37.180 --> 00:42:40.140
<v Tom>from that it was great we looked at multiple

00:42:40.140 --> 00:42:43.240
<v Tom>options back when we adopted it it was

00:42:43.240 --> 00:42:46.220
<v Tom>by far the best one and unfortunately it went unmaintained a

00:42:46.220 --> 00:42:48.980
<v Tom>few months ago or even longer than that and and we kind

00:42:48.980 --> 00:42:51.860
<v Tom>of like had to take over because you know we had fixes for upstream that

00:42:51.860 --> 00:42:54.620
<v Tom>just weren't being merged and even you know i

00:42:54.620 --> 00:42:57.480
<v Tom>don't remember how much how long we waited but we waited a long time for one

00:42:57.480 --> 00:43:02.040
<v Tom>of the fixes to be merged but then no release was being made so that also was

00:43:02.040 --> 00:43:07.660
<v Tom>fairly useless in that regard so yeah now we kind of like we were you know we've

00:43:07.660 --> 00:43:12.840
<v Tom>been merging prs and like making releases and yeah happy about it yeah.

00:43:12.840 --> 00:43:19.080
<v Matthias>Then again these dependencies make sense to be owned by a company like yours

00:43:19.080 --> 00:43:24.080
<v Matthias>because they are so central to what you do they are central to the web ecosystem

00:43:24.080 --> 00:43:30.740
<v Matthias>and you know all of what you described sounds reasonable at least from an outsider's perspective.

00:43:30.740 --> 00:43:32.700
<v Tom>Yeah.

00:43:32.700 --> 00:43:33.660
<v Matthias>And also, thanks for maintaining, by the way.

00:43:33.660 --> 00:43:36.520
<v Tom>I don't know with pleasure and you know we actually we even

00:43:36.520 --> 00:43:39.920
<v Tom>had to create a new library so like we created a library

00:43:39.920 --> 00:43:42.760
<v Tom>for KSUIDs that just didn't

00:43:42.760 --> 00:43:45.600
<v Tom>exist we built one actually we also wrote one

00:43:45.600 --> 00:43:48.580
<v Tom>in python that's fairly popular but we did

00:43:48.580 --> 00:43:51.620
<v Tom>that we also built or we created Omniqueue

00:43:51.620 --> 00:43:54.820
<v Tom>so essentially you know like our open source project supports a

00:43:54.820 --> 00:43:58.280
<v Tom>variety of queues in the background so like Redis, SQS, you

00:43:58.280 --> 00:44:01.180
<v Tom>know just because it needs a queue and i remember like

00:44:01.180 --> 00:44:05.120
<v Tom>i remember Celery from python very fondly it's like a library that just lets

00:44:05.120 --> 00:44:09.120
<v Tom>you you know it just like abstracts away the queue back end and then people

00:44:09.120 --> 00:44:12.900
<v Tom>that use your library could just choose their own queue of choice so we kind

00:44:12.900 --> 00:44:18.660
<v Tom>of like we built the equivalent of that for rust yeah it was also also great now.
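
NOTE A minimal sketch of the Celery-style idea Tom describes: application code depends on a queue trait and the deployer picks the backend. The trait and type names are hypothetical and this is NOT Omniqueue's actual API; a Redis or SQS backend would implement the same two methods against the real service.
```rust
use std::collections::VecDeque;
// Hypothetical backend-agnostic queue interface.
pub trait QueueBackend {
    fn send(&mut self, payload: String);
    fn receive(&mut self) -> Option<String>;
}
// Simplest possible backend: an in-process FIFO, handy for tests.
#[derive(Default)]
pub struct InMemoryQueue {
    items: VecDeque<String>,
}
impl QueueBackend for InMemoryQueue {
    fn send(&mut self, payload: String) {
        self.items.push_back(payload);
    }
    fn receive(&mut self) -> Option<String> {
        self.items.pop_front()
    }
}
// Business logic is generic over the backend, so swapping queues leaves it untouched.
pub fn enqueue_webhook<Q: QueueBackend>(queue: &mut Q, endpoint: &str, body: &str) {
    queue.send(format!("{endpoint} {body}"));
}
```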

00:44:18.660 --> 00:44:22.420
<v Matthias>Let's talk about production rust okay

00:44:22.420 --> 00:44:29.240
<v Matthias>you have that thing in production i have two main questions how does it hold up

00:44:29.240 --> 00:44:36.680
<v Matthias>under load and how do you avoid breaking changes when you

00:44:36.680 --> 00:44:39.940
<v Matthias>maintain a larger application in Rust?

00:44:40.740 --> 00:44:45.540
<v Tom>We came from Python, and Python, let's say, is not known for being fast.

00:44:45.780 --> 00:44:47.240
<v Tom>And I think it's much better nowadays.

00:44:47.640 --> 00:44:52.820
<v Tom>But back, I think it was like Python 3.7 or 3.8 when we switched.

00:44:53.220 --> 00:44:56.740
<v Tom>I think it still has the global interpreter lock, or maybe they just released

00:44:56.740 --> 00:44:57.960
<v Tom>a version without it. I don't know.

00:44:58.400 --> 00:45:04.040
<v Tom>But the point is, with Python, the way to scale up is...

00:45:04.300 --> 00:45:04.680
<v Matthias>Multi-threading, yeah.

00:45:05.200 --> 00:45:07.860
<v Tom>But multi-process as well, because of the global interpreter lock.

00:45:07.960 --> 00:45:12.240
<v Tom>So you would run multiple threads and multiple processes in order to use all the cores.

00:45:12.240 --> 00:45:17.380
<v Matthias>Processes are probably fine because those are separate scopes of memory but if you have

00:45:17.380 --> 00:45:18.000
<v Tom>Multiple threads.

00:45:18.000 --> 00:45:21.980
<v Matthias>In the same python interpreter then you need the global interpreter lock or you

00:45:21.980 --> 00:45:23.260
<v Matthias>have the new interpreter.

00:45:23.260 --> 00:45:29.160
<v Tom>What's it called yeah yeah yeah yeah but the problem in multi-process yeah it

00:45:29.160 --> 00:45:32.580
<v Tom>doesn't have the GIL limitations, that's why they recommend it, but you can't

00:45:32.580 --> 00:45:36.680
<v Tom>share memory, like connection pools to the database are separate.

00:45:36.680 --> 00:45:40.740
<v Tom>So actually it comes with its own challenges and just not having to worry

00:45:40.740 --> 00:45:42.820
<v Tom>about any of that, first of all, was fun. It was great.
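
NOTE A minimal std-only sketch of what "not having to worry about any of that" can look like in Rust: one process, many worker threads, one shared in-memory cache (names are hypothetical). With multiple Python processes, each process would hold its own copy of such a cache and its own database connection pool.
```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
// One cache for the whole process, shared by every worker thread.
pub type SharedCache = Arc<RwLock<HashMap<String, String>>>;
pub fn run_workers(cache: &SharedCache, workers: usize) {
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let cache = Arc::clone(cache);
            thread::spawn(move || {
                // A value written by one worker is visible to all the others.
                cache
                    .write()
                    .unwrap()
                    .insert(format!("worker-{i}"), "warm".to_string());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```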

00:45:43.100 --> 00:45:48.620
<v Tom>But also we've just seen, and it could be when you rewrite, you probably optimize

00:45:48.620 --> 00:45:49.820
<v Tom>things a bit and all of that.

00:45:49.900 --> 00:45:53.620
<v Tom>You know the system better. But we've seen orders of magnitude improvement in memory usage.

00:45:54.680 --> 00:45:58.400
<v Tom>Latency was, I mean, just like shot down significantly.

00:45:58.600 --> 00:46:02.280
<v Tom>And again, we changed some infrastructure as we were doing it, so it could be semi-related

00:46:02.280 --> 00:46:05.860
<v Tom>to that as well. But just every metric improved drastically.

00:46:06.900 --> 00:46:12.460
<v Tom>I think it was, I don't remember the exact numbers, but I think it was probably

00:46:12.460 --> 00:46:16.480
<v Tom>40x or something, the difference between the amount of Python runners that we

00:46:16.480 --> 00:46:18.280
<v Tom>needed to run compared to the Rust ones.

00:46:19.560 --> 00:46:22.580
<v Tom>And the thing about that is that it actually compounds.

00:46:22.920 --> 00:46:27.800
<v Tom>So if you go from, let's even go for the basic case of just like two Rust runners

00:46:27.800 --> 00:46:32.100
<v Tom>and like 80 Python ones, all of a sudden, like the two Rust runners,

00:46:32.320 --> 00:46:37.480
<v Tom>they hit in-memory cache like all the time because they're just two of them, right?

00:46:37.520 --> 00:46:41.220
<v Tom>So let's say 50% of the time, but probably more than 50% because like the moment

00:46:41.220 --> 00:46:43.980
<v Tom>they flip, it gets to the other one, that one now has it cached.

00:46:44.300 --> 00:46:47.760
<v Tom>So like in-memory caching is actually extremely effective all of a sudden.
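
NOTE A back-of-the-envelope model of why fewer runners make per-process caches effective. It assumes, as a simplification, that every request for one key is routed uniformly at random across the runners and nothing is evicted; the function names are hypothetical. A miss happens the first time a request lands on a given runner, so expected misses equal the expected number of distinct runners hit, n * (1 - (1 - 1/n)^k).
```rust
/// Expected cold-cache misses for one key after `requests` requests spread
/// uniformly at random over `runners` processes, each with its own cache.
pub fn expected_misses(runners: u32, requests: u32) -> f64 {
    let n = runners as f64;
    n * (1.0 - (1.0 - 1.0 / n).powi(requests as i32))
}
/// Fraction of requests served from an in-memory cache under the same model.
pub fn hit_rate(runners: u32, requests: u32) -> f64 {
    1.0 - expected_misses(runners, requests) / requests as f64
}
```
For 10 requests to the same key, 2 runners give about 2 misses (roughly an 80% hit rate), while 80 runners give about 9.5 misses (roughly 5%).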

00:46:47.880 --> 00:46:52.600
<v Tom>We have more than two, but still not way more than two, i mean more than

00:46:52.600 --> 00:46:57.340
<v Tom>two, but not so many that it's not effective anymore and with

00:46:57.340 --> 00:47:01.260
<v Tom>python we just we couldn't do effective in-memory caching like you know connection

00:47:01.260 --> 00:47:04.900
<v Tom>database connection pools were just not being reused as much like so,

00:47:05.780 --> 00:47:08.800
<v Tom>switching to Rust actually significantly reduced our database load

00:47:08.800 --> 00:47:11.740
<v Tom>our redis load like all the

00:47:11.740 --> 00:47:15.940
<v Tom>loads and all of the upstream system you would never think um like you'd really

00:47:15.940 --> 00:47:20.300
<v Tom>never think that this would be i mean i guess you would think but it really

00:47:20.300 --> 00:47:26.100
<v Tom>the downstream effects of just being able to run fewer processes were just, you

00:47:26.100 --> 00:47:29.760
<v Tom>know, just insane for us so we're very happy about that.

00:47:29.760 --> 00:47:35.120
<v Matthias>No one really thinks about cache locality when you think about reducing the number

00:47:35.120 --> 00:47:39.380
<v Matthias>of nodes you have but it's so vital at a certain point.

00:47:39.380 --> 00:47:45.780
<v Tom>Yeah i think again like i don't have the exact numbers but like the calls to redis which is like,

00:47:46.520 --> 00:47:50.740
<v Tom>went essentially to nothing like fairly quickly, i mean fairly immediately

00:47:50.740 --> 00:47:54.440
<v Tom>when we switched, like essentially to nothing.

00:47:54.440 --> 00:48:01.480
<v Matthias>Right. And now let's come to the second question which is about avoiding breaking api

00:48:01.480 --> 00:48:04.040
<v Matthias>changes or breaking changes in the api.

00:48:04.040 --> 00:48:07.420
<v Tom>Yeah so i think you know

00:48:07.420 --> 00:48:10.140
<v Tom>we haven't released it yet but i wrote a

00:48:10.140 --> 00:48:12.940
<v Tom>blog post exactly about this and what we do there and i'm

00:48:12.940 --> 00:48:16.520
<v Tom>happy to give a bit of a glimpse so first

00:48:16.520 --> 00:48:21.320
<v Tom>of all let's start with the interface and then we can talk about the internals

00:48:21.320 --> 00:48:25.340
<v Tom>the blog post is more about the interface anyway so when you when i mentioned

00:48:25.340 --> 00:48:30.560
<v Tom>we use this tool called Aide, and Aide generates an OpenAPI spec. OpenAPI, by the

00:48:30.560 --> 00:48:34.760
<v Tom>way it's kind of like a formal definition language for describing http apis.

00:48:35.680 --> 00:48:43.720
<v Tom>So what Aide would do is take our call, let's say the api call create application,

00:48:44.400 --> 00:48:47.240
<v Tom>it will automatically you know make

00:48:47.240 --> 00:48:50.200
<v Tom>a json schema out of the incoming, you know, the body that comes in

00:48:50.200 --> 00:48:53.140
<v Tom>the body that goes out all the path parameters

00:48:53.140 --> 00:48:56.280
<v Tom>query parameters the path itself authentication requirements all

00:48:56.280 --> 00:48:59.120
<v Tom>of that, Aide will automatically create that open

00:48:59.120 --> 00:49:01.920
<v Tom>api spec and we you know we did a lot

00:49:01.920 --> 00:49:04.800
<v Tom>of extra enrichment on our end to you know

00:49:04.800 --> 00:49:07.800
<v Tom>to add oh this this field

00:49:07.800 --> 00:49:11.000
<v Tom>also only accepts, expects a

00:49:11.000 --> 00:49:13.760
<v Tom>string that like matches this regex and this integer can only be

00:49:13.760 --> 00:49:16.540
<v Tom>from 1 to 50 and this is an array that

00:49:16.540 --> 00:49:19.800
<v Tom>can't be empty like we added like a lot of additional kind of

00:49:19.800 --> 00:49:22.820
<v Tom>restrictions and the nice thing as well by the way again a small tangent is

00:49:22.820 --> 00:49:28.020
<v Tom>that the code that enforces it for us is the same as the code that generates

00:49:28.020 --> 00:49:34.080
<v Tom>the Aide description so like the size limitation is literally

00:49:34.080 --> 00:49:37.660
<v Tom>what we enforce if it's in the open api spec it's what we enforce if it's not

00:49:37.660 --> 00:49:42.520
<v Tom>it's not what we enforce and barring a few exceptions that we purposely broke that but like.
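
NOTE A minimal std-only sketch of "the code that enforces it is the code that describes it": one type holds the bounds and uses them both to validate input and to emit a JSON Schema fragment. This is a hypothetical illustration, not Aide's API or Svix's code; the 1 to 50 range is Tom's example.
```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundedU32<const MIN: u32, const MAX: u32>(u32);
impl<const MIN: u32, const MAX: u32> BoundedU32<MIN, MAX> {
    /// Enforcement: reject anything outside MIN..=MAX.
    pub fn new(value: u32) -> Result<Self, String> {
        if (MIN..=MAX).contains(&value) {
            Ok(Self(value))
        } else {
            Err(format!("{} is not in {}..={}", value, MIN, MAX))
        }
    }
    pub fn get(self) -> u32 {
        self.0
    }
    /// Description: the same constants produce the published schema,
    /// so documented limits cannot drift from enforced ones.
    pub fn json_schema() -> String {
        format!(r#"{{"type":"integer","minimum":{},"maximum":{}}}"#, MIN, MAX)
    }
}
// An integer that may only be between 1 and 50.
pub type PageLimit = BoundedU32<1, 50>;
```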

00:49:43.280 --> 00:49:48.080
<v Tom>Everything is just like, all of this is automatic. And what this means is that

00:49:48.080 --> 00:49:53.980
<v Tom>we have a JSON file with the exact description of our exact API that we can

00:49:53.980 --> 00:49:55.380
<v Tom>just generate whenever we want.

00:49:55.540 --> 00:49:58.160
<v Tom>So what we do, we generate that.

00:49:58.600 --> 00:50:01.380
<v Tom>We commit it whenever you make a change. We commit it to Git.

00:50:01.560 --> 00:50:04.080
<v Tom>And then in CI, we generate it again, and then we compare it.

00:50:04.640 --> 00:50:09.900
<v Tom>And if it's different, CI fails.
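
NOTE A minimal sketch of the CI gate, assuming the spec has already been regenerated into a string and the committed file has been read (function name hypothetical, not Svix's tooling): any difference fails the check and reports the first line that changed.
```rust
/// Compare the committed spec with a freshly generated one.
pub fn check_spec(committed: &str, generated: &str) -> Result<(), String> {
    if committed == generated {
        return Ok(());
    }
    let mut committed_lines = committed.lines();
    let mut generated_lines = generated.lines();
    let mut line_no = 1;
    loop {
        match (committed_lines.next(), generated_lines.next()) {
            // Identical line: keep scanning.
            (Some(a), Some(b)) if a == b => line_no += 1,
            // First difference (or only a trailing-newline difference): fail.
            (a, b) => {
                return Err(format!(
                    "API spec changed at line {}: committed {:?}, generated {:?}; regenerate and commit it",
                    line_no, a, b
                ));
            }
        }
    }
}
```
In CI, a non-empty `Err` would be printed and the job exited with a failure status.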

00:50:10.300 --> 00:50:13.180
<v Tom>So essentially what it means is that if you make a change that

00:50:13.180 --> 00:50:15.920
<v Tom>affects our api you consciously have to update that

00:50:15.920 --> 00:50:18.700
<v Tom>file, like regenerate that file, so first of all you as the

00:50:18.700 --> 00:50:22.440
<v Tom>developer, as the first line of defense, know that you

00:50:22.440 --> 00:50:25.060
<v Tom>just did something that's potentially breaking and again maybe it's on

00:50:25.060 --> 00:50:28.040
<v Tom>purpose maybe you added a new api it's fine but you

00:50:28.040 --> 00:50:31.060
<v Tom>know that and also we have github code

00:50:31.060 --> 00:50:34.180
<v Tom>owners on that specific spec file

00:50:34.180 --> 00:50:37.120
<v Tom>that forces extra

00:50:37.120 --> 00:50:39.880
<v Tom>people who know what we expect from

00:50:39.880 --> 00:50:43.140
<v Tom>our api like what do we want to release what we don't want to release they

00:50:43.140 --> 00:50:45.800
<v Tom>will have to review it as well and just say you know

00:50:45.800 --> 00:50:48.920
<v Tom>what, this api change is reasonable, so essentially

00:50:48.920 --> 00:50:53.500
<v Tom>we kind of like what we did is we leverage the you know the rust type system

00:50:53.500 --> 00:50:59.740
<v Tom>with some like macro magic and and a few other things to just to just make it

00:50:59.740 --> 00:51:04.300
<v Tom>so we can't accidentally change our api signatures, we have

00:51:04.300 --> 00:51:07.240
<v Tom>to be aware of it like we can't accidentally release an api we don't want to release.

00:51:07.360 --> 00:51:12.100
<v Tom>We can't just do those kind of things. So that's been great in that regard.

00:51:13.150 --> 00:51:22.030
<v Matthias>How would you even learn that? If you did not know that you could build an OpenAPI

00:51:22.030 --> 00:51:24.050
<v Matthias>spec from your Axum handlers,

00:51:24.750 --> 00:51:30.550
<v Matthias>how would you find out that you could use the Rust type system to enforce that

00:51:30.550 --> 00:51:35.170
<v Matthias>and to make testing easier and to make breaking changes explicit?

00:51:35.330 --> 00:51:40.670
<v Matthias>Are there any tips that you could share in order to really hook into the Rust

00:51:40.670 --> 00:51:44.510
<v Matthias>type system and encode the invariants that you depend on?

00:51:44.990 --> 00:51:47.970
<v Tom>Yeah, so I think this is like a hack that we did. I don't know if,

00:51:48.210 --> 00:51:51.490
<v Tom>I don't think it's a common practice. Maybe it is, and we're just unaware of it.

00:51:51.810 --> 00:51:54.930
<v Tom>And that's why, again, we wrote that blog post. I mean, I think it is,

00:51:55.130 --> 00:51:57.210
<v Tom>I think more people should be doing exactly this.

00:51:58.210 --> 00:52:00.530
<v Tom>Although, you know what, now that I think about it, I'm sure like large companies

00:52:00.530 --> 00:52:01.610
<v Tom>have their own tricks as well.

00:52:02.450 --> 00:52:05.990
<v Tom>But really the key is, I'm a big fan. So some people, what they do,

00:52:06.130 --> 00:52:09.110
<v Tom>they write the open API spec by hand, and then

00:52:09.110 --> 00:52:13.210
<v Tom>they use code generation to generate the backend code i don't think that's

00:52:13.210 --> 00:52:16.350
<v Tom>good because there's going to be a limit to how well that backend code could

00:52:16.350 --> 00:52:19.730
<v Tom>be generated so you're going to edit it manually in the end but i think when

00:52:19.730 --> 00:52:23.930
<v Tom>you generate it from the code and the code is the source of truth which you

00:52:23.930 --> 00:52:26.430
<v Tom>know, let's face it, is the source of truth, that's what runs in production,

00:52:27.090 --> 00:52:31.170
<v Tom>i think generating from that like really makes sense so just i think just become

00:52:31.170 --> 00:52:37.490
<v Tom>obsessed with types and annotations and everything that you want to describe as part of your contract,

00:52:37.950 --> 00:52:40.970
<v Tom>i think if you're obsessed you know if you're just obsessed about that like

00:52:40.970 --> 00:52:45.850
<v Tom>everything else will just fall into place yeah i mean i think really that's

00:52:45.850 --> 00:52:50.050
<v Tom>that's the main thing and i guess related to that is you have to be careful

00:52:50.050 --> 00:52:54.050
<v Tom>about having too you know too wide of a contract,

00:52:54.850 --> 00:53:02.370
<v Tom>so if by default, let's say there's like an age, i mean this is like the most, uh, you know,

00:53:03.470 --> 00:53:08.310
<v Tom>CS101 example, but like you have like a person's name and age and you let the

00:53:08.310 --> 00:53:12.470
<v Tom>age be negative by default because, you know, you didn't limit it, you just put an int,

00:53:13.310 --> 00:53:18.470
<v Tom>then you can like, by default, you exposed a wider contract than what you want to expose.

00:53:18.750 --> 00:53:22.410
<v Tom>So I think actually being very explicit about annotating correctly,

00:53:22.650 --> 00:53:26.710
<v Tom>like everything that you actually believe this type should have and should hold,

00:53:26.870 --> 00:53:29.750
<v Tom>I think is extremely important. And I think that could go a long way.
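
NOTE A minimal sketch of narrowing the contract in Tom's age example: instead of a bare signed integer that silently admits negative ages, a newtype only accepts valid values. The upper bound of 150 and the type names are illustrative assumptions.
```rust
use std::convert::TryFrom;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Age(u8);
impl TryFrom<i64> for Age {
    type Error = String;
    fn try_from(raw: i64) -> Result<Self, Self::Error> {
        // Reject negatives and implausibly large values at the boundary.
        if (0..=150).contains(&raw) {
            Ok(Age(raw as u8))
        } else {
            Err(format!("invalid age: {raw}"))
        }
    }
}
#[derive(Debug)]
pub struct Person {
    pub name: String,
    pub age: Age,
}
```
Because `Person` can only hold an `Age`, every `Person` in the system is known to have a valid age.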

00:53:30.910 --> 00:53:34.110
<v Matthias>And how far can you take this? I think in your blog post, you also mentioned

00:53:34.110 --> 00:53:37.770
<v Matthias>that your Redis get and set commands are strongly typed.

00:53:38.850 --> 00:53:41.990
<v Tom>Yeah. And so actually, this is a different blog post, that one.

00:53:42.130 --> 00:53:46.250
<v Tom>But this is the blog post you're referring to. That's where we use it for internal

00:53:46.250 --> 00:53:48.730
<v Tom>stability. And it's exactly that.

00:53:49.890 --> 00:53:54.410
<v Tom>Internally, we use the type system to protect us from a lot of things,

00:53:54.630 --> 00:53:57.870
<v Tom>like essentially define all the internal contracts as well.

00:53:58.490 --> 00:54:01.410
<v Tom>So you just mentioned Redis. you know redis lets you

00:54:01.410 --> 00:54:04.570
<v Tom>store random strings like it doesn't have any typing associated with

00:54:04.570 --> 00:54:07.290
<v Tom>it but what we did is we

00:54:07.290 --> 00:54:10.130
<v Tom>created a new type and we kind of we added a new redis

00:54:10.130 --> 00:54:12.930
<v Tom>interface and and and the

00:54:12.930 --> 00:54:16.170
<v Tom>way we do it is you have to define a type for the key and that

00:54:16.170 --> 00:54:19.190
<v Tom>type for the key has to be associated with a specific schema

00:54:19.190 --> 00:54:23.970
<v Tom>for the value so if you use that key to get a value you know that it's always

00:54:23.970 --> 00:54:27.830
<v Tom>going to be the correct schema because we parse at the moment we get it and

00:54:27.830 --> 00:54:31.370
<v Tom>when you write that value same thing you always know it's going to be the same

00:54:31.370 --> 00:54:36.490
<v Tom>schema because you the only way to write into redis would be using this interface

00:54:36.490 --> 00:54:38.930
<v Tom>does that make sense yeah.
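
NOTE A minimal std-only sketch of a typed key/value interface: a `HashMap<String, String>` stands in for Redis (which only stores strings), and every key type names the value type it may hold, so reads and writes always go through the same schema. All names are hypothetical; this is not Svix's code or the redis-rs API.
```rust
use std::collections::HashMap;
pub trait RedisValue: Sized {
    fn encode(&self) -> String;
    fn decode(raw: &str) -> Option<Self>;
}
pub trait TypedKey {
    type Value: RedisValue;
    fn redis_key(&self) -> String;
}
impl RedisValue for u64 {
    fn encode(&self) -> String {
        self.to_string()
    }
    fn decode(raw: &str) -> Option<Self> {
        raw.parse().ok()
    }
}
// Example key: a per-application counter whose value must be a u64.
pub struct AttemptCountKey {
    pub app_id: String,
}
impl TypedKey for AttemptCountKey {
    type Value = u64;
    fn redis_key(&self) -> String {
        format!("attempts:{}", self.app_id)
    }
}
#[derive(Default)]
pub struct TypedStore {
    raw: HashMap<String, String>, // private: the typed API is the only way in or out
}
impl TypedStore {
    pub fn set<K: TypedKey>(&mut self, key: &K, value: &K::Value) {
        self.raw.insert(key.redis_key(), value.encode());
    }
    /// Parses on read; None if the key is missing or the stored string does not decode.
    pub fn get<K: TypedKey>(&self, key: &K) -> Option<K::Value> {
        self.raw
            .get(&key.redis_key())
            .and_then(|s| <K::Value as RedisValue>::decode(s))
    }
}
```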

00:54:38.930 --> 00:54:46.170
<v Matthias>It's a bit like a binary protocol that you talk to with redis so it's it's a

00:54:46.170 --> 00:54:50.590
<v Matthias>spec in between that is the contract between whatever redis expects and whatever

00:54:50.590 --> 00:54:52.070
<v Matthias>your code sends and receives.

00:54:52.070 --> 00:54:55.850
<v Tom>Exactly yeah we kind of like add this like nice contract between the you know

00:54:55.850 --> 00:54:59.810
<v Tom>like kind of what i said earlier the messy world out there like untyped redis

00:54:59.810 --> 00:55:02.330
<v Tom>and our clean code.

00:55:02.330 --> 00:55:07.250
<v Matthias>You recently moved to Redis streams, does that principle still apply? do you also

00:55:07.250 --> 00:55:09.970
<v Matthias>use a typed stream of some sort.

00:55:09.970 --> 00:55:16.350
<v Tom>Yeah so Redis streams, they just accept, you know, payloads, whatever you

00:55:16.350 --> 00:55:19.410
<v Tom>know you just put whatever you put there we just wrap around that as well with

00:55:19.410 --> 00:55:22.370
<v Tom>the same schema type.

00:55:22.370 --> 00:55:25.890
<v Matthias>It feels like we both are really big fans of rust's

00:55:25.890 --> 00:55:28.990
<v Matthias>type safety but there are other traits

00:55:28.990 --> 00:55:32.530
<v Matthias>about rust that people often mention for

00:55:32.530 --> 00:55:35.730
<v Matthias>reasons for moving to rust and the

00:55:35.730 --> 00:55:38.730
<v Matthias>things that people usually mention are number one

00:55:38.730 --> 00:55:44.990
<v Matthias>safety slash security number two fearless concurrency number three performance

00:55:44.990 --> 00:55:52.790
<v Matthias>and number four stability or robustness and if you could order these four things

00:55:52.790 --> 00:55:57.670
<v Matthias>by priority what would be number one two and so on for you.

00:55:58.480 --> 00:56:02.100
<v Tom>Let me first give you a cop-out which is i think the rust typing system is what

00:56:02.100 --> 00:56:05.620
<v Tom>enables all of those so i just choose the rust typing system and it's awesome,

00:56:05.620 --> 00:56:08.120
<v Tom>yeah it just gives us all of that um.

00:56:08.660 --> 00:56:09.540
<v Matthias>Oh you like that.

00:56:09.540 --> 00:56:12.920
<v Tom>Yeah i think you know

00:56:12.920 --> 00:56:15.640
<v Tom>so i'm man i don't know i i think

00:56:15.640 --> 00:56:21.860
<v Tom>fearless concurrency i don't care about as much i mean i i do and we do use

00:56:21.860 --> 00:56:25.460
<v Tom>it and whatever so like don't yeah we definitely care about it but like i mean

00:56:25.460 --> 00:56:29.680
<v Tom>compared to like safety and correctness but again it they kind of come together

00:56:29.680 --> 00:56:33.780
<v Tom>right the fearless concurrency comes from the fact that you can be safe and correct,

00:56:34.640 --> 00:56:37.980
<v Tom>so i guess like all of my answers will just go back to the type system but let's say,

00:56:38.640 --> 00:56:44.400
<v Tom>safety correctness would be first performance would be second the fearless concurrency

00:56:44.400 --> 00:56:49.800
<v Tom>would be last what what's the third one okay so stability will be second performance

00:56:49.800 --> 00:56:55.320
<v Tom>will be third yeah i think that i think that's and i will add developer experience

00:56:55.320 --> 00:56:56.820
<v Tom>somewhere in the middle as well like,

00:56:58.000 --> 00:57:03.420
<v Tom>ergonomics yeah ergonomics exactly um it just yeah the the fact that you just

00:57:03.420 --> 00:57:06.180
<v Tom>write the code that you want to write and you kind of trust the compiler to

00:57:06.180 --> 00:57:13.200
<v Tom>do a good job, and the zero-cost abstractions, it's just fun, yeah yeah.

00:57:13.200 --> 00:57:19.380
<v Matthias>It's kind of interesting that you rank stability slash robustness higher than raw performance.

00:57:19.380 --> 00:57:22.140
<v Tom>I think you know like at the end of the day and

00:57:22.140 --> 00:57:25.360
<v Tom>i know a lot of people are going to be angry at me for saying this you can buy larger

00:57:25.360 --> 00:57:29.960
<v Tom>servers at the end, i mean again i gave the python example, you can't buy a 100x

00:57:29.960 --> 00:57:34.160
<v Tom>larger server, there is a limit, but if it's about like a 30% improvement

00:57:34.160 --> 00:57:39.920
<v Tom>you can buy a 30% larger server but the safety and like stability is

00:57:39.920 --> 00:57:43.040
<v Tom>just you can't buy a larger server for that yeah.

00:57:43.040 --> 00:57:50.940
<v Matthias>Okay we've witnessed your journey from the very beginning from the idea to now

00:57:50.940 --> 00:57:54.400
<v Matthias>if you started over with that project,

00:57:54.920 --> 00:58:01.120
<v Matthias>and with everything you know now what advice would you give to yourself back

00:58:01.120 --> 00:58:02.220
<v Matthias>in the day when you started.

00:58:02.220 --> 00:58:05.960
<v Tom>So i think

00:58:05.960 --> 00:58:09.540
<v Tom>we did a lot of things well but i

00:58:09.540 --> 00:58:12.900
<v Tom>guess a lot of things not as well, you know

00:58:12.900 --> 00:58:15.900
<v Tom>one thing that we did in the beginning and i

00:58:15.900 --> 00:58:18.880
<v Tom>think like i would bet that every startup founder makes

00:58:18.880 --> 00:58:22.660
<v Tom>the same mistake is we assumed that today

00:58:22.660 --> 00:58:25.940
<v Tom>even though today we have zero customers by end

00:58:25.940 --> 00:58:29.640
<v Tom>of next week we're probably going to have facebook load or

00:58:29.640 --> 00:58:35.500
<v Tom>like google load on our systems and that's you know like we we just we will

00:58:35.500 --> 00:58:41.100
<v Tom>we will drown and die if we don't fix this load issue now immediately before

00:58:41.100 --> 00:58:45.100
<v Tom>anything else and we start designing our system to support that load and like

00:58:45.100 --> 00:58:47.880
<v Tom>yeah this can go from like zero to a billion customers,

00:58:48.860 --> 00:58:51.560
<v Tom>already and like this is the best system and all of that and and

00:58:51.560 --> 00:58:54.820
<v Tom>i think we made the same mistake of thinking that we

00:58:54.820 --> 00:58:58.700
<v Tom>should care about that early on and and

00:58:58.700 --> 00:59:03.540
<v Tom>we our initial system you know was built to support like a thousand x of the

00:59:03.540 --> 00:59:08.260
<v Tom>load that was you know where we had even in the you know close horizon so not

00:59:08.260 --> 00:59:11.760
<v Tom>just like one customer to 1,000 customers more kind of like you know realistically

00:59:11.760 --> 00:59:14.800
<v Tom>we're not going to get more than like 10,000 customers in the next year so like

00:59:14.800 --> 00:59:18.520
<v Tom>let's build to like 10 million whatever it just didn't make any sense but we still did it.

00:59:19.660 --> 00:59:22.960
<v Tom>And by doing that we actually built a much

00:59:22.960 --> 00:59:26.020
<v Tom>worse system because we were not serving

00:59:26.020 --> 00:59:28.840
<v Tom>those future customers we were serving the

00:59:28.840 --> 00:59:31.700
<v Tom>current customers and the current customers cared about like speed

00:59:31.700 --> 00:59:35.820
<v Tom>of iteration and and you know latency and and

00:59:35.820 --> 00:59:38.660
<v Tom>you know stability and all of that and you can't explain like it's not

00:59:38.660 --> 00:59:41.460
<v Tom>a good answer to tell the customers like i know this call is a bit slower

00:59:41.460 --> 00:59:44.400
<v Tom>than what you expect but it's because we need to support one

00:59:44.400 --> 00:59:47.460
<v Tom>billion people like you but it's like but you don't like yeah

00:59:47.460 --> 00:59:50.680
<v Tom>it's not a good answer so nowadays like

00:59:50.680 --> 00:59:53.620
<v Tom>what we follow is that we want to be able

00:59:53.620 --> 00:59:57.080
<v Tom>to scale 10x our current loads in

00:59:57.080 --> 00:59:59.980
<v Tom>a week like or whatever it is a very short period of time we want

00:59:59.980 --> 01:00:03.100
<v Tom>to see the path to being able to scale you know

01:00:03.100 --> 01:00:05.740
<v Tom>by one order of magnitude immediately, but we

01:00:05.740 --> 01:00:09.160
<v Tom>don't care about scaling 100x or 1000x. For that, we

01:00:09.160 --> 01:00:12.000
<v Tom>assume we're going to have at least a month

01:00:12.000 --> 01:00:15.580
<v Tom>or two to figure it out. And that's

01:00:15.580 --> 01:00:18.000
<v Tom>what we do. By the way, that doesn't mean

01:00:18.000 --> 01:00:21.780
<v Tom>that we neuter our existing system. We don't build it slow on purpose; we

01:00:21.780 --> 01:00:27.320
<v Tom>try to build it to the best of our ability. But what we aim for is no

01:00:27.320 --> 01:00:33.160
<v Tom>longer that 1000x. It's just 10x, with a clear path to maybe 100x.

01:00:34.200 --> 01:00:34.740
<v Tom>Does that make sense?

01:00:35.580 --> 01:00:44.160
<v Matthias>Yeah. So your advice to your past self would be: focus on the now,

01:00:44.480 --> 01:00:48.900
<v Matthias>focus on the today rather than focus on what will be further down the road.

01:00:49.040 --> 01:00:52.540
<v Matthias>Cross that bridge when you get there, and focus on building something that

01:00:52.540 --> 01:00:58.360
<v Matthias>is robust and can scale reasonably well instead of going for hypergrowth, by the way.

01:00:58.700 --> 01:01:05.240
<v Matthias>But what about the Rust side? What advice would you give yourself from when you started on this Rust journey?

01:01:06.400 --> 01:01:10.000
<v Tom>I guess one thing is that it's

01:01:10.000 --> 01:01:15.100
<v Tom>okay to clone. One thing that I think we did in the beginning way

01:01:15.100 --> 01:01:20.840
<v Tom>too much was trying to get a completely cloneless setup

01:01:20.840 --> 01:01:25.640
<v Tom>all the way down the stack. And the problem is that at some point you

01:01:25.640 --> 01:01:29.100
<v Tom>would have a dependency that requires ownership,

01:01:29.980 --> 01:01:33.980
<v Tom>and normally, without all of this cloneless setup,

01:01:34.140 --> 01:01:35.920
<v Tom>you would have just passed ownership,

01:01:36.440 --> 01:01:37.560
<v Tom>like, all the way down.

01:01:37.700 --> 01:01:40.540
<v Tom>And now you have to clone anyway. So essentially,

01:01:40.640 --> 01:01:44.760
<v Tom>it's a premature optimization, trying to figure out the best way to just not clone.

01:01:45.080 --> 01:01:48.240
<v Tom>And like the previous example, you don't have to be degenerate about it.

01:01:48.320 --> 01:01:51.000
<v Tom>You don't have to go, oh, I'm just going to clone everywhere.

01:01:51.120 --> 01:01:52.800
<v Tom>No more passing by reference.

01:01:53.000 --> 01:01:54.880
<v Tom>No, definitely pass by reference where it makes sense.

01:01:55.340 --> 01:02:00.120
<v Tom>But if you have to spend half an hour trying to figure out the signature for

01:02:00.120 --> 01:02:05.260
<v Tom>this function just to make it so you don't have to clone this fairly small structure,

01:02:05.280 --> 01:02:07.260
<v Tom>don't worry about it. I think that's...
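
NOTE A minimal Rust sketch of the "it's okay to clone" advice, assuming a small hypothetical `Endpoint` struct and a hypothetical `enqueue` function that requires ownership (neither is Svix's actual code). It borrows where that is natural and clones the small struct once, at the point where a dependency needs ownership. The alternative would be reshaping every signature above it to avoid the copy.
```rust
// Hypothetical webhook endpoint config: small and cheap to clone.
#[derive(Clone, Debug)]
struct Endpoint {
    url: String,
    secret: String,
}
// Hypothetical downstream API that insists on owning its argument.
fn enqueue(endpoint: Endpoint) -> String {
    format!("queued {}", endpoint.url)
}
// Borrow where it makes sense: read-only access needs no ownership.
fn url_of(endpoint: &Endpoint) -> &str {
    &endpoint.url
}
// When a dependency requires ownership, one clone of a small struct is
// simpler than threading ownership or lifetimes through every caller.
fn dispatch(endpoint: &Endpoint) -> String {
    enqueue(endpoint.clone())
}
```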

01:02:08.560 --> 01:02:13.060
<v Tom>Your mileage may vary depending on the exact situation, but I think that's

01:02:13.060 --> 01:02:15.440
<v Tom>one thing that we would change.

01:02:15.440 --> 01:02:20.080
<v Matthias>Would I be wrong in summarizing it as: keep it simple, even if you write Rust?

01:02:20.080 --> 01:02:23.380
<v Tom>100%. And I think

01:02:23.380 --> 01:02:26.200
<v Tom>we kind of talked about one

01:02:26.200 --> 01:02:29.320
<v Tom>of the vices of the Rust ecosystem, which is the obsession with

01:02:29.320 --> 01:02:32.180
<v Tom>correctness, which again is great in most cases, but

01:02:32.180 --> 01:02:35.020
<v Tom>you have to remember that sometimes the world outside is

01:02:35.020 --> 01:02:38.040
<v Tom>dirty, and it's useless

01:02:38.040 --> 01:02:40.900
<v Tom>if you're correct but everyone else is incorrect and you can't

01:02:40.900 --> 01:02:45.620
<v Tom>communicate with them. I think the other one is the extreme obsession with performance.

01:02:45.620 --> 01:02:48.840
<v Tom>I think it's great and healthy, and we should continue that. That's what

01:02:48.840 --> 01:02:55.220
<v Tom>makes Rust so great, or part of what makes Rust great. But we also need to be pragmatic.

01:02:55.220 --> 01:03:02.240
<v Matthias>So we have two tips already: focus on the 10x, not the 1000x, and

01:03:02.240 --> 01:03:08.200
<v Matthias>focus on simplicity; cloning is fine, for example. Anything else that you would want to add?

01:03:08.200 --> 01:03:12.520
<v Tom>Yeah, so the other two. The first one is very similar to

01:03:12.520 --> 01:03:18.460
<v Tom>the 10x-instead-of-1000x one that I gave earlier, which is: don't aim for 100% uptime.

01:03:19.200 --> 01:03:22.180
<v Tom>In the beginning, we were just obsessed. As you

01:03:22.180 --> 01:03:25.240
<v Tom>said in the beginning as well, we can never go down, so we were obsessed

01:03:25.240 --> 01:03:29.660
<v Tom>with just making sure we were always up. The thing is, though, the moment we stopped

01:03:29.660 --> 01:03:32.880
<v Tom>obsessing over 100% uptime and just gave ourselves an actually attainable goal,

01:03:33.000 --> 01:03:37.320
<v Tom>which was five nines of uptime, we actually reached 100%.

01:03:37.320 --> 01:03:39.620
<v Tom>Like we got to where we wanted to go before.
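
NOTE A minimal Rust sketch of the arithmetic behind "five nines", assuming a 365.25-day year. A 99.999% availability target leaves a budget of roughly 5.26 minutes of downtime per year, and each extra nine divides that budget by ten. A 100% target leaves no budget at all. The function name is illustrative, not from Svix.
```rust
// Seconds in an average year, assuming 365.25 days.
const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;
// Allowed downtime per year, in seconds, for an availability target of
// `nines` nines; e.g. 5 nines means 99.999% availability.
fn yearly_downtime_budget_secs(nines: i32) -> f64 {
    // Unavailability is 10^(-nines), e.g. 0.00001 for five nines.
    let unavailability = 10f64.powi(-nines);
    SECONDS_PER_YEAR * unavailability
}
```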

01:03:40.220 --> 01:03:42.900
<v Tom>And the reason for that is, again, similar to the previous point,

01:03:43.140 --> 01:03:49.120
<v Tom>is when you aim for something that's unattainable, you start doing things that

01:03:49.120 --> 01:03:53.280
<v Tom>are just crazy, that don't add any value, and actually make things more complex.

01:03:53.580 --> 01:03:56.440
<v Tom>As an example, I think it's a fair assumption

01:03:57.510 --> 01:04:03.650
<v Tom>that we don't need to be up if an asteroid destroys the Earth,

01:04:03.730 --> 01:04:05.210
<v Tom>right? I think that's a fair assumption.

01:04:05.410 --> 01:04:12.010
<v Tom>But if we didn't make that assumption, we would maybe send a server to Mars

01:04:12.010 --> 01:04:16.590
<v Tom>or do something insane like that, and then we'd have to deal with insane latency and all of that.

01:04:16.790 --> 01:04:20.490
<v Tom>All of a sudden, we would have added a lot of complexity to our system for really no good

01:04:20.490 --> 01:04:25.190
<v Tom>reason because if Earth is destroyed, I don't think anyone cares if we have a bit of downtime.

01:04:25.190 --> 01:04:28.070
<v Tom>So I think that is

01:04:28.070 --> 01:04:30.890
<v Tom>one of those areas. Again, we didn't

01:04:30.890 --> 01:04:33.930
<v Tom>have a server on Mars, just to be clear, but we were making

01:04:33.930 --> 01:04:36.970
<v Tom>crazily

01:04:36.970 --> 01:04:41.690
<v Tom>complex systems just to support that 0.00001

01:04:41.690 --> 01:04:44.430
<v Tom>percent. It just didn't make any sense. It was

01:04:44.430 --> 01:04:47.250
<v Tom>actually making us less stable because the complexity made things

01:04:47.250 --> 01:04:50.530
<v Tom>worse. And I guess the other

01:04:50.530 --> 01:04:54.790
<v Tom>advice I would give my old self is:

01:04:54.790 --> 01:04:58.370
<v Tom>really be diligent about avoiding unnecessary

01:04:58.370 --> 01:05:01.870
<v Tom>technical debt. And I say unnecessary

01:05:01.870 --> 01:05:05.050
<v Tom>because I think technical debt is fine. Technically, it's great: you take a loan

01:05:05.050 --> 01:05:09.110
<v Tom>on your future self in order to move faster now, in order to build something

01:05:09.110 --> 01:05:13.990
<v Tom>now. I don't think technical debt is necessarily a bad thing, but there

01:05:13.990 --> 01:05:18.650
<v Tom>are kinds of technical debt that are just unnecessary, like naming things poorly

01:05:18.650 --> 01:05:19.830
<v Tom>in the database for example.

01:05:20.590 --> 01:05:24.010
<v Tom>If you have a poorly named table and

01:05:24.010 --> 01:05:26.950
<v Tom>you know that, and you're just lazy about changing it in staging or

01:05:26.950 --> 01:05:29.630
<v Tom>in local development, essentially what you did

01:05:29.630 --> 01:05:32.550
<v Tom>is that everyone from now on till the future

01:05:32.550 --> 01:05:35.910
<v Tom>of the company, until the end of the company, 20 years

01:05:35.910 --> 01:05:39.030
<v Tom>in the future, will have to deal with

01:05:39.030 --> 01:05:42.090
<v Tom>that poor naming, because renaming a database table

01:05:42.090 --> 01:05:44.750
<v Tom>is a pain, renaming a database column is a pain, and no one

01:05:44.750 --> 01:05:47.550
<v Tom>is going to bother doing that. It's not the same as

01:05:47.550 --> 01:05:51.910
<v Tom>having poor names in the code, which again is unnecessary, but it's

01:05:51.910 --> 01:05:56.490
<v Tom>fine; we can fix it later. Here it's actually just an unnecessary,

01:05:56.490 --> 01:06:00.330
<v Tom>avoidable piece of debt that I would really encourage people to be extremely

01:06:00.330 --> 01:06:03.110
<v Tom>diligent about. Don't let people hand-wave it away, like, hey, it's not a big deal,

01:06:03.110 --> 01:06:05.350
<v Tom>it's just a name. No, it does matter.

01:06:05.350 --> 01:06:09.490
<v Matthias>Does it help to be explicit about the things that you store in a table, or how would

01:06:09.490 --> 01:06:13.090
<v Matthias>you find a good name? I know it's a bit of a segue, but I'm curious now.

01:06:13.090 --> 01:06:15.890
<v Tom>I think it really depends on the case, but

01:06:15.890 --> 01:06:18.870
<v Tom>if someone, or more

01:06:18.870 --> 01:06:22.710
<v Tom>than one person on the team, finds the name confusing, I think that warrants a

01:06:22.710 --> 01:06:26.150
<v Tom>change, because people finding something confusing means they're going to go

01:06:26.150 --> 01:06:29.290
<v Tom>down a rabbit hole a few months from now when they're debugging, when

01:06:29.290 --> 01:06:33.550
<v Tom>they're trying to figure something out. Actually, okay, one quick example that

01:06:33.550 --> 01:06:36.730
<v Tom>we have: we have a few things that are named

01:06:37.530 --> 01:06:41.350
<v Tom>"restricted" to mean that they

01:06:41.350 --> 01:06:44.250
<v Tom>actually have more access because they kind

01:06:44.250 --> 01:06:47.330
<v Tom>of have restricted access. And

01:06:47.330 --> 01:06:50.910
<v Tom>it's so confusing, and I get it wrong, because we also have "limited" and

01:06:50.910 --> 01:06:53.790
<v Tom>"restricted" and another name, and sometimes we

01:06:53.790 --> 01:06:57.670
<v Tom>take the position of the elevated access, sometimes we take the position

01:06:57.670 --> 01:07:03.250
<v Tom>of the data, and it's really confusing. That has cost us many, many

01:07:03.250 --> 01:07:07.010
<v Tom>developer hours over the years. And again, it's not a big deal,

01:07:07.010 --> 01:07:11.110
<v Tom>but it is something that's too big of a pain to change, and I wish we had been

01:07:11.110 --> 01:07:13.550
<v Tom>just a bit more thoughtful about it in the beginning.

01:07:13.550 --> 01:07:15.810
<v Matthias>Would privileged access have been a better choice?

01:07:15.810 --> 01:07:20.170
<v Tom>Yeah, so "privileged" would have been better. Or just deciding: are we talking about

01:07:20.170 --> 01:07:24.270
<v Tom>the data, or are we talking about the access? Just being consistent about that

01:07:24.270 --> 01:07:27.550
<v Tom>and always doing that would have gone a long way.

01:07:27.550 --> 01:07:32.910
<v Matthias>And finally, it's become a bit of a tradition around here to ask one last question.

01:07:32.910 --> 01:07:37.830
<v Matthias>Do you have a message for the Rust community? Anything that comes to mind?

01:07:37.830 --> 01:07:40.670
<v Tom>So first of all, a message of love: I'm a big fan,

01:07:40.670 --> 01:07:43.370
<v Tom>keep up the good work. But in terms of a

01:07:43.370 --> 01:07:46.750
<v Tom>message that drives action, I think

01:07:46.750 --> 01:07:49.470
<v Tom>we've got to fix the compilation times. I don't know what we can do.

01:07:49.470 --> 01:07:52.290
<v Tom>I think

01:07:52.290 --> 01:07:57.750
<v Tom>it is a bottleneck for a lot of people. It is a pain.

01:07:57.750 --> 01:08:02.250
<v Tom>It even affects IDEs, language servers, and all of that,

01:08:02.250 --> 01:08:06.990
<v Tom>when things are slow. So we've got to fix that. Yeah, and

01:08:06.990 --> 01:08:09.710
<v Tom>I know people are working on it, so, yeah.

01:08:09.710 --> 01:08:13.130
<v Matthias>That's true. Let's all wish for faster compile

01:08:13.130 --> 01:08:18.230
<v Matthias>times this year and maybe the next couple of years. It has improved, but I share

01:08:18.230 --> 01:08:23.950
<v Matthias>the sentiment with you. So that wraps it up. Thanks so much for all the insights,

01:08:23.950 --> 01:08:29.850
<v Matthias>and I can tell you that if I had production-level webhooks that I wanted to serve anywhere,

01:08:30.090 --> 01:08:32.510
<v Matthias>then I would take a very, very close look at Svix.

01:08:32.890 --> 01:08:37.350
<v Matthias>Good job there, and good luck with the future, and thanks for being a guest.

01:08:37.930 --> 01:08:39.050
<v Tom>Thank you. Thank you for having me.

01:08:39.470 --> 01:08:43.210
<v Matthias>Rust in Production is a podcast by corrode. It is hosted by me,

01:08:43.510 --> 01:08:46.270
<v Matthias>Matthias Endler, and produced by Simon Brüggen.

01:08:46.430 --> 01:08:50.730
<v Matthias>For show notes, transcripts, and to learn more about how we can help your company

01:08:50.730 --> 01:08:56.170
<v Matthias>make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.