WEBVTT 00:00:01.550 --> 00:00:05.830 It's Rust in Production, a podcast about companies who use Rust to shape the 00:00:05.830 --> 00:00:06.710 future of infrastructure. 00:00:07.190 --> 00:00:11.390 My name is Matthias Endler from corrode and today we talk to Kevin Guthrie and 00:00:11.390 --> 00:00:16.830 Edward Wang from Cloudflare about handling 90 million web requests per second with Rust. 00:00:19.510 --> 00:00:24.030 Kevin and Edward, thanks so much for taking the time. Can you introduce yourselves 00:00:24.030 --> 00:00:25.930 and Cloudflare, the company you work for? 00:00:26.550 --> 00:00:29.450 Sure. I'll go first. My name is Kevin Guthrie. 00:00:29.850 --> 00:00:33.310 I'm a Principal Software Engineer, or Systems Engineer, at Cloudflare. 00:00:33.510 --> 00:00:37.250 I've been here about a year and a half. I've been a Rust developer for about 00:00:37.250 --> 00:00:40.890 four-ish years, on and off professionally. 00:00:41.310 --> 00:00:45.410 I've done some side projects, some games, some really stupid projects, 00:00:45.550 --> 00:00:46.570 some really complex projects. 00:00:46.670 --> 00:00:53.010 I just love the Rust language, and it's hard to do anything else. What about you, Edward? 00:00:53.010 --> 00:00:56.150 Yeah, hey, my name is Edward. 00:00:56.150 --> 00:00:58.890 I'm also a systems engineer here at 00:00:58.890 --> 00:01:02.370 Cloudflare, and I've been working 00:01:02.370 --> 00:01:05.310 on Rust, I mean, since I joined the 00:01:05.310 --> 00:01:11.990 company, essentially, almost five years ago now at this point. And I've been 00:01:11.990 --> 00:01:16.390 working on, well, we're going to talk about the Pingora framework today. Previously 00:01:16.390 --> 00:01:21.630 I was working at a game studio, so working on internet plumbing essentially was 00:01:21.630 --> 00:01:25.030 a pretty big difference now. 
00:01:25.030 --> 00:01:31.310 Yes, we will talk about Pingora today, but I'm not sure if people are aware of 00:01:31.310 --> 00:01:37.110 the scale that Cloudflare is at. Can you share some numbers, just to fill everyone in? 00:01:38.290 --> 00:01:42.710 Yeah, okay. So we have some changing data. 00:01:43.130 --> 00:01:46.210 This is just based on public data. All the things we're going to share today 00:01:46.210 --> 00:01:47.910 are things that are publicly available. 00:01:48.830 --> 00:01:52.590 About 20% of the internet goes through Cloudflare. 00:01:54.730 --> 00:01:59.270 The reference here is from a tweet from one of our engineers. 00:01:59.770 --> 00:02:02.410 This is up from a couple years ago when you had, 00:02:03.760 --> 00:02:07.240 Steve Klabnik was on and talked about how Cloudflare had 10% of the world's 00:02:07.240 --> 00:02:10.560 internet, or 10% of the internet. So we're a little bit up from that. 00:02:11.240 --> 00:02:16.880 Currently, from the internal Pingora side, we handle about 90 million requests 00:02:16.880 --> 00:02:21.100 per second worldwide, occasionally going up above 100 million requests per second. 00:02:21.660 --> 00:02:27.340 That is crazy. I guess beyond comprehension for most people. 00:02:27.680 --> 00:02:33.300 That would mean that probably a huge majority of the traffic goes through Cloudflare, 00:02:33.300 --> 00:02:36.500 and maybe to some extent through Rust. We will talk about that today. 00:02:36.780 --> 00:02:44.480 But what is the setup internally to handle that scale, to handle that amount of requests? 00:02:45.560 --> 00:02:50.860 Yeah, I think we have a, 00:02:51.300 --> 00:02:54.720 I mean, if you're not familiar with Cloudflare to begin with, 00:02:55.000 --> 00:03:03.100 Cloudflare operates a global network of more than 300 points of presence around the globe. 
00:03:03.100 --> 00:03:08.440 There are all these points of presence, data centers in many different countries, 00:03:08.440 --> 00:03:15.260 and traffic is routed to them via these Anycast addresses, something that we've 00:03:15.260 --> 00:03:17.200 talked about on the blog before. 00:03:17.480 --> 00:03:23.560 So there are all sorts of setups, both on the layer 4 side and layer 7 side, 00:03:23.720 --> 00:03:28.940 to be able to load balance, distribute traffic and capacity accordingly. 00:03:28.940 --> 00:03:33.860 Internally, we operate one of the, 00:03:34.100 --> 00:03:41.280 and by we, I mean our team operates one of the services that your request travels 00:03:41.280 --> 00:03:46.440 through in order to get served a response. 00:03:46.840 --> 00:03:54.320 And those are the services that are using our Pingora framework, which is our team's. 00:03:54.320 --> 00:04:00.960 But internally, yes, there's a bunch of different mechanisms to balance and 00:04:00.960 --> 00:04:04.860 distribute the traffic outside of data centers. 00:04:05.400 --> 00:04:09.740 External to the data centers, routing to the data centers, and then within our 00:04:09.740 --> 00:04:16.440 data centers themselves throughout the life of a request, as we call it. 00:04:17.080 --> 00:04:23.380 Traveling through a few different CDN services or proxies. 00:04:24.870 --> 00:04:26.010 Some Rust, some not. 00:04:26.010 --> 00:04:29.730 A lot of people who have been in the Rust community for 00:04:29.730 --> 00:04:32.870 quite a while know that Cloudflare was one 00:04:32.870 --> 00:04:35.950 of the earliest adopters of Rust at a 00:04:35.950 --> 00:04:43.730 larger scale. But what was rarely talked about was the reasoning behind why Cloudflare 00:04:43.730 --> 00:04:49.690 chose Rust. Can you shed some light on that? Was it for performance reasons? Was 00:04:49.690 --> 00:04:55.470 it for memory safety reasons? What was the big driver behind Rust adoption at Cloudflare? 
00:04:56.430 --> 00:05:02.650 I think neither Kevin and I were around for the very beginning of this, right? 00:05:02.830 --> 00:05:10.250 But we know that our teams have the content delivery network, 00:05:10.870 --> 00:05:14.090 you know, compute, workers compute teams, etc. 00:05:14.470 --> 00:05:18.770 They've all been eyeing Rust for a long time, as you mentioned. 00:05:19.650 --> 00:05:26.070 And I think it was really, yeah, All of the, like, compile time... 00:05:27.010 --> 00:05:32.430 Checks that you'd be able to do, all of the classes of bugs that you would essentially 00:05:32.430 --> 00:05:37.510 eliminate from production, right? 00:05:38.310 --> 00:05:42.210 You would be, I mean, on the content delivery side, 00:05:42.910 --> 00:05:49.670 it's no secret that a lot of our company was built on these proxy services using 00:05:49.670 --> 00:05:57.070 NGINX, right, which is based in C code, as well as Lua business logic on top of that. 00:05:58.370 --> 00:06:07.690 The amount of, let's just say that there were certainly a number of core dumps 00:06:07.690 --> 00:06:17.330 and invalid memory accesses associated with us perhaps making changes within our NGINX fork. 00:06:17.330 --> 00:06:23.830 Over time, we've had to implement more and more complicated features within NGINX internally. 00:06:25.150 --> 00:06:29.710 And these core dumps are really impactful, obviously. 00:06:30.110 --> 00:06:35.330 When a core dump happens on a worker process in NGINX, that drops like thousands of requests. 00:06:36.230 --> 00:06:44.350 So we had executive, honestly, we had executive visibility and support on that. 00:06:44.350 --> 00:06:52.110 But something that our team has talked about before was that at least in the 00:06:52.110 --> 00:06:57.530 earlier years, prior to Rust adoption, 00:06:58.690 --> 00:07:05.190 for each core dump, I recall that our former CTO, John Graham Cumming, 00:07:05.270 --> 00:07:07.570 would actually get an email for each of those. 
00:07:08.210 --> 00:07:13.590 These crashes were very much top of mind for folks. 00:07:14.850 --> 00:07:23.170 So when you have that kind of, if you're able to build something to leverage 00:07:23.170 --> 00:07:24.810 all of those advantages of, 00:07:25.130 --> 00:07:29.990 hey, I can just completely erase, eliminate these classes of errors, 00:07:30.170 --> 00:07:33.830 then you're definitely going to be pursuing something like that. 00:07:33.950 --> 00:07:41.710 And Cloudflare is certainly we are not shy to consider every new technology 00:07:41.710 --> 00:07:43.650 an advantage we can come by. 00:07:43.650 --> 00:07:49.050 The one thing that I noticed when you explained the reasoning behind Rust was 00:07:49.050 --> 00:07:53.110 that a big chunk of the business logic was written in Lua. 00:07:53.310 --> 00:07:58.790 And I wondered immediately, couldn't you just use another language with a static type system? 00:07:59.030 --> 00:08:03.650 Like, I don't know, Go, for example, wouldn't that have been easier to integrate? 00:08:04.810 --> 00:08:13.470 Sure, but we were using NGINX, which was relatively new at the time that we were adopting. 00:08:13.650 --> 00:08:21.110 It, I believe, we had built a lot of our features, the firewall, 00:08:21.470 --> 00:08:28.310 etc., and DDoS features on top of filters that would run in NGINX. 00:08:29.190 --> 00:08:41.490 So the OpenResty, I believe, as it's called, the OpenResty is a framework set of. 00:08:42.710 --> 00:08:48.670 It allows you to implement business logic that you can plug into each of the 00:08:48.670 --> 00:08:56.230 NGINX filters that run across the life of a request without necessarily touching all of the very, 00:08:56.230 --> 00:08:59.650 perhaps arcane to a lot of folks, C code. 00:09:00.050 --> 00:09:07.510 So in order to do something like integrate Go and stuff, there might be certain 00:09:07.510 --> 00:09:09.810 similar efforts to do that. 
00:09:09.810 --> 00:09:17.230 But I think none have been as mature as OpenResty and its Lua logic and Lua filters. 00:09:17.670 --> 00:09:24.530 I think for a while, some of the OpenResty folks were working with us on our CDN teams. 00:09:25.270 --> 00:09:28.310 So generally speaking i 00:09:28.310 --> 00:09:31.930 think go was one of the possibilities i 00:09:31.930 --> 00:09:39.930 believe when we were evaluating other languages right to go to to switch to 00:09:39.930 --> 00:09:44.210 but i think rust is the definite forerunner for all the reasons that i mentioned 00:09:44.210 --> 00:09:49.810 a you know zero cost abstractions for great performance and And obviously, 00:09:50.050 --> 00:09:51.870 most importantly, I think, 00:09:52.250 --> 00:09:58.370 eliminating all sorts of memory safety issues and bugs that can arise from memory safety issues. 00:09:59.070 --> 00:10:02.990 One other thing that impacted our decision to go to Rust, I think. 00:10:03.170 --> 00:10:05.670 Like I said, I've only been here a year and a half, so I was definitely not around when the 00:10:05.670 --> 00:10:06.550 decision was being made. 00:10:06.750 --> 00:10:11.510 But we had a lot of the forerunners, like the celebrities and the Rust community, 00:10:11.710 --> 00:10:15.110 were working at Cloudflare at various points in time. 00:10:15.250 --> 00:10:19.170 Like Steve Klabnik himself worked at Cloudflare. Ashley Williams also worked here. 00:10:20.150 --> 00:10:25.850 So, I mean, there was a lot of popularity of the Rust language in Cloudflare to begin with. 00:10:26.270 --> 00:10:32.070 When you want to integrate a language like Go into an existing infrastructure 00:10:32.070 --> 00:10:38.530 that runs on NGINX, that would probably be a little harder because Go has a 00:10:38.530 --> 00:10:40.910 small but not negligible runtime. 00:10:40.910 --> 00:10:45.630 It has a garbage collector and so on. 
Whereas with Rust, you could integrate 00:10:45.630 --> 00:10:49.610 very deeply with basic C FFI. 00:10:49.890 --> 00:10:55.010 Did that also play a big role? And also, did you end up integrating Rust into 00:10:55.010 --> 00:11:00.550 your NGINX server for a while before you moved on to build your own solution? 00:11:01.450 --> 00:11:08.810 Yeah, I think this speaks to the matter of how do you migrate and switch over 00:11:08.810 --> 00:11:13.790 pieces of your infrastructure gradually to Rust as well, right? 00:11:16.310 --> 00:11:23.610 So all of the, as I mentioned, a lot of the core business logic historically 00:11:23.610 --> 00:11:27.850 has been built on Lua via OpenResty. 00:11:28.170 --> 00:11:31.910 All of that, you know, business logic built up over time. 00:11:32.630 --> 00:11:38.410 So I think there were initially some notions of how do you integrate, 00:11:38.410 --> 00:11:40.590 how do you maybe integrate, 00:11:40.690 --> 00:11:46.110 change those filters, how do you extract the business logic, right? 00:11:46.370 --> 00:11:50.930 To be using your Rust-based logic instead. 00:11:51.450 --> 00:11:56.870 And there, I think, were varying approaches to this. There are a lot of 00:11:56.870 --> 00:11:59.630 teams that work on the CDN in addition to us. 00:12:00.130 --> 00:12:05.570 Some of the logic you can kind of extract into different services, 00:12:05.570 --> 00:12:10.970 either in-band with the request processing or out-of-band. 00:12:10.970 --> 00:12:14.710 You can make calls to other services. 00:12:14.710 --> 00:12:26.370 For example, the approach that we ended up choosing was to extract, on a high level, 00:12:26.570 --> 00:12:33.550 a particular responsibility of one of our NGINX proxies into a separate service. 
00:12:33.550 --> 00:12:38.310 That was what we were doing when we were first developing Pingora, 00:12:38.330 --> 00:12:46.670 which was at the time, NGINX would reach out to make origin connections directly 00:12:46.670 --> 00:12:48.370 and make origin requests directly. 00:12:48.890 --> 00:12:55.890 We decided, hey, what if we situated an in-band with a request proxy that sits 00:12:55.890 --> 00:13:01.230 just behind that NGINX proxy? and routed requests to that instead. 00:13:01.510 --> 00:13:07.310 And then that service would decide how to make origin requests to which origins. 00:13:07.950 --> 00:13:11.510 Handling all of the origin communication responsibility. 00:13:12.470 --> 00:13:17.810 And then you're able to do something like divert traffic to that selectively 00:13:17.810 --> 00:13:23.910 depending on how ready that service is to handle certain classes of requests. 00:13:23.910 --> 00:13:29.230 I think this is generally the strategy that has been working out pretty well 00:13:29.230 --> 00:13:32.810 for various services at Cloudflare, 00:13:32.830 --> 00:13:39.610 as long as you're able to have something that has some sort of control plane that sits in front and, 00:13:40.140 --> 00:13:44.780 decides, you know, the routing in band of that request. 00:13:46.480 --> 00:13:52.040 It's the unsurprising answer of how do you solve a problem at a proxy company 00:13:52.040 --> 00:13:53.340 is by adding more proxies. 00:13:55.260 --> 00:14:00.040 Yeah. And certainly at first, I think we were a little concerned about, 00:14:00.040 --> 00:14:05.840 I think whenever you're adding another hop, it's another proxy hop at injecting 00:14:05.840 --> 00:14:06.820 another service into it. 00:14:06.920 --> 00:14:10.920 You're worried about complexity. You're worried about performance regressions 00:14:10.920 --> 00:14:13.660 and things like that, latency, obviously, right? 
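The selective diversion described here can be pictured with a small Rust sketch. Everything in it is a hypothetical illustration, not Cloudflare's actual routing logic: the `RequestClass` variants, the `RolloutPolicy` type, and the percentage bucket are assumptions made up for the example. It only shows the general idea of a front proxy deciding, per request, whether the new service is ready to handle that class of traffic.

```rust
/// Hypothetical request classes; the real classification is not described in the episode.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RequestClass {
    Plain,
    WebSocket,
    Streaming,
}

/// Where the front proxy sends the request next.
#[derive(Debug, PartialEq)]
enum Upstream {
    /// Legacy path: the front proxy talks to the origin itself.
    DirectToOrigin,
    /// New path: hand the request to the origin-facing proxy service.
    OriginProxy,
}

/// Assumed rollout policy: which classes the new service supports,
/// and what percentage of eligible requests should be diverted to it.
struct RolloutPolicy {
    ready: Vec<RequestClass>,
    percent: u8,
}

impl RolloutPolicy {
    fn route(&self, class: RequestClass, request_id: u64) -> Upstream {
        // Only divert classes the new service is known to handle.
        let class_ready = self.ready.contains(&class);
        // Deterministic bucket so the same request id always takes the same path.
        let in_bucket = (request_id % 100) < u64::from(self.percent);
        if class_ready && in_bucket {
            Upstream::OriginProxy
        } else {
            Upstream::DirectToOrigin
        }
    }
}
```

With a policy of `ready: vec![RequestClass::Plain]` and `percent: 25`, a plain request with id 10 goes to the new proxy, while WebSocket traffic and plain requests outside the bucket stay on the legacy path. Raising `percent` or adding classes widens the rollout without touching the new service.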
00:14:14.340 --> 00:14:24.340 Generally speaking, what we had noticed was that adding another service hop is 00:14:24.340 --> 00:14:28.040 certainly, unconditionally, going to add some amount of latency. 00:14:29.300 --> 00:14:32.560 Thankfully, the feature, the 00:14:32.560 --> 00:14:35.580 new logic that we were adding on top 00:14:35.580 --> 00:14:41.620 of that was generally able to offset a lot of those detriments that you 00:14:41.620 --> 00:14:46.640 would face. The example that we tended to point out in our blog a 00:14:46.640 --> 00:14:53.380 while ago was how our Pingora service, I don't know how much we want to get into 00:14:54.320 --> 00:14:57.060 the why exactly in terms 00:14:57.060 --> 00:15:00.100 of, like, NGINX versus Pingora architecture and stuff, 00:15:00.100 --> 00:15:03.360 but Pingora was a lot more 00:15:03.360 --> 00:15:06.300 efficient, this was definitely top of mind for us, 00:15:06.300 --> 00:15:09.280 a lot more efficient in terms of how it was making and 00:15:09.280 --> 00:15:12.360 reusing origin connections. And so 00:15:12.360 --> 00:15:16.520 something like that generally brings down, you 00:15:16.520 --> 00:15:19.560 know, the latency of making an origin 00:15:19.560 --> 00:15:26.300 request significantly, if you can skip all of the TLS handshake, etc., latency. Fortunately 00:15:26.300 --> 00:15:31.780 for us as well, when it comes to replacing an origin-facing proxy, the 00:15:31.780 --> 00:15:38.440 cost of origin latency significantly dwarfs any additional proxy hop latency you have. 00:15:39.020 --> 00:15:42.140 Okay, but was the project already 00:15:42.140 --> 00:15:47.060 called Pingora back then, or was it some sort of intermediate step? 00:15:47.480 --> 00:15:53.840 I guess I have to shout out some folks who were working on Pingora. 
00:15:54.000 --> 00:15:59.580 I say I was working on it, but in reality, it was Yuchen, Yuchen Wu, 00:15:59.820 --> 00:16:06.980 as well as Andrew Houck, who were kind of the primary and first drivers of Pingora. 00:16:06.980 --> 00:16:09.920 And at first it was called OpenRusty, I think. 00:16:10.180 --> 00:16:14.920 You still see this term in some of the old tests because it was very much meant 00:16:14.920 --> 00:16:23.600 to replace OpenResty and NGINX itself and be a, I don't want to say a drop-in replacement, 00:16:23.840 --> 00:16:29.260 but do all the things and model a lot of its logic off of NGINX and OpenResty. 00:16:29.440 --> 00:16:37.340 Because honestly, that worked for us. and NGINX's logic models and the way that 00:16:37.340 --> 00:16:39.660 it thought about request processing worked for us. 00:16:39.780 --> 00:16:44.740 So we wanted to do a lot of things pretty similarly to NGINX. 00:16:45.310 --> 00:16:48.930 I really like the name, OpenRusty, but of course. 00:16:49.850 --> 00:16:53.790 I don't remember why they didn't go with that name in the end. 00:16:53.990 --> 00:17:01.370 I think it partially was sort of pejorative, but also could have been confused as a typo. 00:17:01.730 --> 00:17:05.050 I think Pingora is a much better name. The name, I think, came from the manager 00:17:05.050 --> 00:17:07.510 of the team who almost slipped and died off of the mountain, 00:17:07.810 --> 00:17:09.550 the literal mountain that's actually called Pingora. 00:17:10.230 --> 00:17:18.650 I believe the story is that a particular trip to the Pingora Mountain almost cost him his life. 00:17:18.670 --> 00:17:23.630 And now we've been ascending that summit ever since. 00:17:26.250 --> 00:17:30.790 Sounds like negative foreshadowing, but in reality it worked out well. 00:17:33.170 --> 00:17:37.490 But the one thing that I wondered was that, let's say you add another hop. 00:17:37.610 --> 00:17:39.210 You add proxy behind a proxy. 
00:17:39.890 --> 00:17:45.130 And then you have a ton of requests coming in and then you want to switch to 00:17:45.130 --> 00:17:49.610 a new version you just kind of want to do a release basically wouldn't that 00:17:49.610 --> 00:17:54.750 be an easy source for dropping connections and dropping requests it. 00:17:54.750 --> 00:17:58.690 Is so it's something we have to do very carefully and we have since we handle 00:17:58.690 --> 00:18:03.770 things like web sockets we have lots of long-running requests so upgrades updates 00:18:03.770 --> 00:18:08.890 are something we don't do very often now or then. 00:18:09.170 --> 00:18:13.270 The way Pingora does this is a really slick system where when you bring up a 00:18:13.270 --> 00:18:18.750 new update, the process that you want to move everything to, it can start. 00:18:19.210 --> 00:18:25.210 It can know about the old instance of Pingora that's currently running. 00:18:25.490 --> 00:18:30.230 And that old instance can gracefully hand over the socket to start listening 00:18:30.230 --> 00:18:34.850 for new connections on the new instance of Pingora while old requests finish 00:18:34.850 --> 00:18:39.250 out on the old one and then the old instance can handle all of its requests 00:18:39.250 --> 00:18:40.790 and then gracefully shut down 00:18:40.790 --> 00:18:43.830 whereas the new one is bringing up any new connections and handling those. 00:18:44.650 --> 00:18:50.090 Is that safe or does that happen between processes? I wonder if you can even make it safe. 00:18:51.290 --> 00:18:56.830 It is. I mean, I don't know. I'm sure in Rust this is classified as some form 00:18:56.830 --> 00:19:01.230 of unsafe code because you're passing around raw file descriptors for sockets. 00:19:01.830 --> 00:19:06.150 But it is also a really common thing. 
This is something I first heard of at Facebook, 00:19:06.910 --> 00:19:13.750 where their networking, their HTTP servers do the same exact thing, or even their 00:19:13.750 --> 00:19:18.430 load balancing system. I think it's a very common process, but I never worked 00:19:18.430 --> 00:19:21.190 with the actual code to do it until working on the Pingora project. 00:19:21.190 --> 00:19:27.910 Yeah, there's actually, so yeah, there's this process of transferring these 00:19:27.910 --> 00:19:30.710 listen file descriptors. I think it's one of the few places, 00:19:31.590 --> 00:19:35.450 I could be wrong, but I think it's one of the few places where, yes, because 00:19:35.450 --> 00:19:38.810 we're dealing with those raw file descriptors, there's a bit of unsafe code there. 00:19:39.170 --> 00:19:49.010 There's actually also a crate that I believe we've put out, not us ourselves, but I mentioned, 00:19:49.390 --> 00:19:54.550 or maybe I haven't mentioned yet, that Cloudflare is not a monolith when it 00:19:54.550 --> 00:19:57.750 comes to Rust, and we are not the only folks 00:19:59.030 --> 00:20:01.150 developing in the Rust ecosystem. 00:20:01.350 --> 00:20:04.850 So the folks who are working on another proxy 00:20:04.850 --> 00:20:08.570 framework called Oxy have actually open sourced a 00:20:08.570 --> 00:20:13.410 crate that is specifically for these kinds of graceful process restarts, and 00:20:13.410 --> 00:20:19.830 it's called shellflip. It uses a very, you know, similar mechanism of, you 00:20:19.830 --> 00:20:28.350 know, transferring file descriptors and doing that handover. 00:20:29.280 --> 00:20:35.500 ShellFlip sounds like another really cool name. You have a way with names, I guess, at Cloudflare. 00:20:36.000 --> 00:20:39.820 I think it was taken from the tableflip Go package. 00:20:40.080 --> 00:20:44.840 And I think some of our engineers decided to... 00:20:44.840 --> 00:20:45.560 That's a good name. 
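To make the remark about unsafe code concrete, here is a minimal Rust sketch of the raw file descriptor round trip, using only the standard library on Unix. It is an illustration under stated assumptions, not Pingora's or shellflip's implementation: the function names `release_listener`, `adopt_listener`, and `handover_demo` are hypothetical, and both "instances" live in one process. In a real graceful restart the descriptor would be sent to the new process, typically over a Unix domain socket using `SCM_RIGHTS`, which this sketch leaves out.

```rust
use std::net::{TcpListener, TcpStream};
use std::os::unix::io::{FromRawFd, IntoRawFd, RawFd};

/// "Old instance" side: give up ownership of the listening socket
/// without closing it, leaving only the raw descriptor number.
fn release_listener(listener: TcpListener) -> RawFd {
    listener.into_raw_fd()
}

/// "New instance" side: rebuild a listener from a descriptor it was handed.
///
/// # Safety
/// `fd` must be an open, listening TCP socket that nothing else owns.
/// The compiler cannot check this, which is why the handover needs `unsafe`.
unsafe fn adopt_listener(fd: RawFd) -> TcpListener {
    unsafe { TcpListener::from_raw_fd(fd) }
}

/// Single-process stand-in for the handover: the socket keeps listening
/// on the same address while ownership moves from `old` to `new`.
fn handover_demo() -> std::io::Result<bool> {
    let old = TcpListener::bind("127.0.0.1:0")?;
    let addr = old.local_addr()?;

    let fd = release_listener(old);
    // SAFETY: `fd` came from `into_raw_fd` just above and has no other owner.
    let new = unsafe { adopt_listener(fd) };

    // A client connecting after the handover is accepted by the new owner.
    let _client = TcpStream::connect(addr)?;
    let (_conn, _peer) = new.accept()?;
    Ok(new.local_addr()? == addr)
}
```

Calling `handover_demo()` returns `Ok(true)` when the adopted listener still serves the original address. The point of the sketch is that the socket itself never closes, so no connection attempt is refused during the switch.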
00:20:45.640 --> 00:20:49.040 I don't know why it's Shell in particular, but maybe it has to do with crabs. 00:20:50.020 --> 00:20:56.140 Maybe, yeah. But do you share a lot of code with other teams at Cloudflare? 00:20:57.100 --> 00:21:01.920 Rust crates, that is. You mentioned that you have Pingora and you have Oxy, 00:21:02.080 --> 00:21:05.240 but there's probably more stuff at Cloudflare which uses Rust. 00:21:05.500 --> 00:21:06.940 What does code sharing look like? 00:21:07.420 --> 00:21:13.400 Because from another company, or actually from a few, I heard that sort of by 00:21:13.400 --> 00:21:18.280 serendipity, they start to use different crates in completely different contexts. 00:21:18.280 --> 00:21:21.160 And it kind of happens very naturally to share code. 00:21:22.130 --> 00:21:25.270 Yeah, that's true. We have a sort of haphazard way of sharing code. 00:21:25.430 --> 00:21:29.970 We do have our own internal repository for uploading crates, 00:21:30.010 --> 00:21:32.010 like it's an internal copy of crates.io. 00:21:33.310 --> 00:21:34.230 Internal registry. 00:21:34.590 --> 00:21:38.590 Sorry, internal registry. That's the right terminology. But for a lot of it, 00:21:38.750 --> 00:21:42.890 it's done through referencing crates through Git URLs. 00:21:43.170 --> 00:21:47.370 So it's a little bit on the Go side of things. So you have a crate that you 00:21:47.370 --> 00:21:49.350 want to share with other people in Cloudflare. 00:21:50.090 --> 00:21:54.310 It's up on our internal Git server. You can write a blog post about it. 00:21:54.390 --> 00:21:55.790 Anybody can just include it. 00:21:56.470 --> 00:22:00.490 Putting it on the internal registry should be a more common thing to do, 00:22:00.510 --> 00:22:02.670 but I have literally never done it. 
00:22:02.670 --> 00:22:06.790 I've shared a couple of crates with different teams for various stupid things, 00:22:06.950 --> 00:22:11.950 but most of those are just incorporated in one of two ways, 00:22:12.050 --> 00:22:15.170 either making the project open source and then having people consume it just 00:22:15.170 --> 00:22:19.750 from the open internet, from the actual crates.io, or consuming it internally 00:22:19.750 --> 00:22:21.030 from an internal Git repo. 00:22:21.310 --> 00:22:27.490 I will say that I think usage of the internal registries is pretty common now. 00:22:27.690 --> 00:22:36.410 The whole point of the registry, one of the major points, was to avoid the 00:22:36.410 --> 00:22:42.850 Git commit kind of references that Cargo allows you to do. 00:22:43.950 --> 00:22:47.070 It's still used in some cases, right? 00:22:47.590 --> 00:22:55.050 But yeah, more and more, I think the ecosystem around Rust has become a lot 00:22:55.050 --> 00:23:01.030 more shared, maybe in the past few years. In our code, we use both approaches. 00:23:02.370 --> 00:23:06.790 And when you publish code, do you have a formal process for the publication? 00:23:07.030 --> 00:23:13.070 Do you run any cargo tools to make sure that the code quality is on par with the rest? 
00:23:14.440 --> 00:23:17.680 Uh, we use the, I mean, really we 00:23:17.680 --> 00:23:20.440 use the standard open source tools. We use 00:23:20.440 --> 00:23:23.480 Clippy, we use the auditing tool, the 00:23:23.480 --> 00:23:26.640 name I can't think of right now, to make sure we are not publishing anything with 00:23:26.640 --> 00:23:29.500 insecure code. Cargo audit? Yes, cargo audit. 00:23:29.500 --> 00:23:32.260 Yeah, exactly. But that's really 00:23:32.260 --> 00:23:35.260 about it. We're very stringent on our internal code reviews. 00:23:35.260 --> 00:23:40.240 So like I said, we have open source projects. All of the open source contributions 00:23:40.240 --> 00:23:44.180 that come in go through external review as well as internal review 00:23:44.180 --> 00:23:50.060 before they go into the main branch for Pingora. But as far as automated 00:23:50.060 --> 00:23:54.780 tools, yeah, it's really just Clippy and testing. 00:23:54.780 --> 00:24:01.040 Now, let's come back to Pingora for a while. We established that we have a system 00:24:01.040 --> 00:24:06.540 called OpenRusty. It's behind NGINX. That's the current place we're at. When was 00:24:06.540 --> 00:24:10.300 that, roundabout, like the year that you had that system running in production? 00:24:10.300 --> 00:24:20.060 Oh boy. So the blog came about 2022. I want to say the first forays into Pingora started 00:24:21.420 --> 00:24:30.840 around 2020 or a little before that. I would need to look into the exact dates 00:24:30.840 --> 00:24:34.580 for production. But I want to say that the service, 00:24:34.580 --> 00:24:37.960 I think it was around, 00:24:37.980 --> 00:24:43.360 it wasn't long after 2020 that these services, 00:24:43.560 --> 00:24:46.940 that the Pingora services first started to get used and deployed. 00:24:48.000 --> 00:24:52.460 Yeah. And pretty early on, you saw some advantages. I guess, 00:24:52.560 --> 00:24:53.900 Edward, you also mentioned that. 
00:24:55.340 --> 00:25:00.040 The additional hop didn't really make a big difference because the connection 00:25:00.040 --> 00:25:07.360 to the origin was the bottleneck and the new Rust-based system was already pretty fast. 00:25:07.540 --> 00:25:11.260 But then one could argue NGINX was already plenty fast. 00:25:11.900 --> 00:25:17.380 What were some of the other NGINX limitations that you ran into which kind of 00:25:17.380 --> 00:25:22.000 triggered you to find a different approach other than, say, 00:25:22.280 --> 00:25:27.060 the lack of the type system, for example, that you had with the Lua solution in the past? 00:25:27.060 --> 00:25:35.740 Yeah, I can definitely speak to that. I think I was working on one particular 00:25:35.740 --> 00:25:41.540 feature that was a bit hard to. 00:25:43.360 --> 00:25:46.720 It was, I mentioned that every time 00:25:46.720 --> 00:25:49.800 we have, I mentioned 00:25:49.800 --> 00:25:53.380 that we have an internal NGINX fork that we've added 00:25:53.380 --> 00:25:56.880 more and more complexity into for developing our 00:25:56.880 --> 00:26:00.040 own internal features, whenever we want to futz around with 00:26:00.040 --> 00:26:03.500 how NGINX does its request 00:26:03.500 --> 00:26:08.320 processing and response serving, right, over time. 00:26:08.320 --> 00:26:11.220 And eventually there was 00:26:11.220 --> 00:26:14.620 a moment at which the straw 00:26:14.620 --> 00:26:19.060 broke the camel's back, where 00:26:19.060 --> 00:26:22.040 we were implementing, we were 00:26:22.040 --> 00:26:24.940 trying to implement more complicated logic on top 00:26:24.940 --> 00:26:28.000 of the things we were already doing. For example, I 00:26:28.000 --> 00:26:31.180 think we've blogged about concurrent 00:26:31.180 --> 00:26:38.500 streaming acceleration, which is a fancy name for: we're serving your 00:26:38.500 --> 00:26:44.140 cached response body as it gets pulled from the origin. Those changes are 
00:26:44.140 --> 00:26:48.040 pretty intrusive C changes as we iterated on top of that. 00:26:49.230 --> 00:26:53.010 Any feature of decent complexity would cause core dumps. 00:26:53.750 --> 00:26:59.470 And as I mentioned before, that was highly visible to leadership. 00:27:00.490 --> 00:27:07.030 So if we were to make significant progress at all, we would usually be debugging 00:27:07.030 --> 00:27:12.530 what sort of invariant we were violating inside of NGINX. 00:27:16.090 --> 00:27:19.050 NGINX is great in a lot of ways. 00:27:19.230 --> 00:27:26.790 And the developers themselves are experts in what is valid, you know, 00:27:26.870 --> 00:27:34.770 to access when, and what you can do asynchronously from the lifetime of the main request, 00:27:34.890 --> 00:27:37.330 for example, and what is not safe to do so. 00:27:37.330 --> 00:27:42.090 But those things are not necessarily, I mean, they're not as enforced within 00:27:42.090 --> 00:27:45.150 the code strictly, right? 00:27:45.350 --> 00:27:53.030 The way that you can encapsulate those exact kinds of lifetime and memory restrictions in Rust. 00:27:53.030 --> 00:27:56.130 So that was the 00:27:56.130 --> 00:27:59.850 point at which we said, 00:27:59.850 --> 00:28:05.050 we were already developing Pingora, and then we said, actually, for any feature 00:28:05.050 --> 00:28:08.990 of significant complexity, we need to start moving it into the new system. We 00:28:08.990 --> 00:28:14.890 need to start migrating to the new proxy system and developing features 00:28:14.890 --> 00:28:19.190 there as much as possible instead of NGINX itself. 00:28:19.860 --> 00:28:24.100 One thing we want to make clear is we definitely are not here to complain about 00:28:24.100 --> 00:28:26.260 NGINX or to bash NGINX in any way. 00:28:26.500 --> 00:28:31.420 NGINX is like the foundation of Cloudflare. 
And the actual NGINX and OpenResty 00:28:31.420 --> 00:28:37.000 projects are amazing and stable and used in millions, billions of places. I don't actually know. 00:28:37.360 --> 00:28:43.360 But the modifications we were doing were not as stable and were leading to the core dumps. 00:28:43.460 --> 00:28:45.840 As someone who came to this, came 00:28:45.840 --> 00:28:49.280 to Cloudflare not having done internet plumbing, just similar to Edward. 00:28:49.860 --> 00:28:57.360 Seeing the C code for NGINX, which is asynchronous, it's not written in an async 00:28:57.360 --> 00:29:00.860 await kind of way like you're used to with Rust or TypeScript or anything. 00:29:01.020 --> 00:29:04.520 It is literally, it's async code, but you're working with it in the time domain, 00:29:04.680 --> 00:29:08.820 like you are managing the state literally as you go through these, 00:29:08.840 --> 00:29:13.160 waiting for different files, waiting for sockets to open, close. 00:29:13.380 --> 00:29:16.600 So it is a very complex thing and almost impossible to debug. 00:29:17.580 --> 00:29:20.880 Yeah, for sure. The, 00:29:20.880 --> 00:29:25.660 honestly, developer ergonomics 00:29:25.660 --> 00:29:28.560 and developer velocity on top of 00:29:28.560 --> 00:29:31.720 the, you know, the 00:29:31.720 --> 00:29:35.420 classes of bugs that you're able to avoid and not 00:29:35.420 --> 00:29:39.520 worry about. Avoiding whole 00:29:39.520 --> 00:29:42.820 classes of bugs speeds up your productivity where you 00:29:42.820 --> 00:29:48.320 don't have to worry about introducing those things. This is actually why a lot 00:29:48.320 --> 00:29:52.600 of our business logic was written in Lua filters as well, because 00:29:52.600 --> 00:30:00.980 you're not going to segfault from manipulating Lua objects, right? But that 00:30:01.620 --> 00:30:10.040 often comes at a performance cost with the Lua VM and Lua runtime, even if you use LuaJIT. So this. 
00:30:11.240 --> 00:30:19.580 The other main advantage of switching to Pingora and Rust was, 00:30:19.580 --> 00:30:22.000 honestly, just, like, as Kevin had mentioned, 00:30:22.740 --> 00:30:28.940 the expressiveness of async Rust, which is extremely powerful. 00:30:28.940 --> 00:30:32.160 And when you're 00:30:32.160 --> 00:30:36.700 looking at, especially for onboarding new engineers, learning 00:30:36.700 --> 00:30:41.800 NGINX and how it manually handles 00:30:41.800 --> 00:30:49.600 the event loop events, right? Because when a request 00:30:49.600 --> 00:30:57.120 comes into NGINX, it is handling those epoll events and propagating 00:30:57.120 --> 00:30:59.400 them to the request event handlers, 00:30:59.460 --> 00:31:03.720 and then it needs to decide what comes next. 00:31:04.540 --> 00:31:11.180 You assign the next handlers: once your header is done, then you assign the event 00:31:11.180 --> 00:31:12.520 handler for the body, etc. 00:31:12.840 --> 00:31:20.060 There's a lot of manual mental effort involved with that kind of coding model, 00:31:20.060 --> 00:31:29.020 where you are handling the HTTP processing logic in tandem with handling the event loop. 00:31:29.760 --> 00:31:37.700 And with async/await constructs, all of that logic then becomes linear, 00:31:37.920 --> 00:31:40.280 actually. You can very much see... 00:31:41.310 --> 00:31:46.330 After this, you're going to do this next in the life of a request. 00:31:46.530 --> 00:31:54.550 And that, I believe, has been really helpful 00:31:54.550 --> 00:31:58.910 for onboarding new engineers, for learning the code base, etc. 00:31:58.910 --> 00:32:07.930 I think those ergonomics were just as important 00:32:07.930 --> 00:32:12.210 to us, honestly, because we need to ship things fast here.
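The contrast described above, manually assigning the next event handler versus reading the life of a request top to bottom, can be sketched in Rust. This is a minimal illustration using only the standard library, not NGINX or Pingora code: the `Event`, `State`, `Conn`, `on_event`, `handle_request`, and `block_on` names are all hypothetical, and the toy executor assumes futures that never actually have to wait.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical events a proxy might observe for a single request.
enum Event {
    HeaderDone,
    BodyChunk(usize),
    BodyDone,
}

// Manual style: "what comes next" is stored in an explicit state value.
enum State {
    ReadingHeader,
    ReadingBody { bytes: usize },
    Done { bytes: usize },
}

// Each event handler decides which state (and so which handler) follows.
fn on_event(state: State, event: Event) -> State {
    match (state, event) {
        (State::ReadingHeader, Event::HeaderDone) => State::ReadingBody { bytes: 0 },
        (State::ReadingBody { bytes }, Event::BodyChunk(n)) => {
            State::ReadingBody { bytes: bytes + n }
        }
        (State::ReadingBody { bytes }, Event::BodyDone) => State::Done { bytes },
        // Events that do not apply to the current state are ignored.
        (other, _) => other,
    }
}

fn run_manual(events: Vec<Event>) -> usize {
    let mut state = State::ReadingHeader;
    for event in events {
        state = on_event(state, event);
    }
    match state {
        State::Done { bytes } => bytes,
        _ => 0,
    }
}

// Async style: a hypothetical connection that yields events one at a time.
struct Conn {
    events: std::vec::IntoIter<Event>,
}

impl Conn {
    async fn next_event(&mut self) -> Option<Event> {
        self.events.next()
    }
}

// The same logic reads linearly: first the header, then the body.
// The compiler turns this function into a state machine for us.
async fn handle_request(conn: &mut Conn) -> usize {
    while let Some(event) = conn.next_event().await {
        if matches!(event, Event::HeaderDone) {
            break;
        }
    }
    let mut bytes = 0;
    while let Some(event) = conn.next_event().await {
        match event {
            Event::BodyChunk(n) => bytes += n,
            Event::BodyDone => break,
            Event::HeaderDone => {}
        }
    }
    bytes
}

// Toy executor: polls in a loop with a waker that does nothing.
// Only suitable for futures that are always immediately ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn sample_events() -> Vec<Event> {
    vec![
        Event::HeaderDone,
        Event::BodyChunk(3),
        Event::BodyChunk(4),
        Event::BodyDone,
    ]
}

fn main() {
    let manual = run_manual(sample_events());
    let mut conn = Conn { events: sample_events().into_iter() };
    let linear = block_on(handle_request(&mut conn));
    println!("manual={manual} linear={linear}");
}
```

Both versions count the same body bytes for the sample input; the difference is that the async version keeps the request's order of operations visible in one function instead of spreading it across handler transitions.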
00:32:12.210 --> 00:32:17.390 Sounds super crazy, because not many people will be familiar with how NGINX, 00:32:17.390 --> 00:32:23.950 or to be more specific, C, handles asynchronous execution. 00:32:24.470 --> 00:32:28.730 That's a thing that was sort of a selling point for NGINX in the beginning. 00:32:28.910 --> 00:32:33.950 It was event-driven in comparison to Apache, which was not very much event-driven. 00:32:34.130 --> 00:32:39.010 It was more or less process-driven, and NGINX kind of changed that model, 00:32:39.030 --> 00:32:42.730 but you kind of need to shoehorn your logic into that. 00:32:44.320 --> 00:32:51.320 But is it similar to the state machine that gets converted to something more 00:32:51.320 --> 00:32:53.320 maintainable on the Rust side? 00:32:53.500 --> 00:32:57.520 So on Rust, we don't really need to write a big state machine ourselves. 00:32:57.800 --> 00:33:03.680 We just use async Rust as we do, and then the compiler will just generate the state machine for us. 00:33:03.840 --> 00:33:09.180 Is the code similar on the C side, or is it completely different? 00:33:10.120 --> 00:33:15.860 Got it, yeah. Yeah, I would say in NGINX, 00:33:16.360 --> 00:33:25.620 I guess a lot of that is hand-unrolled, the way we were talking about, right? 00:33:25.980 --> 00:33:32.440 Where the events and the next state that you're going to go to for the next 00:33:32.440 --> 00:33:37.020 event that you encounter are manually defined within NGINX. 00:33:37.020 --> 00:33:41.820 And then you were also talking about how NGINX was... 00:33:43.610 --> 00:33:46.570 really revolutionary in terms of 00:33:46.570 --> 00:33:53.230 how it was doing the asynchronous event-driven model, and that kind of touches 00:33:53.230 --> 00:33:58.910 on a... it's not exactly related 00:33:58.910 --> 00:34:06.130