Cloudflare with Edward Wang & Kevin Guthrie
About handling 90 million web requests per second with Rust
2025-10-30 68 min
Description & Show Notes
How do you build a system that handles 90 million requests per second? That's the scale that Cloudflare operates at, processing roughly 25% of all internet traffic through their global network of 330+ edge locations.
In this episode, we talk to Kevin Guthrie and Edward Wang from Cloudflare about Pingora, their open-source Rust-based proxy that replaced nginx across their entire infrastructure. We'll find out why they chose Rust for mission-critical systems handling such massive scale, the technical challenges of replacing battle-tested infrastructure, and the lessons learned from "oxidizing" one of the internet's largest networks.
About Cloudflare
Cloudflare is a global network designed to make everything you connect to the Internet secure, private, fast, and reliable. Their network spans 330+ cities worldwide and handles approximately 25% of all internet traffic. Cloudflare provides a range of services including DDoS protection, CDN, DNS, and serverless computing—all built on infrastructure that processes billions of requests every day.
About Kevin Guthrie
Kevin Guthrie is a Software Architect and Principal Distributed Systems Engineer at Cloudflare working on Pingora and the production services built upon it. He specializes in performance optimization at scale. Kevin has deep expertise in building high-performance systems and has contributed to open-source projects that power critical internet infrastructure.
About Edward Wang
Edward Wang is a Systems Engineer at Cloudflare who has been instrumental in developing Pingora, Cloudflare’s Rust-based HTTP proxy framework. He co-authored the announcement of Pingora’s open source release. Edward’s work focuses on performance optimization, security, and building developer-friendly APIs for network programming.
Links From The Episode
- Pingora - Serving 90+ million requests per second (7e12 per day) at Cloudflare
- How we built Pingora - Cloudflare blog post on Pingora’s architecture
- Open sourcing Pingora - Announcement of Pingora’s open source release
- Rust in Production: Oxide - Interview with Steve Klabnik
- Anycast - Routing traffic to the closest point of presence
- Lua - A small, embeddable scripting language
- nginx - The HTTP server and reverse proxy that Pingora replaced
- coredump - File capturing the memory of a running process for debugging
- OpenResty - Extending nginx with Lua
- Oxy - Another proxy developed at Cloudflare in Rust
- Ashley Williams - Famous Rust developer who worked at Cloudflare at one point
- Yuchen Wu - One of the first drivers of Pingora development
- Andrew Hauck - Early driver of Pingora development
- Pingora Peak - The actual mountain in Wyoming where a Cloudflare product manager almost fell off
- shellflip - Graceful process restarter in Rust, used by Pingora
- tableflip - Go library that inspired shellflip
- bytes - Reference-counted byte buffers for Rust
- The Cargo Book: Specifying dependencies from git repositories - Who needs a registry anyway?
- cargo audit - Security vulnerability scanner for Rust dependencies
- epoll - Async I/O API in Linux
- Tokio - The async runtime powering Pingora
- mio - Tokio’s abstraction over epoll and other async I/O OS interfaces
- Noah Kennedy - An actual Tokio expert on the Pingora team
- Rain: Cancelling Async Rust - RustConf 2025 talk with many examples of pitfalls
- foundations - Cloudflare’s foundational crate for Rust projects that exposes Tokio internal metrics
- io_uring - Shiny new kernel toy for async I/O
- ThePrimeTime: Cloudflare - Trie Hard - Big Savings On Cloud - “It’s not a millie, it’s not a billie, it’s a trillie”
- valuable - Invaluable crate for introspection of objects for logging and tracing
- bytes - Very foundational crate for reference counted byte buffers
- DashMap - Concurrent HashMap with as little lock contention as possible
- Prossimo - Initiative for memory safety in critical internet infrastructure
- River - Prossimo-funded reverse proxy based on Pingora
- Rustls - Memory-safe TLS implementation in Rust, also funded by Prossimo
- http crate - HTTP types for Rust
- h2 - HTTP/2 implementation in Rust
- hyper - Fast HTTP implementation for Rust
- ClickHouse Rust client - Official Rust client by Paul Loyd
- Pingap - Reverse proxy built on Pingora
- PR: Add Rustls to Pingora - by Harald Gutmann
- PR: Add s2n-tls to Pingora - by Bryan Gilbert
Official Links
Transcript
It's Rust in Production, a podcast about companies who use Rust to shape the
future of infrastructure.
My name is Matthias Endler from corrode and today we talk to Kevin Guthrie and
Edward Wang from Cloudflare about handling 90 million web requests per second with Rust.
Kevin and Edward, thanks so much for taking the time. Can you introduce yourselves
and Cloudflare, the company you work for?
Sure. I'll go first. My name is Kevin Guthrie.
I'm Principal Software Engineer or Systems Engineer at Cloudflare.
I've been here about a year and a half. I've been a Rust developer for about
four-ish years, on and off professionally.
I've done some side projects, some games, some really stupid projects,
some really complex projects.
I just love the Rust language, and it's hard to do anything else. What about you, Edward?
Yeah, hey, my name is Edward. I'm also a Systems Engineer here at Cloudflare, and I've been working in Rust since I joined the company, essentially, almost five years ago now at this point. I've been working on, well, the Pingora framework that we're going to talk about today. Previously I was working at a game studio, so working on internet plumbing was a pretty big difference.
Yes, we will talk about Pingora today, but I'm not sure if people are aware of the scale that Cloudflare is at. Can you share some numbers just to fill everyone in?
Yeah, okay. So we have some changing data.
This is just based on public data. All the things we're going to share today
are things that are publicly available.
About 20% of the internet goes through Cloudflare.
The reference here is from a tweet from one of our engineers.
This is up from a couple of years ago, when Steve Klabnik was on and talked about how Cloudflare had 10% of the internet. So we're a little bit up from that.
Currently, from the internal Pingora side, we handle about 90 million requests
per second worldwide, occasionally going up above 100 million requests per second.
That is crazy. I guess beyond comprehension for most people.
That would mean that probably a huge majority of the traffic goes through Cloudflare,
and maybe to some extent through Rust. We will talk about that today.
But what is the setup internally to handle that scale, to handle that amount of requests?
Yeah, I mean, if you're not familiar with Cloudflare to begin with,
Cloudflare operates a global network of more than 300 points of presence around the globe.
There are all these points of presence data centers in many different countries,
and traffic is routed to them via these Anycast addresses, something that we've
talked about on the blog before.
So there are all sorts of setups, both on the layer 4 side and layer 7 side,
to be able to load balance, distribute traffic and capacity accordingly.
Internally, we operate one of the, and by we, I mean our team operates one of the services that your request travels through in order to get served a response.
And those are the services that use our Pingora framework, which is what our team maintains.
But internally, yes, there's a bunch of different mechanisms to balance and
distribute the traffic outside of data centers.
External to the data centers, routing to the data centers, and then within our
data centers themselves throughout the life of a request, as we call it.
Traveling through a few different CDN services or proxies.
Some Rust, some not.
A lot of people who have been in the Rust community for quite a while know that Cloudflare was one of the earliest adopters of Rust at a larger scale. But what was rarely talked about was the reasoning behind why Cloudflare chose Rust. Can you shed some light on that? Was it for performance reasons, was it for memory safety reasons, what was the big driver behind Rust adoption at Cloudflare?
I think neither Kevin nor I was around for the very beginning of this, right?
But we know that our teams have the content delivery network,
you know, compute, workers compute teams, etc.
They've all been eyeing Rust for a long time, as you mentioned.
And I think it was really, yeah, all of the, like, compile-time checks that you'd be able to do, all of the classes of bugs that you would essentially eliminate from production, right?
You would be, I mean, on the content delivery side,
it's no secret that a lot of our company was built on these proxy services using
NGINX, right, which is based in C code, as well as Lua business logic on top of that.
The amount of, let's just say that there were certainly a number of core dumps
and invalid memory accesses associated with us perhaps making changes within our NGINX fork.
Over time, we've had to implement more and more complicated features within NGINX internally.
And these core dumps are really impactful, obviously.
When a core dump happens on a worker process in NGINX, that drops like thousands of requests.
So we had executive, honestly, we had executive visibility and support on that.
But something that our team has talked about before was that at least in the
earlier years, prior to Rust adoption,
I recall that our former CTO, John Graham-Cumming, would actually get an email for each core dump.
These crashes were very much top of mind for folks.
So when you have that kind of, if you're able to build something to leverage
all of those advantages of,
hey, I can just completely erase, eliminate these classes of errors,
then you're definitely going to be pursuing something like that.
And at Cloudflare, we are certainly not shy about considering every new technology and advantage we can come by.
The one thing that I noticed when you explained the reasoning behind Rust was
that a big chunk of the business logic was written in Lua.
And I wondered immediately, couldn't you just use another language with a static type system?
Like, I don't know, Go, for example, wouldn't that have been easier to integrate?
Sure, but we were using NGINX, which was relatively new at the time we adopted it.
I believe we had built a lot of our features, the firewall, DDoS features, etc., on top of filters that would run in NGINX.
So OpenResty, I believe, as it's called, is a framework that allows you to implement business logic that you can plug into each of the NGINX filters that run across the life of a request, without necessarily touching all of the, perhaps arcane to a lot of folks, C code.
So in order to do something like integrate Go, there might be certain similar efforts to do that.
But I think none have been as mature as OpenResty and its Lua logic and Lua filters.
I think for a while, some of the OpenResty folks were working with us on our CDN teams.
So generally speaking, I think Go was one of the possibilities when we were evaluating other languages to switch to, but Rust was the definite forerunner for all the reasons that I mentioned: you know, zero-cost abstractions for great performance and, obviously most importantly, I think, eliminating all sorts of memory safety issues and the bugs that can arise from them.
One other thing that impacted our decision to go to Rust, I think.
Like I said, I've only been here a year and a half, so I was definitely not around when the
decision was being made.
But a lot of the forerunners, like, the celebrities in the Rust community, were working at Cloudflare at various points in time.
Like Steve Klabnik himself worked at Cloudflare. Ashley Williams also worked here.
So, I mean, there was a lot of popularity of the Rust language in Cloudflare to begin with.
When you want to integrate a language like Go into an existing infrastructure
that runs on NGINX, that would probably be a little harder because Go has a
small but not negligible runtime.
It has a garbage collector and so on. Whereas with Rust, you could integrate
very deeply with basic C FFI.
Did that also play a big role? And also, did you end up integrating Rust into
your NGINX server for a while before you moved on to build your own solution?
Yeah, I think this speaks to the matters of how do you migrate and switch over
pieces of your infrastructure gradually to Rust as well, right?
So all of the, as I mentioned, a lot of the core business logic historically
has been built on Lua via OpenResty.
All of that, you know, business logic built up over time.
So I think there were initially some notions of how do you integrate, how do you maybe change those filters, how do you extract the business logic, right? To be using your Rust-based logic instead.
And there were varying approaches to this; there are a lot of teams that work on the CDN in addition to us.
Some of these, some of the logic you can kind of extract into different services,
either in-band with the request processing or out-of-band.
You can make calls to other services.
For example, the approach that we ended up choosing was to extract a specific, on a high level,
extract a particular responsibility of one of our NGINX proxies into a separate service.
That was what we were doing when we were first developing Pingora,
which was at the time, NGINX would reach out to make origin connections directly
and make origin requests directly.
We decided, hey, what if we situated an in-band with a request proxy that sits
just behind that NGINX proxy, and routed requests to that instead.
And then that service would decide how to make origin requests to which origins.
Handling all of the origin communication responsibility.
And then you're able to do something like divert traffic to that selectively
depending on how ready that service is to handle certain classes of requests.
I think this is generally the strategy that has been working out pretty well
for various services at Cloudflare,
as long as you're able to have something that has some sort of control plane that sits in front and decides, you know, the routing in-band of that request.
It's the unsurprising answer of how do you solve a problem at a proxy company
is by adding more proxies.
Yeah. And certainly at first, I think we were a little concerned. Whenever you're adding another hop, another proxy hop, injecting another service into the path, you're worried about complexity. You're worried about performance regressions and things like that, latency, obviously, right?
Generally speaking, what we had noticed was that adding another service hop is certainly, unconditionally, going to add some amount of latency.
Thankfully, the new logic that we were adding on top of that was generally able to offset a lot of those detriments. The example that we tended to point out in our blog a while ago was how our Pingora service, and I don't know how much we want to get into the why exactly in terms of NGINX versus Pingora architecture, but Pingora was a lot more efficient, this was definitely top of mind for us, a lot more efficient in terms of how it was making and reusing origin connections.
Something like that generally brings down the latency of making an origin request significantly, if you can skip all of the TLS handshake latency and so on. Fortunately for us as well, when it comes to replacing an origin-facing proxy, the cost of origin latency significantly dwarfs any additional proxy hop latency you have.
Okay, but was the project already
called Pingora back then, or was it some sort of intermediate step?
I guess I have to shout out some folks who were working on Pingora.
I say I was working on it, but in reality, it was Yuchen, Yuchen Wu,
as well as Andrew Hauck, who were kind of the primary and first drivers of Pingora.
And at first it was called OpenRusty, I think.
You still see this term in some of the old tests because it was very much meant
to replace OpenResty and NGINX itself and be a, I don't want to say a drop-in replacement,
but do all the things and model a lot of its logic off of NGINX and OpenResty.
Because honestly, that worked for us. And NGINX's logic models and the way that
it thought about request processing worked for us.
So we wanted to do a lot of things pretty similarly to NGINX.
I really like the name, OpenRusty, but of course.
I don't remember why they didn't go with that name in the end.
I think it partially was sort of pejorative, but also could have been confused as a typo.
I think Pingora is a much better name. The name, I think, came from the manager
of the team who almost slipped and died off of the mountain,
the literal mountain that's actually called Pingora.
I believe the story is that a particular trip to the Pingora Mountain almost cost him his life.
And now we've been ascending that summit ever since.
Sounds like negative foreshadowing, but in reality it worked out well.
But the one thing that I wondered was that, let's say you add another hop.
You add proxy behind a proxy.
And then you have a ton of requests coming in, and then you want to switch to a new version, you just kind of want to do a release, basically. Wouldn't that be an easy source for dropping connections and dropping requests?
It is something we have to do very carefully, and since we handle things like WebSockets, we have lots of long-running requests, so upgrades and updates are something we don't do all that often.
The way Pingora does this is a really slick system where, when you bring up a new update, the process that you want to move everything to can start, and it can know about the old instance of Pingora that's currently running.
That old instance can gracefully hand over the socket so the new instance of Pingora starts listening for new connections, while old requests finish out on the old one. The old instance can then handle all of its remaining requests and gracefully shut down, whereas the new one is accepting any new connections and handling those.
Is that safe or does that happen between processes? I wonder if you can even make it safe.
It is. I mean, I don't know. I'm sure in Rust this is classified as some form
of unsafe code because you're passing around raw file descriptors for sockets.
But it is also a really common thing. This is something I first heard of at Facebook, where their HTTP servers do the same exact thing, and even their load balancing system. I think it's a very common process, but I never worked with the actual code to do it until working on the Pingora project.
Yeah, so there's this process of transferring these listen file descriptors. I think it's one of the few places, I could be wrong, but I think it's one of the few places where, yes, because we're dealing with those raw file descriptors, there's a bit of unsafe code there.
There's actually also a crate that I believe we've put out, not us ourselves, but, I mentioned, or maybe I haven't mentioned yet, that Cloudflare is not a monolith when it comes to Rust, and we are not the only folks developing in the Rust ecosystem.
So the folks who are working on another proxy framework called Oxy have actually open-sourced a crate that is specifically for these kinds of graceful process restarts. It's called shellflip, and it uses a very similar mechanism of transferring file descriptors and doing that handover.
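To make the unsafe part concrete, here is a minimal sketch (illustrative only, not Pingora's or shellflip's actual implementation) of rebuilding a listener from a raw file descriptor inherited from the old process:

```rust
use std::net::TcpListener;
use std::os::fd::{FromRawFd, RawFd};

/// Rebuild a listening socket from a file descriptor inherited from the old
/// process (for example, one received over a Unix socket during a graceful
/// restart).
///
/// # Safety
/// `fd` must be an open, listening TCP socket that nothing else owns; taking
/// ownership of an arbitrary raw fd is exactly why this is `unsafe`.
unsafe fn listener_from_inherited_fd(fd: RawFd) -> TcpListener {
    unsafe { TcpListener::from_raw_fd(fd) }
}
```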
ShellFlip sounds like another really cool name. You have a way with names, I guess, at Cloudflare.
I think it was taken from the tableflip Go package.
And I think some of our engineers decided to...
That's a good name.
I don't know why it's Shell in particular, but maybe it has to do with crabs.
Maybe, yeah. But do you share a lot of code with other teams at Cloudflare?
Rust crates, that is. You mentioned that you have Pingora and you have Oxy,
but there's probably more stuff at Cloudflare which uses Rust.
How does code sharing look like?
Because from another company, or actually from a few, I heard that sort of by
serendipity, they start to use different crates in completely different contexts.
And it kind of happens very naturally to share code.
Yeah, that's true. We have a sort of a haphazard way of sharing code.
We do have our own internal repository for uploading crates,
like it's an internal copy of crates.io.
Internal registry.
Sorry, internal registry. That's the right terminology. But for a lot of it,
it's done through referencing crates through Git URLs.
So it's a little bit on the Go side of things. So you have a crate that you
want to share with other people in Cloudflare.
It's up on our internal Git server. You can write a blog post about it.
Anybody can just include it.
Putting it on the internal registry should be a more common thing to do,
but I have literally never done it.
I've shared a couple of crates with different teams for various stupid things,
but most of those are just incorporated either through one way,
which is making the project open source and then having people consume it just
from the open internet or from the actual crates.io or consuming it internally
from an internal Git repo.
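For reference, the Git-URL style of dependency Kevin describes looks like this in a Cargo.toml; the crate name, URL, and registry name here are made up for illustration:

```toml
[dependencies]
# Pulled straight from an internal Git server instead of a registry.
# Pinning a tag or rev keeps builds reproducible.
internal-metrics = { git = "https://git.example.internal/team/internal-metrics.git", tag = "v0.3.1" }

# The registry-based alternative, if the crate is published internally:
# internal-metrics = { version = "0.3", registry = "cloudflare-internal" }
```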
I will say that I think usage of the internal registries is pretty common now.
The whole point of the registry, or one of the major points, was to avoid the Git-commit kind of references that Cargo allows you to do.
It's still used in some cases, right?
But yeah, more and more, I think the ecosystem around Rust has become a lot
more shared maybe in the past few years. In our code, we use both approaches.
And when you publish code, do you have a formal process for the publication?
Do you run any cargo tools to make sure that the code quality is on par with the rest?
We use, I mean, really, we use the standard open source tools. We use Clippy, we use the auditing tool, the name I can't think of right now, to make sure we are not publishing anything with insecure code. Cargo audit. Yes, cargo audit, yeah, exactly.
But that's really about it. We're very stringent on our internal code reviews. So, like I said, we have open source projects, and all of the open source contributions that come in go through external review as well as internal review before they go into the main branch for Pingora. But as far as automated tools, yeah, it's really just Clippy and testing.
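For anyone who wants the same checks locally, the tools Kevin mentions are a one-liner each (cargo-audit is installed separately):

```sh
# Lint with Clippy and treat warnings as errors
cargo clippy --all-targets -- -D warnings

# Scan Cargo.lock against the RustSec advisory database
cargo install cargo-audit   # one-time setup
cargo audit
```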
Let's come back to Pingora for a while. We established that we have a system called OpenRusty, it's behind NGINX, and that's the current place we're at. When was that, roughly? Like, in what year did you have that system running in production?
Oh boy. So the blog came out around 2022, I want to say. The first forays into Pingora started around 2020 or a little before that. I would need to look into the exact dates for production, but I want to say that it wasn't long after 2020 that the Pingora services first started to get used and deployed.
Yeah. And pretty early on, you saw some advantages. I guess,
Edward, you also mentioned that.
The additional hop didn't really make a big difference because the connection
to the origin was the bottleneck and the new Rust-based system was already pretty fast.
But then one could argue NGINX was already plenty fast.
What were some of the other NGINX limitations that you ran into which kind of triggered you to find a different approach, other than, say, the lack of a type system that you had with the Lua solution in the past?
Yeah, I can definitely speak to that. I think I was working on one particular feature that was a bit hard to...
I mentioned that we have an internal NGINX fork that we've added more and more complexity into for developing our own internal features, whenever we want to futz around with how NGINX does its request processing and response serving. Over time, there was eventually a moment at which the straw broke the camel's back, where we were trying to implement more complicated logic on top of the things we were already doing. For example, we've blogged about concurrent streaming acceleration, which is a fancy name for serving your cached response body as it gets pulled from the origin. Those changes are pretty intrusive C changes, and we kept iterating on top of that.
Any feature of decent complexity would cause core dumps.
And as I mentioned before, that was highly visible to leadership.
So if we were to make significant progress at all, we would usually be debugging
what sort of invariant we were violating inside of NGINX.
NGINX is great in a lot of ways.
And the developers themselves are experts in what is valid, you know, to access when, and what you can do asynchronously from the lifetime of the main request, for example, and what is not safe to do.
But those things are not strictly enforced within the code, right? Not the way that you can encapsulate those exact kinds of lifetime and memory restrictions in Rust.
So that was the point at which we said, we were already developing Pingora, and then we said: actually, for any feature of significant complexity, we need to start moving it into the new system. We need to start migrating to the new proxy system and developing features there as much as possible, instead of in NGINX itself.
One thing we want to make clear is we definitely are not here to complain about
NGINX or to bash NGINX in any way.
NGINX is like the foundation of Cloudflare. And the actual NGINX and OpenResty
projects are amazing and stable and used in millions, billions of places. I don't actually know.
But the modifications we were doing were not as stable and leading to the core dumps.
As someone who came to this, came
to Cloudflare not having done internet plumbing, just similar to Edward.
Seeing the C code for NGINX, which is asynchronous, it's not written in an async
await kind of way like you're used to with Rust or TypeScript or anything.
It is literally, it's async code, but you're working with it in the time domain,
like you are managing the state literally as you go through these,
waiting for different files, waiting for sockets to open, close.
So it is a very complex thing and almost impossible to debug.
Yeah, for sure. Honestly, the developer ergonomics and developer velocity, on top of the classes of bugs that you're able to avoid and not worry about. Avoiding whole classes of bugs speeds up your productivity, because you don't have to worry about introducing those things. This is actually why a lot of our business logic was written in Lua filters as well, because you're not going to get segfaults from manipulating Lua objects, right? But that often comes at a performance cost with the Lua VM and Lua runtime, even if you use LuaJIT.
So the other main, primary advantage of switching to Pingora and Rust was honestly just, as Kevin had mentioned, that the expressiveness of async Rust is extremely powerful.
Especially for onboarding new engineers, learning NGINX and how it manually handles the event loop is hard. When a request comes into NGINX, it is handling those epoll events and propagating them to the request event handlers, and then it needs to decide what comes next: assign the next handlers once your header is done, then assign the event handler for the body, etc.
There's a lot of manual mental effort involved with that kind of coding model, where you are handling the HTTP processing logic in tandem with handling the event loop.
And with async await constructs, all of that logic then becomes linear,
actually. You can very much see...
After this, you're going to do this next in the life of a request.
And that, I think, I believe, has been really helpful for folks who are,
for onboarding new engineers, for learning the code base, etc.
Those ergonomics were just as important to us, honestly, because we need to ship things fast here.
Sounds super crazy, because not many people will be familiar with how NGINX, or to be more specific, C handles asynchronous execution.
That's a thing that was sort of a selling point for NGINX in the beginning.
It was event-driven in comparison to Apache, which was not very much event-driven.
It was more or less process-driven, and NGINX kind of changed that model,
but you kind of need to shoehorn your logic into that.
But is it similar to the state machine that gets converted to something more
maintainable on the Rust side?
So on Rust, we don't really need to write a big state machine ourselves.
We just use async Rust as we do, and then the compiler will just generate the state machine for us.
Is the code similar on the C side, or is it completely different?
Got it, yeah. I would say in NGINX, a lot of that is hand-unrolled, the way we were talking about, right? The events and the next state that you're going to go to for the next event that you encounter are manually defined within NGINX.
And then you were also talking about how NGINX was really revolutionary in terms of how it was doing the asynchronous, event-driven model. That touches on something that's not exactly related to how async Rust gets converted into a state machine, but it is a big part of the explanation for why NGINX is already a great performer, right? It does really well in benchmarks already.
The underlying mechanism for how that works, I talked about something called epoll before. The underlying mechanism of NGINX is that there's an event loop that an NGINX worker process goes through, and with the help of operating system utilities like epoll it can determine when I/O events are ready on certain file descriptors, and otherwise block if there are no events ready. So you can think of it as essentially processing all of these events as they become ready, as I/O events come in, in this literal loop. That was a really powerful and greatly efficient way to do all of this and respond to network and file I/O, right?
We got to cheat with Pingora, because we are able to reap the benefits of Tokio, which does something really similar. And I am not a Tokio expert by any means, I would say, but it is doing something really similar, also with the help of mio underneath, Metal I/O, where it's also handling the file descriptors and this event loop within its reactor. It uses a lot of the same operating system mechanisms, be it epoll or kqueue or what have you, to listen to these I/O events and then propagate them, waking the corresponding Tokio tasks that are relying on them, right? So all of a sudden, we in Pingora are able to look at that and we have a great abstraction over the actual underlying event handling systems that we can build upon.
So a great big part of the success, when it comes to why we were able to develop Pingora relatively quickly, I would say, is because we were able to build upon the success of Tokio and all of its great performance considerations and mechanisms, right? Because Tokio also has a bunch of internal optimizations, especially when it comes to, well, we can get into how it does things between threads and tries to load balance tasks and work between threads, but basically we were able to do so much on top of it because we already had a great underlying async runtime and event handling mechanism.
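To make the contrast concrete, here is a minimal sketch of the "linear" style being described: Tokio's reactor (built on mio over epoll, kqueue, etc.) wakes the tasks, and the handler reads top to bottom instead of being split into manually chained event callbacks. This is a generic echo-style example, not Pingora code:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        // The runtime parks this task until the OS reports a new connection.
        let (mut socket, _peer) = listener.accept().await?;
        tokio::spawn(async move {
            let mut buf = vec![0u8; 4096];
            // Each .await is a point where the generated state machine yields
            // back to the reactor; the code still reads as one linear sequence.
            loop {
                match socket.read(&mut buf).await {
                    Ok(0) | Err(_) => break, // connection closed or errored
                    Ok(n) => {
                        if socket.write_all(&buf[..n]).await.is_err() {
                            break;
                        }
                    }
                }
            }
        });
    }
}
```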
I don't know if you sold, you said you weren't a tokio expert.
I don't know if that came across because it sounds like you are quite the tokio expert now.
I can say, as a definite non-Tokio expert, that you don't really even need to know all this to use Tokio for async code.
You can come into it as a TypeScript developer and be like, oh yeah,
it's async, wait, I get it.
We have an actual tokio maintainer on our team now, Noah Kennedy,
who is actually a tokio expert.
So I don't generally say I'm an expert on things unless I really feel like I
know them very, very, very well.
Well, also, the error messages have gotten a lot better in recent years. I can still remember the early days when you got a page full of, you know, gibberish about the type system and so on. But recently they improved a lot, thanks to compiler internals which help infer what really went wrong and present the information in a more consumable way, and also just new mechanisms inside the language, like impl Trait, which allow you to focus more on the core issue at hand.
I don't know if you saw any of the older error messages before, but...
Definitely, yeah. So I did some work with Tokio a long time ago, well before it went 1.0, so things were a little bit different in terms of error messages. It was basically back in the old days of dealing with Java error messages, or other languages' error messages, where you can almost ignore them, just look at where in the code it was pointing, go there, and try to figure things out yourself, as opposed to right now, where the error messages in async are practically as good as they are in synchronous Rust code, which is to say really good.
Did you ever run into problems with tokio, like, for example,
starving the executor on threads when you block too many futures or async cancellation
where you had a sub-request that you didn't want to kill when you killed the main task, for example, and you wanted to keep it running?
And the Rust Futures ecosystem was sort of getting into your way?
At least a little bit. There was one scenario that I ran into within the past
month where I was downloading a file, downloading a large file,
running on a hyperscaler VM, so really excellent internet connection,
and just the process of downloading that file on a small VM with a limited number
of threads and a small number of tokio workers was enough to starve every other
connection because I wasn't being smart. I wasn't using budgeting.
I was just letting this one task take over the entire runtime and blocking everything else.
Like no other async things were being taken care of. It was just happily downloading
this one giant file and using the entire CPU to do that.
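A hedged sketch of the kind of fix being alluded to: inserting explicit yield points into a long-running download loop so one task can't monopolize a small runtime. The function and chunk handling are illustrative, not the actual code:

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

/// Copy a large stream into a buffer while periodically yielding back to the
/// scheduler, so a single bulk download can't starve other tasks on a runtime
/// with only a couple of worker threads.
async fn download_politely<R: AsyncRead + Unpin>(mut src: R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = vec![0u8; 64 * 1024];
    let mut chunks_since_yield = 0u32;

    loop {
        let n = src.read(&mut chunk).await?;
        if n == 0 {
            return Ok(out);
        }
        out.extend_from_slice(&chunk[..n]);

        chunks_since_yield += 1;
        if chunks_since_yield >= 16 {
            // Give other tasks scheduled on this worker a chance to run.
            tokio::task::yield_now().await;
            chunks_since_yield = 0;
        }
    }
}
```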
Yeah, what happens a lot during testing is that people have these multi-core
machines, they have, I don't know, 16, 32,
64 cores, what have you, and they run that system in their test laptop in their
development environment.
And everything works because you have plenty of threads.
By default, Tokio just spawns as many worker threads as you have cores.
But then you move to production, where maybe you have to make do with two cores, and then suddenly you have two blocking tasks, and you run out of threads and the thread pool gets exhausted. That's a thing at least that I saw with some clients. And you, running 20% of the internet, I did wonder if you ever ran into these sorts of problems. They might be hard to troubleshoot as well, because you look at a dashboard and everything looks normal, as if it was sort of working, but it doesn't do any work, it doesn't make any progress on the futures.
That's true. At least our team, the Pingora team, has recently incorporated Tokio internal metrics into our dashboards to give us visibility into these sorts of things. But you're right, before the past couple of months that's something we didn't have visibility into, and if we were running into that problem we wouldn't have known. Since then, we've encountered a few problems in production that we've seen in our measurements of runtime queue size and the other metrics, basically how much the scheduling system is getting backed up by all of the threads being busy.
I think we were also running into issues where we had certain file I/O operations that were taking a while, and for those Tokio actually has a separate blocking thread pool that usually doesn't get saturated because it's really large, but it may, right?
And on that note, you also brought up async cancellation. I think it's really easy to mess up. Async, you know, thinking about cancel safety and things. It's really easy to mess up a tokio select loop inside a while loop, where if any of the branches aren't cancel safe, that's a problem. That's purely an async Rust problem that I think generally doesn't get introduced very well to people who are entering Rust.
There was a great Rust talk, actually, by Rain from Oxide that I would love to shout out, because it was really helpful in thinking about why exactly async cancellation is hard to reason about. There are a lot of ways in which cleaning up async Rust, being mindful of cancel safety, being mindful of when a future is canceled and thus cancels everything else under it, right? That's both really, really useful in async Rust, but also very, very easy to mess up. And it's a problem.
They had introduced this concept, which I was really...
Which helped me think about this, right? It is a problem that you can't determine
whether or not something is safe to cancel just within the function itself.
You have to look at every other child future under it and determine what else
is going to get canceled when I decide to cancel this.
And so it becomes a really hard problem to think about because suddenly you
have to think about everything else, you know, all of the global context.
This is structured futures?
I think it's just that child futures will also get canceled, right? When you cancel the parent future, I believe.
So it's tough.
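A small sketch of the kind of pitfall being described (generic code with made-up channel names, not Pingora's): in a select! loop, the branch that loses the race is cancelled, and if that branch's future had already done partial work, that work can be silently lost unless the future is cancel safe:

```rust
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::net::TcpStream;
use tokio::sync::mpsc;

async fn run(stream: TcpStream, mut shutdown: mpsc::Receiver<()>) {
    let mut lines = BufReader::new(stream).lines();
    loop {
        tokio::select! {
            // `next_line()` is documented as cancel safe: if the shutdown
            // branch wins, partially read data stays in the BufReader for the
            // next iteration.
            maybe_line = lines.next_line() => {
                match maybe_line {
                    Ok(Some(line)) => println!("got: {line}"),
                    _ => break,
                }
            }
            // If this branch wins, the other branch's future is dropped.
            // Had that future been something like `read_exact` into a local
            // buffer, the bytes it had already consumed would be lost.
            _ = shutdown.recv() => break,
        }
    }
}
```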
And so async Rust definitely has its sharp edges.
That's not necessarily a tokio-specific thing, but the last bit,
the very last bit, is that actually there is, to shout out another Cloudflare
crate that we are not yet using in Pingora either,
but other services at Cloudflare are.
There's a crate called Foundations that I think helps you export a lot of these
tokio metrics and such out of the box.
So it has a lot of nice functionality to be able to do that.
And it should be a pretty minimal, again,
foundation layer for folks if you're interested in more easily exposing a lot
of those runtime operational concerns and getting observability into that.
One thing to build on that, the one thing that Pingora does to help you avoid
the problem of going from one machine with many cores to another machine with
few cores, the problem about unexpected number of tokio tasks,
is to make you be explicit.
Well, Pingora uses Tokio under the hood, but it doesn't really expose Tokio to the caller. Instead, it talks about things in terms of backends: how many threads do you want to use on this backend? And we don't do a default number, we don't default to the number of cores. We make you be explicit and say, okay, you want to run this, tell us how many tasks you want to run. And we do a lot of things like isolating services to a certain subset of tasks. It's not like one giant Tokio runtime, which is what you would get if you were just running #[tokio::main]. It's a good way of isolating business-critical things from things that need to run in the background and can take a little extra time.
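Setting Pingora's own configuration surface aside, the underlying idea can be shown with a plain Tokio runtime builder: be explicit about thread counts instead of inheriting the core count, and give background work its own small runtime. A generic sketch, not Pingora's API:

```rust
use tokio::runtime::Builder;

fn main() {
    // Runtime for latency-critical request handling: explicit thread count.
    let proxy_rt = Builder::new_multi_thread()
        .worker_threads(4)
        .thread_name("proxy-worker")
        .enable_all()
        .build()
        .expect("failed to build proxy runtime");

    // A separate, smaller runtime for background jobs, so a slow background
    // task can never starve the request path.
    let background_rt = Builder::new_multi_thread()
        .worker_threads(1)
        .thread_name("background-worker")
        .enable_all()
        .build()
        .expect("failed to build background runtime");

    let _bg = background_rt.spawn(async {
        // periodic cache sweeping, metrics flushing, etc.
    });

    proxy_rt.block_on(async {
        // accept and proxy connections here
    });
}
```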
Yeah, and since we're on the topic of using a certain number of cores in Rust code: I always wonder why everyone sort of defaults to the number of cores that you have on your system, because if every dependency, if every library does that, you end up with a multiple of the number of cores you have. I'm just saying this so that people are mindful about the resources that they request from a system.
And speaking of resources, that's the other part about performance, or efficiency, let's say, that I wondered about. When you compare NGINX with Pingora, were you able to squeeze out even more requests per server now that you switched over to Pingora? Because NGINX must have already handled a ton of requests, I'm assuming, because it was written in C.
I don't know if we were able to squeeze out more necessarily. Like, people squeeze out all of the resources from us with the amount of requests per second that they drop on our network. But again, NGINX, you know, just bare request processing without extra compute futzing with the request processing, without Lua filters or whatnot, is generally already pretty efficient, just trying to limit what it does to being pipes. We have similar goals, right? We want to be as minimal as we can, just ferrying the bytes through and making the necessary modifications on the layer 7 stuff.
Now, the things that we were saving: I had mentioned earlier that we were more efficient at reusing origin connections, for example. You can, in theory, squeeze out and save compute for both yourself and the origin if you have better origin connection reuse when you're trying to make requests upstream. The reason why there was such a fundamental difference, I think in the blog we mentioned that we had lowered the amount of origin connections we were making by like two-thirds, was a fundamental architecture reason: NGINX worker processes, because they're individual processes, weren't able to share a connection pool, unlike the thread-based model that we have in Pingora, where we have an upstream connection pool that all the threads on a particular server can share from.
Except in those particular fundamental architecture ways, which we were really conscious of when we were first optimizing Pingora because making origin connections is a big deal for our team, I think generally we would expect performance to be pretty much on par with what NGINX is doing. And that's the promise of Rust, right? You can do that and be just as expressive and easy to understand.
Yeah, a lot of this comes down to physical limits. So NGINX is optimized to
the max to the level of what you can do on a network card and what you can do
reading files from a disk. We are limited by the same physical constraints.
We are reading from the same network, reading from the same disk effectively.
The place where Rust excels here is an ability to make it easy to read and easy
to write and easy to onboard as opposed to requiring a PhD to unroll C code.
And when it comes to implementing new things, playing with the shiny new tools that the kernel allows you to use, like io_uring and stuff, that's perhaps a lot easier. I would certainly want to be doing that within our framework instead of trying to roll that into NGINX, right? At this point, I think we're just a lot more comfortable working within our own ecosystem.
Which brings us to today. Just to wrap up the part about Pingora, can you share some numbers about the project, where are we at today, and maybe about Cloudflare in general?
Sure. So, I mean, the first thing that I always tell people about Pingora or
Cloudflare in general is that the teams are really small,
surprisingly small, especially if you look at teams at other big companies like
Amazon or Facebook or Google, there's only between six and eight people,
depending on the time of day on the Pingora team.
So this team that handles a large chunk, 20% of the internet traffic in the world, is six or seven people, most of whom are asleep at the same time.
In terms of lines of code, we, for some reason, are not giving out the official
number of lines of code in Cloudflare that is written in Rust.
But for Pingora, even on the open source side, there's about 130,000 lines of code.
To be clear, we're not the only content delivery network team, not the only proxy service through which these requests are passing. There are lots of other folks. But yeah, I think that where it makes sense, we have a lot of autonomy as engineers, and certainly a lot of responsibility, and each of us is kind of driven to do what we want within the team.
So each of us carries, I think, a lot of load, while trying not to stress the bus factor too much in that case.
True. Yeah, the reason the team size fluctuates is because we, as a company, are open to working across teams, to the extent that for the past, I don't know, three or four months, I've not been working on the Pingora team but on the Speed team, for other as-yet-undisclosed projects, which are also written in Rust, more on the core side than the edge side, but still very interesting and all async Rust, just like Pingora.
Amazing.
Though I don't think we can share across all of Cloudflare how many lines of
Rust code there are, I will say that Rust, we mentioned that Rust has been of
interest to Cloudflare for a long time.
Pretty much every new service on the edge is written in Rust,
I believe, unless there's some significant reason not to.
I think all of the services that are running are proof enough that it provides
significant value, especially in our performance-critical, like,
segfault-avoidant environment.
Yeah, there are at least a few requests that go through Cloudflare that touch only Rust.
It's not the majority yet, but it is a significant number.
Speaking of which, have we talked about the number of requests per second that
Pingora handles right now?
Oh, yeah, I mentioned it briefly in a ramble. But yeah, so Pingora itself,
there are multiple Pingora projects.
But the one most prevalent that talks to upstream origins handles on average
about 90 million requests a second.
There was a blog post that came out that ThePrimeagen read out loud. And he got to that number and said, wow, is that a billie? No, that's a trillie, that's a trillion requests per day.
Wow, that's crazy. That's a lot of requests.
Yeah.
Does the Rust ecosystem cover everything that you need right now?
Are there any crates that you wanted to mention that are amazing,
that are invaluable for you?
And are there any things lacking in the ecosystem right now?
One crate that I think is sort of underutilized is the valuable crate.
As part of the tokio project, it ties in really nicely with tokio tracing.
It allows you to basically give a controllable summary of objects that you want
to show up in your traces.
It's got a usage pattern that's similar to serde. You annotate your structs
that you want to be able to display.
It's got some great new features that allow you to omit fields if you don't
want PII to show up in your traces. It's really well written.
We've added our own features on top of it for when you have structures from external crates that you don't have access to add annotations to. You can give a special valuable annotation so that instead of a full object representation of the structure, you can give it the Debug representation or the Display representation and have that show up in your logs.
It's just a really simple way of avoiding the boilerplate that comes up with
wanting to give a summary of an entire object structure, which I've seen in
lots and lots of places, especially in other languages.
You want an object to show up in multiple ways, but you can't interfere with how it's serialized to JSON, so you have to go through all the boilerplate of writing: okay, this field goes in; oh no, skip this field, it's got IP addresses. So having a crate that's designed to do this, and is also thoughtful, is great, because it doesn't add the overhead of implementing it all yourself. You're implementing it as a blanket trait implementation, but it's done in a dynamic way, so it doesn't even add a lot of monomorphization. It just gives you one implementation for anything that implements Display or is Valuable.
It's a great crate. I can't get enough of it.
One could say it's invaluable.
It is invaluable, yes.
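A minimal sketch of the basic pattern described above, using the publicly documented valuable derive and Tokio tracing integration; the struct and field names are made up, and tracing's valuable support is still feature-gated, so check the current docs for the exact flags:

```rust
// Assumes the `valuable` crate plus tracing built with its (feature-gated)
// valuable support; treat this as illustrative rather than copy-paste ready.
use valuable::Valuable;

#[derive(Valuable)]
struct RequestSummary {
    method: String,
    path: String,
    upstream_attempts: u32,
}

fn log_request(summary: &RequestSummary) {
    // The struct shows up as structured fields in the trace output instead of
    // a hand-rolled Debug string.
    tracing::info!(request = summary.as_value(), "finished proxying request");
}
```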
No, yeah, I'm glad you mentioned that because I feel like I'm cheating.
Everything that comes to mind is a core dependency, right?
Like, Tokio, obviously, has so many great utilities for us to express things like message passing in an async fashion, etc. And obviously, I think we've sung its praises already.
And other things that come to mind seem really foundational,
just like, you know, reference counted bytes, byte buffers with the bytes crate.
Very foundational. DashMap: how do you get a concurrent hash map with as little lock contention as possible? Something like DashMap, with a bunch of shards. It's great. The other things I've already mentioned are our Cloudflare crates: shellflip, when it comes to graceful process restarts, and foundations for various telemetry and observability things, among other operational service things.
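For illustration, here is the sort of thing those two crates buy you: Bytes clones are cheap reference-count bumps, and DashMap shards its locks so concurrent readers and writers rarely contend. A generic sketch, not Cloudflare code:

```rust
use bytes::Bytes;
use dashmap::DashMap;
use std::sync::Arc;

fn main() {
    // A tiny in-memory cache keyed by path. DashMap shards internally,
    // so many threads can insert and read without one global lock.
    let cache: Arc<DashMap<String, Bytes>> = Arc::new(DashMap::new());

    // Bytes is reference counted: cloning or slicing never copies the body.
    let body = Bytes::from_static(b"HTTP/1.1 200 OK\r\n\r\nhello");
    cache.insert("/index.html".to_string(), body.clone());

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            std::thread::spawn(move || {
                if let Some(entry) = cache.get("/index.html") {
                    // Cloning the Bytes value is just a refcount bump.
                    let _local: Bytes = entry.value().clone();
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```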
I don't know if we wanted to shout out some community work on top of Pingora, too.
Oh, yeah.
Originally, I think we had been working with some folks within the Prossimo memory safety org on a more batteries-included, actual drop-in NGINX replacement called River. I do believe that a lot of that work may be on pause right now, though.
But there are a lot of other great community folks who come in, report issues, contribute, et cetera. There's the Pingap crate as well, one of the most significant and popular, where they've also implemented, dealt with, our more arcane APIs around caching and stuff.
So definitely, that's a tremendous effort.
And I think we have been so flattered and excited by the community engagement with Pingora.
It was monumental and humbling.
What does River do?
Both of these projects that I mentioned, River and Pingap, are meant to be more batteries-included, NGINX-like, actual binary deployments. Pingora is meant to be a library, and it can be a bit difficult to work with if all you're trying to do is use it as a drop-in for NGINX, right? You have to actually implement all of it, define the proxy service and things like that in code.
It is not a batteries-included, plug-and-play sort of deployment, versus something like one of these other projects, where you can in theory just build it and run it as if it were an NGINX binary.
So we were really trying to build the foundations of a proxy framework and allow the community to expand on it, since we haven't yet necessarily needed that generalized solution ourselves, with the amount of heavy customization and, you know, surface fiddly bits that we do ourselves in spinning up a Pingora service.
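To give a sense of what "define the proxy service in code" means, here is roughly what a minimal Pingora proxy looks like, adapted from the public quickstart examples; exact module paths and signatures may differ between Pingora versions, so treat it as a sketch rather than canonical API usage:

```rust
use async_trait::async_trait;
use pingora::prelude::*;

// The smallest possible proxy: send every request to one upstream.
struct MyGateway;

#[async_trait]
impl ProxyHttp for MyGateway {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        // (address, use_tls, SNI hostname)
        Ok(Box::new(HttpPeer::new(
            "1.1.1.1:443",
            true,
            "one.one.one.one".to_string(),
        )))
    }
}

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    let mut proxy = http_proxy_service(&server.configuration, MyGateway);
    proxy.add_tcp("0.0.0.0:6188");

    server.add_service(proxy);
    server.run_forever();
}
```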
Yeah. And as we mentioned, we're only six or seven people. We don't have so
much time to add additional features. I mean, we love adding features to Pingora.
The River project, when it was envisioned, was supposed to have things like
WebAssembly integration. So you can do all these things, but expose them as WebAssembly.
That was one of those things that I would love to implement myself.
But, you know, there's just, there's enough Cloudflare work to go around,
and also it's a significant project to take on.
The community has been really good at putting things into Pingora directly, though.
Some notable ones that come to mind are the Rustls integration. We internally use OpenSSL. The Rustls integration was a huge undertaking that one person did themselves, and we're very grateful for that. Harald, if you're listening, thank you very much.
There's another similar integration for another TLS implementation, I think AWS's s2n-tls. That one is still yet to be reviewed, and obviously it's assigned to me. I'm slacking off on my open source job there.
We are really trying to stay on top of open source, but I wish I just had more, I think we all wish we had more, open source time. Yeah, the open source stuff is so fun.
Yeah, there's never enough time. Speaking of which, we have to conclude as well, because we ran out of time, but it was amazing to talk to both of you. Likewise. If you could phrase a statement to the Rust community, anything that you always wanted to share, what would it be?
No, like, there's a bunch of HTTP ecosystem things, and there's a great maintainer for all of it. It's all open source: the literal http crate, h2, you know, a lot of those are core dependencies for Pingora as well. And the maintainer, Sean, is incredible at what he does.
Yeah, I agree. Shout out to Sean.
Yeah, the thing I was going to thank the Rust community for is for being so coherent,
especially around HTTP things like the hyper ecosystem, the h2,
all of those things are so ubiquitous that it makes integrating with existing projects much easier.
Specifically, I was working with a ClickHouse client that is an official ClickHouse
client that the ClickHouse team puts out, but I needed to add a new feature
for rotating MTLS certificates, which obviously their client does not support.
But because they expose access to the hyper HTTP client under the hood,
it made it an easy thing to do.
It's just such a good experience. If you need a feature, you already have the tools necessary to add functionality to tools that are published by other people, in a coherent way. That's something that you don't get in Java, and I don't know if you get it in Go, that's not my ecosystem, but as a former and recovering Java programmer, it's very nice.
That's a very nice closing statement as well. Edward, anything that you want to add?
I'm glad you had a specific answer, because really, I am mainly just thankful. I mean, it is true that the ecosystem, though I'm sure there are gaps from time to time, generally, if you are looking for a particular pattern or thing, you will either find out that it is hard to do, or that someone else has already tried, to at least some extent, to do it and has a working, if not production-ready then nearly production-ready, implementation of it.
So the Rust ecosystem in general, the amount of excitement that folks have within the community, is a great sign of promise. And I mean, obviously, I think Rust has already eaten up a lot of the internet, if we are a good example. But no, we're just, once again, so thankful that people are interested in what we do, and are patient with us, and are great contributors.
Kevin and Edward, thanks so much for taking the time for the interview today.
Thanks, Matthias. We appreciate it.
Thank you, yes.
I mean, thanks for putting on this podcast. I cannot believe it took five seasons for me to catch on.
It's never too late. Rust in Production is a podcast by corrode.
It is hosted by me, Matthias Endler, and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.