Rust in Production

Matthias Endler

Rust in Production Ep 2 - PubNub's Stephen Blum

PubNub CTO Stephen Blum discusses how adopting Rust improved memory usage and performance compared to the company's C and Python implementations. He highlights Rust's versatility while emphasizing low latency and the importance of code simplicity.

2023-12-28 57 min Season 1 Episode 2

Description & Show Notes

In this episode, we are joined by Stephen Blum, the CTO of PubNub, a company that has developed an edge messaging network with over a billion connected devices. Stephen explains that while message buses like Kafka or RabbitMQ are suitable for smaller scales, PubNub focuses on the challenges of connecting mobile devices and laptops at web scale. They aim to provide instant signal delivery at massive scale, prioritizing low latency for a seamless user experience. To achieve this, PubNub has architected its system to be globally distributed, running on AWS with Kubernetes clusters spread across all of Amazon's zones. They use GeoDNS to ensure users connect to the closest region for the lowest possible latency. Stephen goes on to discuss the challenges they faced in building the system, particularly in terms of memory management and cleanup. They had to deal with issues such as segmentation faults and memory leaks, which caused runtime problems, outages, and potential data loss. PubNub had to provision additional memory to compensate for these leaks and spend time finding and fixing the problems. While C was efficient, it came with significant engineering costs. As a solution, PubNub started adopting Rust, which helped alleviate some of these challenges. When they replaced a service with Rust, they observed a 5x improvement in memory usage and performance. Stephen also talks about choosing programming languages for their platform and the difficulty of finding and retaining C experts. They didn't consider Java due to its perceived academic nature, and Go didn't make the list of options at the time. They now have services in production written in Go, though a rewrite of part of their PubSub bus in Go performed poorly compared to the existing C system. Despite this, they favor Rust as their language of choice for new services, citing its popularity and impressive results.
The conversation delves into performance considerations with Python and the use of PyPy as a just-in-time compiler for optimization. While PyPy improved performance, it also required a lot of memory, which could be expensive. Rust, on the other hand, provided a significant boost in both memory usage and performance, making it a favorable choice for PubNub. They also discuss provisioning, taking budget into account and aiming to stay as close to actual need as possible. Kubernetes and autoscaling with HPAs (Horizontal Pod Autoscaling) are used to dynamically adjust resources based on usage. Integrating new services into PubNub's infrastructure involves both API-based communication and event-driven approaches: they use frameworks like Axum for API-based communication and leverage Kafka with Protobuf for event sourcing, with JSON also used in some cases. Stephen explains that they chose Protobuf for high-traffic topics and where stability is crucial. While the primary customer-facing API is JSON-based, PubNub recognizes Protobuf's superior performance and uses it in certain cases, especially for shrinking down values that are verbose as text, such as booleans. They also discuss the compression advantages that come with Protobuf. The team reflects on the philosophy behind exploring Rust's potential for profit and its use in infrastructure and on devices like IoT hardware. The question of how small Rust binaries can get is discussed, and PubNub sees Rust as their top choice for reliability and performance. They mention developing a Rust SDK for customers using IoT devices. The open-source nature of Rust and its ability to integrate into projects and support open standards are also praised. While acknowledging downsides like potential instability and longer compilation times, they remain impressed with Rust's capabilities. The conversation covers stability and safety in Rust, with Stephen expressing confidence in the compiler's ability to handle alpha software and packages.
Relying on native primitives for concurrency in Rust adds to Stephen's confidence in the compiler's safety. The Rust ecosystem is seen as providing adequate coverage, although packages like librdkafka bindings, which are pre-1.0, can be challenging to set up or deploy. Stephen emphasizes simplicity in code and avoiding excessive abstraction, while acknowledging the benefits of features like generics and traits in Rust. He suggests resources like David MacLeod's book, which focuses on learning Rust without overwhelming complexity. Expanding on knowledge sharing within the team, Stephen discusses how Rust advocates have encouraged its use and the possibilities it holds for AI infrastructure platforms. They believe Rust could improve performance and reduce latency, particularly for CPU-bound tasks in AI. They mention the adoption of Rust in the data science field, such as its use with the Parquet data format. The importance of tooling improvements, setting strict standards, and eliminating unsafe code is highlighted. Stephen expresses the desire for a linter that enforces a simplified subset of Rust to enhance code readability, maintainability, and testability. They discuss the balance between functional and object-oriented programming in Rust, suggesting object-oriented structure for larger-scale code and functional paradigms within functions. Onboarding Rust engineers is also addressed, weighing whether to prioritize candidates with prior Rust experience or to train engineers skilled in another language on the job. Recognizing the shortage of Rust engineers, Stephen encourages those interested in Rust to pursue a career at PubNub, pointing to resources like their website and LinkedIn page for tutorials and videos. They emphasize the importance of latency in their edge messaging technology and invite users to try it out.

Transcript

Stephen
00:00:23
Today we've got Stephen with us. Stephen, can you introduce yourself, and what do you do?

Hi there, I'm Stephen, CTO at PubNub. We work on building an edge messaging network that we've invested a lot into; we have a number of patents that run against it. It's really easy to connect your servers to a message bus. You might be familiar with Kafka or RabbitMQ or JMS; these are fantastic message buses. But once you hit the scale of the web, where there are mobile devices and laptops that need to connect in, it's a different story. And that's where PubNub's scale comes in. We have over a billion devices connected to our network, which is no small feat. And if you've ever dialed a call, have you noticed that it might take a little while for the other phone to ring? That's a signal. It's actually a signal being delivered to the device to make that phone ring. What we've done is we're able to deliver a signal the moment you click the button, and we can do that at the scale of billions. This has been very challenging, because all of a sudden thousands of devices might need to send and receive a lot of messages all at once. So we have this non-homogeneous workload, and being able to scale something like this has been a real challenge. Rust is part of the picture, so I'm excited to talk about Rust in that context.
Matthias
00:01:56
Amazing. That's why I wanted to talk to you, because in your videos you have this passion for what you do, and I guess you can transport it as well. At the same time, PubNub seems to be a company with a lot of scale, something that you don't usually find. So it's a pleasure to talk to you about scalability, and also about how latency is important to you, because it feels like latency is a big part of that equation. Is that correct?
Stephen
00:02:27
Yeah, it's huge, especially when we're in a prospect conversation. We've got customers that are evaluating us, and speed tends to be a nice little bonus on top. Now, you might not always think that; some use cases don't necessarily need to be at light speed. But it is that much better when you have the experience of saying, oh, I clicked the button, I didn't even see it, it already happened. It's a great experience. So performance is a huge aspect of how we architected things, and we had to go a step further. PubNub is not just a server sitting somewhere that devices connect into; we're globally distributed. We run on AWS. We have Kubernetes clusters spread across all of Amazon's zones, and using GeoDNS, you connect to the closest region. What that allows you to do is have a really good latency experience running on Amazon's network. Users that are nearby get the best latency, sub-millisecond latency; it's amazing the kind of performance you can get. And when users span continents, you're running on Amazon's network at that point, so you still have fantastic latency even across the world.

What sort of background do you need to do all of that? Maybe we haven't properly introduced you and your background, so maybe we can tie this in somehow.

Yeah, it's a good question. Here's the deal: all the efforts that you jump into over the years, and the experiences you have, there's some level of value to that. But I think the biggest value anyone can have is what's accomplishable today. What are you able to achieve today? Really, it's all about putting your effort into it.
So whether you have no experience, a little experience, or a lot, it really comes down to what is going to happen today. You could have this huge amount of experience and then just sit back and not do anything, just relax. At that point, what's the value? You're not bringing much to the table. You need to be able to really dive in and bring value; it doesn't matter that much what level of experience you have. Being able to build something like PubNub required that level of diving in. You really have to read the Linux documentation: which kernel APIs do I need to worry about? How does the buffering work over here in the network layer? You need to stack all that up, write it down, and then build code that runs on Linux with what we have: epoll. We're running on epoll. It's the primary speed and performance boost that we have.
Matthias
00:05:20
Can you explain what epoll is and why it's important for PubNub?
Stephen
00:05:25
So for PubNub, when you have millions of connections coming in, one server can't do it. However, the more connections you can have on a server, the better your margins and the higher the scale you can offer, and so you can offer a better price to your customers. To track those connections, you don't want a process that's actively looking at each of those connections on a box, because that will burn your CPU; you'll immediately run out of CPU. What you need is an interrupt that the kernel provides through APIs, through different polling interfaces, which is exactly how all the devices on your motherboard work when the system is communicating with your devices. It's all through interrupts in the kernel. So when you press a key on your keyboard, those keypresses come through as interrupts; there's nothing actively checking, did you press the H key, did you press the J key? It only goes through interrupts, and that allows efficient communication with the hardware on the system. That's what epoll is: an interface that allows us to receive those interrupts through the OS, bind them onto network events, and then act on them or emit data over the bus, and get all sorts of scale. It's night and day. If we didn't have this, it would be a lot more expensive to run PubNub.

From what you tell me, it sounds like you've always been interested in performance, in these technical problems and how to solve them. If that is correct, when did you first get in touch with Rust, and how did it feel? And also, what did you do before Rust?

So today PubNub is written in C, and there are challenges with that. We've got all sorts of stability over the years that we've nailed down, and we've got fantastic performance. Rust came into the picture when one of our senior engineers was like, hey, check this thing out over here. There's this thing called Rust.
And this was maybe, I don't know, five-plus years ago, a long time ago. And we were all like, yeah, sure, that's Rust. What is this thing? No one really knew. But it turns out the capabilities of Rust, with the memory safety and performance near C speeds, are really attractive to us. That level of capability is huge, especially with the compiler being able to detect common problems. That attractiveness at our scale is basically why we brought Rust in.
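The interrupt-driven model Stephen contrasts with busy-polling can be illustrated with a toy Rust sketch. This is only an analogy, not real epoll: a blocking channel `recv()` stands in for `epoll_wait`, so the waiting thread consumes no CPU until the "kernel" (here, a helper thread) signals an event. The event string is made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Toy stand-in for epoll_wait: block until the "kernel" (a helper
// thread here) signals that a connection has data, instead of
// spinning over every connection and burning CPU.
fn wait_for_event() -> String {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(20)); // simulated I/O latency
        tx.send(String::from("socket 42 readable")).unwrap();
    });
    // Blocking recv(): zero CPU spent while waiting, like an interrupt.
    rx.recv().unwrap()
}

fn main() {
    println!("woke up on event: {}", wait_for_event());
}
```

The real mechanism is the `epoll_create1`/`epoll_ctl`/`epoll_wait` syscall family on Linux; the point here is just the shape of it: register interest once, then sleep until the kernel wakes you.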
Matthias
00:08:14
Okay, and before we talk about Rust, I'm curious about the C code as well, because when you run such a platform, you're bound to run into runtime problems, and you probably had your fair share of outages or runtime issues. Can you tell us a few stories that you encountered through all these years with your Rust and your C experience?
Stephen
00:08:44
Yeah, absolutely. So you run into these things called segmentation faults all the time. And you get these things called core dump files, which contain all the memory the application had at the time; that gets saved to disk so you can go in and debug it. You don't want to be in that scenario, because it means data has potentially been damaged or lost. There are problems in that scenario.
Matthias
00:09:13
And how does it work then? Did you have an on-call rotation? Did you have a lot of outages or a lot of incidents? And how has that changed since then?
Stephen
00:09:25
Yeah, a lot of problems over the years, absolutely. Hitting scale, running into memory pressure. So with PubNub, you can multiplex data on a single connection, which means you subscribe to certain topics or channels; we call them channels in PubNub. You can also apply filters to those channels and say, I only want to receive messages that have this tag, or that are beyond this threshold based on this floating-point value. So you can make those subscriptions. The problem comes in how you manage the memory when you're streaming this data over a network across systems in a distributed fashion and multiplexing it in. You still have to deal with memory management and cleanup afterwards, and it's really easy to free some data or free a pointer when you actually needed to keep it alive for a little while.

Did you have the opposite problem, too, where you were leaking data over a longer period of time?

Oh yeah, that's right, we had that problem too. We absolutely had that problem. We were using performant memory allocators like jemalloc, and even then we were in situations where memory just kept growing. It's a tough situation, because you run things like Valgrind and dive in to find where those leaks are, because you know it's causing slowdowns and it's expensive; memory is not cheap. So in order to stay up and operating, you have to buy more memory to outpace the growth of that leak, and that buys you time to find the problem.
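The channel-plus-filter subscription model Stephen describes can be sketched in a few lines of Rust. The field names (`tag`, `value`) and the filter shape are illustrative assumptions, not PubNub's actual API; the point is multiplexing one stream and delivering only what each subscription asked for.

```rust
// A message on the multiplexed connection: which channel it belongs
// to, plus an example tag and numeric field that filters can match on.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    channel: String,
    tag: String,
    value: f64,
}

// A subscription: a channel name plus optional filters, loosely
// modeling "only messages with this tag" and "beyond this threshold".
struct Subscription {
    channel: String,
    required_tag: Option<String>,
    min_value: Option<f64>,
}

impl Subscription {
    fn matches(&self, msg: &Message) -> bool {
        msg.channel == self.channel
            && self.required_tag.as_deref().map_or(true, |t| msg.tag == t)
            && self.min_value.map_or(true, |min| msg.value >= min)
    }
}

// Deliver only the messages this subscription wants from the stream.
fn filter_stream(sub: &Subscription, stream: &[Message]) -> Vec<Message> {
    stream.iter().filter(|m| sub.matches(m)).cloned().collect()
}

fn main() {
    let sub = Subscription {
        channel: "sensors".into(),
        required_tag: Some("temp".into()),
        min_value: Some(20.0),
    };
    let stream = vec![
        Message { channel: "sensors".into(), tag: "temp".into(), value: 25.0 },
        Message { channel: "sensors".into(), tag: "temp".into(), value: 10.0 },
        Message { channel: "chat".into(), tag: "temp".into(), value: 99.0 },
    ];
    // Only the 25.0 reading on "sensors" passes both filters.
    println!("{:?}", filter_stream(&sub, &stream));
}
```

In C, the hard part is exactly what Stephen points out: every `Message` here would be a heap allocation whose lifetime spans the network layer, the filter layer, and the delivery layer, and freeing it at the wrong layer is a use-after-free.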
Matthias
00:11:12
So on one side, you have a very performant language like C, which saves you hardware costs, I guess, because it's really efficient. And on the other side, you have engineering costs because of these issues. And I wonder how you balance it out, if it evened out at the end, or if you noticed that the engineering costs were significantly higher than, say, the hardware savings. And what's your take on this?
Stephen
00:11:40
If you run such a platform at scale, yeah. Right, yeah, it's a good question. Because we've been doing this for 14 years, we have time and experience under our belt, and so we've been able to create a stable system over the years that is the most powerful, most popular, most heavily used API in our current suite of APIs.

How did you hire for such positions? Did you go to hardcore network admins? Did you go to FAANG companies that had staff engineers with that experience? How did you even find such people?

Yeah, all the recruiting patterns we could possibly take advantage of, because finding C experts today, even back 10 years ago, was a challenge. Even if you were a C expert, you might not really want to write C anymore, because you know all the challenges and problems that come with it. And especially in a system like the web, where you need something that's always alive, resilient, something you can rely on, that's a scary proposition. It's like, okay, we're going to have to write some C that's super stable, and any bit of latency is going to be noticed. So it's the worst possible situation.

Did you ever have a situation where someone came into a job interview applying for a role as a C developer and actively said that they didn't want to use the language anymore for such problems, that they wanted to try something else? Or was it more that they learned on the job that C at scale was hard? They probably knew it before, though.

Yeah, they knew ahead of time. They had the experience, because even if you're just getting started as a C developer, you'll immediately run into a segmentation fault or something else. It's a rite of passage, something that will happen; it's practically guaranteed.
Matthias
00:13:42
Yeah, the question is not if, but when. That's correct. The other thing I wondered about: Kafka is written in Java, and then there's this newer competitor, I guess it's called NATS, which is written in Go. Did you ever consider using any of these languages?
Stephen
00:14:05
Java wasn't on the list, mostly because that was mostly an academic language, at least for me at the time. Right. And that was, it seemed not like the right language if I wanted to get the best possible speed and performance for exactly what I was looking for. So I wanted something that was, you know, robust and powerful.
Matthias
00:14:29
And Go? What about Go?
Stephen
00:14:31
You know, Go wasn't, I didn't know about Go back then. So that didn't even make the list because I didn't know what it was.
Matthias
00:14:40
I guess by now you know Go and maybe you have a couple of thoughts about it, especially comparing it with Rust, for example. What's your take on this?
Stephen
00:14:51
I like the compiling speed of Go.
Matthias
00:14:57
It's true.
Stephen
00:14:58
Yeah, it's boom, almost immediate compared to Rust, for example. But otherwise, so we have a service in production with Go, we actually have a couple of services in production with Go, but you still run into runtime errors with Go. That was something that surprised us: it's a compiled language, and it's more modern, it's a little bit Pythonic, and the syntax is a little more streamlined, but it's still easy to get verbose with the language, it's easy to over-abstract, and it still runs into runtime errors. The performance is okay in production, but it has these GC pauses that we noticed. We did attempt to rewrite part of our PubSub bus in Go, and it couldn't even come close, not even close, in performance.
Matthias
00:15:45
Did you have a one-to-one comparison? Did you run a benchmark? And how did that look like?
Stephen
00:15:52
Yeah, latency was immediately 10x slower right out of the gate, even at low scale. And latency kind of tails up the further you push the thing. Messages per second were something like 5x less. And then there were the GC pauses, so latency would suddenly spike periodically, which was not great either.

Now, the question you would ask is, why would we choose to attempt to rewrite something we have that's robust in C and stable? And the answer is maintainability, because we're scared to touch it. It's working right now, and if we make any adjustments, that's the danger zone.

And how would you go about rewriting parts of that in Rust, if that is even an option for you? Because for companies, a rewrite is just a cost, and it's also a risk: you need training, you need adoption, you need trust in what you build and deploy. Does it even make sense for you right now, and if so, how would you address such a topic?

Yeah, it's as we go. Right now, Rust is the most popular language at PubNub by far. All of our new services are typically elected to be written in Rust; everything going forward will be Rust. This is due to our scale, and we've seen fantastic results from it. I actually have some real-world numbers, as we have replaced some of our services with Rust, and it has been fantastic.
Matthias
00:17:42
Can you share some of these numbers? I would love to hear them.
Stephen
00:17:45
Yeah. Yeah. Okay. So some of our services were written in Python, which has some performance considerations in general. We had gotten past some of those using something called PyPy, which is a just-in-time compiler for Python. There's a trade-off, though, because it takes up a lot more memory. A lot: it can go to gigabytes per process. But the CPU is, you know, five times faster, and CPUs are expensive, right? You pay around four pennies per hour for CPU on Amazon, and then memory is something like half that price. So it's a good trade-off in terms of cost. When we replaced that service with Rust, we saw another 5x boost, not only in memory but in performance as well, on top of it.

That means you use one fifth of the memory in Rust compared to PyPy?

Well, actually way, way less. It's more like 30 megabytes or so in Rust versus a gigabyte. Because PyPy likes the memory. It keeps it; it's a memory fan.
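The cost trade-off Stephen sketches can be put into a tiny back-of-envelope calculator. All numbers below are the rough figures quoted in the conversation (about 4 cents per vCPU-hour, memory at about half that per GiB-hour, a PyPy worker holding around 1 GiB versus around 30 MB for Rust); they are illustrative assumptions, not actual AWS pricing or a benchmark.

```rust
// Rough per-worker hourly cost from assumed unit prices.
fn hourly_cost(vcpus: f64, mem_gib: f64) -> f64 {
    let cpu_rate = 0.04; // assumed $/vCPU-hour ("four pennies per hour")
    let mem_rate = 0.02; // assumed $/GiB-hour ("like half that price")
    vcpus * cpu_rate + mem_gib * mem_rate
}

fn main() {
    let pypy = hourly_cost(1.0, 1.0);  // 1 vCPU plus ~1 GiB resident
    let rust = hourly_cost(1.0, 0.03); // 1 vCPU plus ~30 MiB resident
    println!("PyPy worker: ${pypy:.4}/h, Rust worker: ${rust:.4}/h");
}
```

With these assumed rates, the memory side of the bill shrinks by more than 30x per worker, before counting the additional 5x CPU improvement Stephen mentions, which would let fewer workers carry the same load.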
Matthias
00:19:09
Would you say over-provisioning is fine for some of your problems? Where, let's say, you have a big event and you know exactly that this is upcoming and then you need to prepare the infrastructure for it. Do you usually over-provision for such a case? And in this case, you scale it up to whatever memory it needs, times two, for example. Or would you say, no, this is exactly what ruins our costs. And I guess we want to be as close to what we need as possible. Where do you draw the line?
Stephen
00:19:44
Oh, it's a business question. Yeah, that's a good one. Because, you know, we want to make sure that we're making a profit. We want to keep pushing forward and continue innovating, and that's where our source of funds allows us to do that. If we're just saying, here, Amazon, take all the money, give all of it to Amazon, then we won't have any left to continue to grow the business. So yeah, that is a trade-off, a balance: how far do you go? Something we invested in heavily was Kubernetes and autoscaling with HPAs. So these days, PubNub doesn't have to do any extra readiness work beforehand.
Matthias
00:20:28
What is HPA?
Stephen
00:20:30
Horizontal Pod Autoscaling.
Matthias
00:20:31
And how does it help you in this case?
Stephen
00:20:34
It autoscales for us, based on memory usage or CPU usage; it can be both, and custom metrics too. Based on network load, based on the number of connections, based on CPU usage, we can target our desired amount of resources that we purchase. This is really cool. And now, when you talk about our budgeting, how much we pre-purchase: we don't need to do that ahead of time if there's a big event. It's a throttle; it's a YAML config file for us. We use Helm charts, and we just say, okay, let's bring it up to 50, or bring it down to 20, depending on which service we're optimizing. And then that allows the system to say: when it hits this threshold across the pool of pods, as an average, I will add more; or if it dips below, I will subtract some. So it's always at that particular level. We don't have to pre-scale; we just tune with levers.
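The "add more when the pool average crosses a threshold" behavior Stephen describes matches the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(currentReplicas * currentMetric / targetMetric). A minimal sketch of that formula, with made-up numbers:

```rust
// Kubernetes HPA scaling formula:
//   desired = ceil(current_replicas * current_metric / target_metric)
// applied here to an average-CPU-percent metric across the pod pool.
fn desired_replicas(current_replicas: u32, current_avg: f64, target_avg: f64) -> u32 {
    ((current_replicas as f64) * current_avg / target_avg).ceil() as u32
}

fn main() {
    // 20 pods averaging 90% CPU against a 60% target: scale up to 30.
    println!("{}", desired_replicas(20, 90.0, 60.0));
    // 20 pods averaging 30% against a 60% target: scale down to 10.
    println!("{}", desired_replicas(20, 30.0, 60.0));
}
```

The "levers" Stephen mentions would be the `minReplicas`/`maxReplicas` bounds and the target value in the HPA spec, set through Helm values.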
Matthias
00:21:43
So you get predictable performance, especially for the new services, which going forward will be written in Rust. And coming back to the original question: how would you integrate a new service into your current infrastructure? Say you have a Rust service, and then you have all of the previous services. Do you connect them on the network layer? Do you integrate them with FFI, the foreign function interface, and compile things together, maybe using language bindings? Do you do it through maybe your own channel, sending messages and doing some sort of event sourcing? Or is it a hybrid of these models? How does it look in the back end?

That's a strong question. It depends on the situation. There are two typical models that you see in the industry, and we leverage both. One model is request and response through an API.
Stephen
00:22:47
That's our primary offering. So when we're using Rust, for example, we use Axum as an API framework. And then there's our event-driven side, which is the other approach; you've got web and event. For events, we run off of MSK, through Kafka, which is like Amazon's competitor to Confluent. We run off of that as an event source, and that allows us to process data asynchronously, out of band from the system. So you've got these inline API calls, and then you've got asynchronous background workers that work on the event sourcing.

And how do the workers communicate internally?
Matthias
00:23:33
Do you use JSON or Protobuf or Cap'n Proto or something else?
Stephen
00:23:39
Yeah, for some it's Protobuf; for others, it's JSON.
Matthias
00:23:47
I would assume that you use Protobuf for some of the more, let's say, high traffic topics or maybe the ones that need the most stability. That's also an important factor, right? Back at my previous job, we used JSON and then moved to Protobuf. And what we found was that the number of errors would go down because you would handle all of the edge cases more or less at compile time because you knew the binary format.
Stephen
00:24:13
And there were way fewer surprises at runtime. Did you see the same? What's your take on Protobuf?

Yeah, that's a good question. In the projects we had, we didn't migrate like you did, so I don't have a comparison between the two. I can see both sides, and they both do what they do; Protobuf is more performant for us. At PubNub, our primary API is JSON-based for our customers, as you're connecting devices and sending data between them. So even if you're connected through an IoT device, like the August lock, which is one of our customers: you can send a message from your phone to trigger your door to unlock. That's JSON, right? You can choose to also make it a string or Base64-encode it to maybe shorten that down. But for us, since everything's in JSON and a lot of it is text, Protobuf doesn't change much of that, because it's all text-based. But things like a boolean are where it would help the most, because you've got these strings like true or false; those are big character strings that Protobuf can really shrink down.
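The boolean example can be made concrete by counting bytes on the wire. In JSON the value is spelled out as text, while a Protobuf bool field encodes as two bytes: a field tag and a one-byte varint. The key name and field number below are illustrative; note also that proto3 would omit a default-valued (false) field entirely, making the gap even larger.

```rust
// Bytes a boolean key/value pair occupies inside a JSON object,
// e.g. "enabled":false (quotes, colon, and the spelled-out literal).
fn json_bool_bytes(key: &str, value: bool) -> usize {
    format!("\"{}\":{}", key, value).len()
}

// Protobuf wire encoding of a bool field: tag byte
// (field_number << 3 | wire type 0 = varint) then the value byte.
// Valid for field numbers below 16, which fit in a single tag byte.
fn protobuf_bool_bytes(field_number: u32, value: bool) -> Vec<u8> {
    vec![(field_number << 3) as u8, value as u8]
}

fn main() {
    println!("JSON:     {} bytes", json_bool_bytes("enabled", false)); // 15
    println!("Protobuf: {} bytes", protobuf_bool_bytes(1, false).len()); // 2
}
```

As Matthias points out next, gzip narrows this gap for large, repetitive payloads, which is part of why the choice is not clear-cut.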
Matthias
00:25:31
Yeah, yeah. But at the same time, you probably have compression enabled anyway, so you use gzip. And since big payloads have some repetition, you probably benefit from that as well. So I can see a lot of advantages on both sides. Speaking of advantages, and coming back to Rust for a bit: I wonder, if you could pick only one trait of Rust, what would it be, performance or stability slash robustness? On one side you have raw performance; on the other side you have operational robustness. What would you pick? It can only be one of these.
Stephen
00:26:17
I'm leaning towards performance. I mean, obviously we chose C at the get-go, right, and most of the internet still runs on C today. Because of the scale that we're at, performance is paramount.

Yeah. And then, the first endeavor you had with Rust, what was it? What was the first project you tried at PubNub? You obviously thought it was a success, because you moved forward with it.
Matthias
00:26:49
Was it like typical CLI app or was it a service that you started or how did it work?
Stephen
00:26:57
Yeah, good question. The first Rust at PubNub came out of talking about gRPC. We were looking to offer a gRPC endpoint that you can use to communicate with devices outside the network, on things like cell towers. We added that: we built grpc.pubnub.com with Rust. You can connect in, yeah.
Matthias
00:27:20
And that was pretty early in the gRPC world for Rust, because the ecosystem back then probably wasn't as mature as it is now, if you want to call it that. And gRPC, just for the people that don't know: it's remote procedure calls, and it comes from Google; I think that's what the G stands for. Correct me if I'm wrong. But it's basically Protobuf over HTTP/2, I guess. Is that correct? Yep. And you provided that service. Why did you pick Rust for this specific task to begin with?
Stephen
00:27:59
That was an engineer I was talking about earlier. The senior engineer was like, hey, did you hear about this thing called Rust? And then two years later, after, there's like, okay, let's build the service in Rust. Let's just do it. Let's see how it goes. And we did it and it worked.
Matthias
00:28:14
Were there any setbacks, or was there any pushback from the team about moving away from an existing, tried-and-tested technology towards Rust? Were they skeptical? Was there any resistance, so to say?

There was a surprising amount of excitement, I would say, and just eagerness to move forward with the next-generation technology, which was super endearing.

And did you set any goals or expectations before the project? For example, did you say, all the tests need to pass, or it needs to be integrated with our CI/CD pipeline? Or did you say, well, it's such a new technology anyway, you cannot really set those standards, because you need to try and experiment. And if you set too stringent goals, then what you end up with is maybe the same solution that you had before. So let's just see what it takes and run free.
Stephen
00:29:15
Yeah, that's a lot more my philosophy, the latter there: hey, let's see how far we can push this. Let's see where we can go. Let's be pioneers here as well. And, you know, it turns out Rust, I think and believe, is that next multi-billion-dollar language; if you take advantage of it, you will profit from it. There's no doubt in my mind. We're already seeing it now. It's fantastic.

Would you say that is true for things outside of infrastructure, or for infrastructure specifically?

Oh, okay, so, all right, let's think about devices in the world that might be running Rust, like maybe your car. I mean, we need to consider binary sizes and other kinds of hardware limitations that come into the picture. And obviously we're in, you know, almost 2024 now, and we've got supercomputers of all sorts of shapes and sizes, and everything's really fast and small. We're in the future, so we might as well take the best advantage that we can. And, you know, I haven't tried it: how small can we get Rust? Can it go pretty small? I've usually seen pretty big binaries that look a little scary to have on IoT devices.
Matthias
00:30:39
And would that be a use case for PubNub, to deploy to IoT devices? Is it something that you do or have on your roadmap?
Stephen
00:30:46
No, I'm just thinking, how far can we stretch Rust? How far can it go? Obviously it can basically run everywhere, on any device, and it will be fine, and maybe we can optimize its footprint over time. But for PubNub and our API service, being a communications company where we need that reliability, speed, and performance, no doubt Rust is our top choice going forward.
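On the binary-size question raised above: release binaries can be shrunk considerably with standard Cargo profile settings. A sketch of the well-known size-oriented knobs (these are documented Cargo options, but the right trade-offs depend on the target):

```toml
# Cargo.toml: a release profile tuned for binary size rather than speed.
[profile.release]
opt-level = "z"     # optimize for size instead of speed
lto = true          # cross-crate link-time optimization
codegen-units = 1   # better optimization at the cost of build time
panic = "abort"     # drop the unwinding machinery
strip = true        # strip symbols from the final binary
```

For truly tiny IoT targets, people additionally build with `no_std` and rebuild the standard library, but the profile above alone often cuts a default release binary down substantially.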
Matthias
00:31:16
Maybe let's turn it around and say, do some of your customers use it for IoT devices already?
Stephen
00:31:21
They do, yeah. Actually, now that you mention it, we do have a customer, and they've been pushing us for years for a Rust SDK, because they have IoT devices deployed with Rust. You know, I didn't even think of that. And they chose Rust because it's got that stability and reliability. And we did deliver them a Rust SDK so their devices can communicate; they had to do it by hand before, and now with the Rust SDK it's even cooler.

And with Rust, you can integrate that into existing C applications, right? You mentioned FFI and things like that.

Yes, and that's a neat strategy we're looking into, to make sure that our existing customers that are C-based, and we have several that are still on C, can rely more heavily on our Rust SDK that we spend more time on. Because I like that. Now we can start asking: where else can we put Rust? How far can it go? Let's bring it everywhere.
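Calling Rust from an existing C application goes through a C ABI export. A minimal sketch of the Rust side (the function name and signature here are invented for illustration, not PubNub's actual SDK):

```rust
/// A C-callable entry point. Compiled as a `cdylib` or `staticlib`, a C
/// program can declare it as
///   int32_t pn_checksum(const uint8_t *buf, size_t len);
/// and link against the Rust library.
#[no_mangle]
pub extern "C" fn pn_checksum(buf: *const u8, len: usize) -> i32 {
    // FFI callers give no safety guarantees, so check for null first.
    if buf.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `buf` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(buf, len) };
    bytes.iter().map(|&b| b as i32).sum()
}

fn main() {
    // Call it from Rust just to demonstrate the logic.
    let data = [1u8, 2, 3];
    println!("{}", pn_checksum(data.as_ptr(), data.len())); // 6
}
```

In a real SDK you would also generate the matching C header (e.g. with cbindgen) and set `crate-type = ["cdylib", "staticlib"]` in Cargo.toml.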
Matthias
00:32:30
True. And since this is open source, and I can say this because I read the blog post about it, the beauty of it is that people can look at it and integrate it into their projects. It could become some sort of standard or protocol that you use to communicate with them. And I guess developing such standards in the open is a huge advantage for you from a company perspective as well. Now, we talked a lot about the upsides of Rust. Of course, there are still downsides; there are trade-offs we need to be aware of. A couple of things I can think of: if you're too much on the bleeding edge, you at least risk instabilities, or some churn from time to time, because not every introduction of a new technology runs smoothly. But I wonder what your take is on this. Is that really a huge problem in Rust's case? Did you have other transitions that were easier or harder? And would you even say there are downsides, or what would be the downsides of Rust? That's a lot of questions.

Yeah, I'm really thinking about this. You know, I was reading your "Why Rust" article, and the part that stood out to me was how npm migrated to Rust. I mean, their whole thing is JavaScript. Why that was so great: they did that in 2019, and one thing I was extremely surprised by was they do 1.3
Stephen
00:34:22
billion JavaScript package downloads per day, which is incredible.

Yeah, that's amazing. Okay, so that's another win for Rust. And what are the losses? What are the downsides? Are there any downsides?

I mean, the compiler does take a while. But when you're doing iterative compilation, when you're developing locally, it's fine; it's a couple of seconds here or there. So no worries on that. From a stability perspective, the compiler itself, even if you're bringing in alpha software, alpha packages with a 0.0.1 version number, is still doing a good job there. And as long as you're leaning on the Rust-native primitives for concurrency, you don't really have to worry about it, because you're going to have that compiler safety. So I think that's also okay. Right? I mean, tell me I'm wrong.

No, I fully agree, 100 percent. What about the ecosystem? Did you miss any packages, anything that could be better from an ecosystem perspective? You mentioned pre-1.0 packages. Is that a huge problem for you, or would you say, no, actually, you're pretty much covered?

Getting rdkafka, librdkafka, to work. I would say that one's usually a pain. Every time a developer tries to set up a dev environment, or get it to deploy into production or integration, they have trouble. I have trouble too. It's like, oh, great, librdkafka, okay. So there are rough edges in getting things to work. But as soon as you get all the right settings tweaked and dependencies ready, it does exactly what it needs to do.
Matthias
00:36:23
I can see why librdkafka might be a problem. I like it, but as far as I can remember, it has bindings to other C libraries like OpenSSL, and that makes it extremely hard sometimes to pin the correct versions in librdkafka, which you use from the Rust side at some point. And then you need to kind of transitively talk to the C compiler to do the right thing and link against the correct libraries. Yeah, it can be a bit of a pain, I guess.
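Much of that pain comes down to whether librdkafka and OpenSSL are found on the system or built from source. The rust-rdkafka crate exposes this choice as Cargo features; a sketch (feature names taken from recent rdkafka releases, so worth double-checking against the crate's own docs):

```toml
# Cargo.toml: build librdkafka and OpenSSL from vendored sources so the
# host machine's system libraries (and their versions) stop mattering.
[dependencies]
rdkafka = { version = "0.36", features = ["cmake-build", "ssl-vendored"] }
```

Vendoring trades longer clean builds for reproducible dev environments, which is usually the right trade when every new laptop setup otherwise breaks on OpenSSL version mismatches.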
Stephen
00:36:58
So there are those challenges. Oh, okay, I've got another good one for you. I'm a proponent of simplicity and keeping things as simple as possible. Even as we're writing code, abstractions, though they sound nice and fancy, add debt over time. And I would say it's better to have repeated code than super-abstractions. But Rust comes with all sorts of fancy bells and whistles, doesn't it? What are these generics and traits and all these fun things? It's really easy to be attracted to those, because they look appealing and they do bring some benefits, but they add a huge amount of overhead for bringing new people onto the team. As we're training on the job, that level of complexity just adds a huge barrier to entry, and Rust is massive for that. That's probably the biggest downside. Can I have, like, an easy Rust? I want a simple Rust. That's what I want.

Well, there is one guy, David MacLeod, I'm not sure if I pronounced the name correctly, but he recently wrote a book about simple Rust. And I guess the idea was to learn Rust up to a point where it's usable for solving your problem, but not go too far.
Matthias
00:38:25
And certainly, I agree that there is an issue with too many abstractions. It feels like Rust attracts people who like abstraction, and then at least their second Rust program is barely readable. They add traits and so many abstractions, until they come back to the essentials and work on the core of the idea again.
Stephen
00:38:50
Yeah, that's much better. Just keeping things simple, it's maintainable. Anyone can come in and read it. There's not this giant barrier to entry.
Matthias
00:39:00
Did you notice, from your perspective, that async Rust is another such dialect that makes the language harder to understand? Did you get in touch with async Rust, and what's your opinion on it?
Stephen
00:39:16
You know, I would say async helps, because having to deal with futures by hand was always very annoying. And I remember, in the early days, and even now, when I was running the compiler, the compiler always tries to be helpful, but if a future was involved, it would just print out pages of unintelligible output. There was no way to debug it, not even close. Have they fixed that? I don't know. Is that better with futures now? Because I haven't run into it myself recently.
Matthias
00:39:49
Yeah, that is mostly gone; it has gotten much better. Of course, you don't have to write futures yourself anymore. I'm not sure if you ever did that, but it's not fun. Now the language helps you a lot more, and the error messages, while still not ideal, keep getting better. I love it. They're like, hey, there's this thing, do this right here, and that'll fix your problem. I'm like, okay, I'll go and type it in and do it.
Stephen
00:40:14
And it's like, error. Nope, you did it wrong. You did it totally wrong. And it's like the compiler is yelling at me, but I just did what you told me to do. Why isn't it working? And so that still happens today with myself. I still run into that problem.
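For context on "writing futures yourself": before async/await ergonomics matured, you dealt directly with `Future`, `poll`, and wakers. A toy `block_on` executor using only the standard library shows how much machinery the language now hides (illustrative only; real code would use a runtime like Tokio):

```rust
use std::future::Future;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Wakes the blocked thread when the future signals progress.
struct Signal {
    ready: Mutex<bool>,
    cond: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.ready.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

/// Drive a future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let signal = Arc::new(Signal { ready: Mutex::new(false), cond: Condvar::new() });
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => {
                // Sleep until the waker fires, then poll again.
                let mut ready = signal.ready.lock().unwrap();
                while !*ready {
                    ready = signal.cond.wait(ready).unwrap();
                }
                *ready = false;
            }
        }
    }
}

fn main() {
    let value = block_on(async { 21 * 2 });
    println!("{value}"); // 42
}
```

Every `.await` in modern code compiles down to exactly this kind of poll/wake dance, which is also why early error messages about futures were so hard to read.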
Matthias
00:40:31
How do you teach people on Rust? How do you train them? And how do you spread knowledge across your company?
Stephen
00:40:42
Good question. So it's all on-the-job training. We also have communities of practice where we jump in and say, hey, learn these cool things about Rust, and we try to keep that going. We also have a dedicated Slack channel where we post all sorts of things. We've posted links to one of your articles: one of our engineers jumped in there, like, hey, look, there's this article over here, check it out. And it ended up being Matthias's article. Like, whoa, check it out.
Matthias
00:41:07
Thank you for that.
Stephen
00:41:08
Yeah, so it's a lot of knowledge sharing and that sort of thing.

Was it always the case? Did the first engineer who wrote the first Rust service also share their experiences with the team? Was that something that came naturally, and they just talked about it, or was it enforced by the organization to keep spreading knowledge?

It was encouraged, mostly informally. It started informally and organically and was then more heavily encouraged. You know, I'm a big champion, so I wanted to push it as much as possible and encourage Rust wherever possible, because there's profit there, there's gold. Rust is like this nice win.

So from a business perspective, you would choose Rust for basically everything, all scenarios, including AI? But that's all Python these days, right?

Yeah. And do you see a future where AI and Rust come together, and maybe the infrastructure or the platform that AI runs on is written in Rust?

So today it's written mostly in C, right? Even if you're running Python, you're probably using libtorch, the framework underneath PyTorch that came out of Facebook, which is written in C++ and communicates with the GPU. And most of the latency is in the round trip between the system bus and the memory that you have for those matrices, having to pipe that over to the GPU to run some multiplications. That latency is the main bottleneck these days, and through various pipelining techniques and texture memory buffers, you can provide some major improvements there. Even with Python, which is the main language these days, you're still mostly using C libraries at that point, right? So my thought is, how much faster can we make it if we do this in Rust? Can we take it even further? And there are some wins there, because data processing, data cleaning, and the other things that need to go through the pipeline are non-GPU tasks for the most part.
And so, if we start getting more Rust into the ecosystem to do data cleaning, data processing, embeddings, right? Those are non-GPU tasks; they're CPU tasks, because we need to convert various organic data, which is going to be text or images or pixels, into embedding format: dense matrices that we can then pipe over to a GPU.
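The CPU-side "text to dense vector" step Stephen describes can be sketched with the feature-hashing trick, using only the standard library. This is a toy stand-in for a learned embedding model, purely to illustrate the shape of the work:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DIM: usize = 8;

/// Toy "feature hashing": map each whitespace token to a bucket in a
/// fixed-size dense vector, then L2-normalize. Real pipelines use learned
/// weights; this only shows the CPU-bound text -> vector conversion.
fn hash_embed(text: &str) -> [f32; DIM] {
    let mut vec = [0.0f32; DIM];
    for token in text.split_whitespace() {
        let mut h = DefaultHasher::new();
        token.hash(&mut h);
        let bucket = (h.finish() as usize) % DIM;
        vec[bucket] += 1.0; // count tokens per bucket
    }
    // Normalize so downstream similarity is scale-independent.
    let norm = vec.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in vec.iter_mut() {
            *x /= norm;
        }
    }
    vec
}

fn main() {
    let v = hash_embed("rust makes data pipelines fast");
    println!("{v:?}");
}
```

In a production pipeline, this per-document loop is exactly the embarrassingly parallel CPU work where Rust (e.g. with rayon) tends to beat a Python driver loop.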
Matthias
00:43:55
Yeah.
Stephen
00:43:55
This is where Rust can start shining beyond the existing pipelines with Python.
Matthias
00:44:01
Because it would reduce the memory that is required for these transformations or for some other reason?
Stephen
00:44:07
Memory and performance, more organic performance there. And, you know, we do see that NumPy and some other Python libraries are also C-based, so you're still getting some C-level performance there. But there's still the overhead of the communication between the Python VM and the C library dependencies, so I think there's just a better chance to go more native with Rust in terms of performance, especially as we're starting to get bigger, right? NVIDIA just announced their H200s, which have almost double the memory: 141 gigabytes per GPU. That's a huge amount of memory, which just means a lot of data. And the more we can save in performance, the closer Rust, I think, can take us.
Matthias
00:44:53
Time to order one. I know very little about machine learning, to be honest. But one thing I heard about is that in the data science area, there are a lot of improvements being made by people that use Rust for the first time for some of the lower level libraries. I might be wrong here, but I guess Parquet is written in Rust now or parts of it. And Parquet is this data format, which is, I guess, column and row based. And it's kind of efficient if you want to do very typical data science transformations. You see, my knowledge is pretty limited here, but it feels like Rust is getting into these areas as well.
Stephen
00:45:36
Well, that's a good way to get Rust into things: make it a library that you can import into other languages, so it becomes a foundational technology that you cannot escape from anymore. I like that. It's like, you're getting Rust whether you like it or not; it's all Rust in the end.

Would you say there's still room for improvement in tooling in general? I'm talking about debuggers or profilers and things that might help you in your day-to-day.

I think the biggest one would be, so, you can set up a linter, you can set up your linting configuration to have certain standards and be more strict about certain things, right? Like guarantee no ability to use the word unsafe anywhere, ever.

Yes. That's a really good one. Make sure we have that.
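The "no unsafe anywhere, ever" guarantee is built into the language as a crate-level attribute, and Clippy lints can be tightened alongside it. A minimal sketch (the lint names are standard rustc/Clippy ones):

```rust
// At the top of src/main.rs or src/lib.rs:
// reject any `unsafe` block anywhere in the crate, at compile time.
#![forbid(unsafe_code)]
// Optionally tighten Clippy when running `cargo clippy`:
#![warn(clippy::all)]

/// Plain safe code compiles fine under `forbid(unsafe_code)`.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Uncommenting the next line makes the crate fail to compile:
    // let _ = unsafe { std::ptr::null::<u8>().read() };
    println!("{}", add(2, 2));
}
```

Unlike `deny`, `forbid` cannot be overridden further down the crate with an `allow`, which is what makes it a real guarantee rather than a default.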
Matthias
00:46:31
That, but also, I want a linter that more strictly enforces maybe a Lego version of Rust, a simple version, like a Python version, where it's just really super-duper simple. And I think that's something that might already be available to us today; I just haven't really jumped in to investigate it much. But I think something like that, where we just have easy, simple Rust, and the linter checks to make sure: hey, you're using something that's a little too complex, go in this direction instead, use something simpler, even if there's a little bit of repetition. I think it's a lot more worthwhile.

This is a brilliant idea. I never thought about that, but it makes total sense. The tool would simplify your code. It would be like a refactoring tool which, instead of making code more abstract, makes it more readable, more maintainable, and more testable, and maybe gives you advice. I wonder what such a tool could look like. Actually, these are things that I personally believe AI might help us with, to see those patterns and say, oh yeah, maybe this has higher complexity. Then again, maybe you don't even need AI for this, because you could just look at complexity numbers themselves. There are cyclomatic complexity metrics and all that stuff, and that research has been going on for decades now. I guess we don't really use that knowledge, which is out there, and put it into practice to simplify our code. I can totally see where you're coming from. And where would you even look for such a tool? Where would you go and ask for such a tool? And how do you find out about these new developments in general, around Rust or new technologies? What are your resources?
Stephen
00:48:26
Yeah, it's tough, right? So we go to r/rust on Reddit, and there are other places we can jump around, maybe ask on various channels. Probably Reddit would be the way to go. And, you know, adding these things to CI/CD as a gating requirement is, I think, a good way to get that going. So we do have linting, and the Rust has to pass the formatter check, because you can do a lot of interesting, fun things with Rust and put things all over the place. We want rustfmt as a basic readability baseline, so that's in there. But if we could add one more thing to simplify Rust and not have these advanced language features that are really fancy, that would help. At the end of the day it's an over-optimization, because how often do we really go back to that code? Especially since it's Rust, we don't need to go back and modify it that often, not at a rate that justifies creating such a complex, fun, fancy feature addition.
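The CI gating Stephen describes is commonly wired up as formatter and lint checks that fail the build. A sketch, assuming GitHub Actions (workflow and job names are arbitrary; any CI system can run the same two cargo commands):

```yaml
# CI gate: formatting and lints must pass before merge.
name: rust-checks
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Reject any code rustfmt would reformat.
      - run: cargo fmt --all -- --check
      # Treat every Clippy warning as a hard error.
      - run: cargo clippy --all-targets -- -D warnings
```

A stricter "simple Rust" profile would then be a matter of enabling additional Clippy lint groups in the same pipeline.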
Matthias
00:49:40
It's a really cool idea, for sure. Probably it's something between Go and Rust, erring more towards the Rust end, with ownership and borrowing and no runtime, but very simple abstractions. I like it.
Stephen
00:49:55
Yeah.
Matthias
00:49:56
I wonder what you could throw out of Rust core and still keep the Rust spirit, if you know what I mean. Maybe there are a couple of things that might not be needed. And I also wondered about your approach to using the right paradigm for the job, because what you build is essentially very purely functional in nature, if you like to build it this way, right? You could model it as a functional pipeline where you take input, transform it, and then send it somewhere else, or to multiple places, without any side effects.
Stephen
00:50:33
But at the same time, you have object-oriented programming in Rust as well, which some people use for the same problem, and that's also totally fine. I just wonder where you find the balance. Where do you draw the line?

Yeah, that's a really good question. I think, you know, Rust comes with modules: you can keep things in files and then access those functions using Rust's scoping. I think that could be the line to draw. And then, as we're importing or leveraging those functions, we don't need to go much further in terms of abstraction beyond that.
Matthias
00:51:16
That means you use object-oriented structure on a larger scale to organize the code, and inside of functions you can use functional paradigms like map, filter, and collect, for example. Okay, I see. That makes a lot of sense; this is also how I think about it. Well, I guess we're getting closer to the end, and I have two questions: one that I kind of forgot to ask, and one that is a sort of tradition around here. The question I forgot to ask is about onboarding. Let's say you need to find a Rust engineer, someone to maintain a larger part of your stack that is written in Rust. Would you say you're looking for someone who already has Rust experience and is proficient with the Rust paradigms, so they can go in and hit the ground running, know the principles, maybe avoid what we talked about before, adding too many abstractions, and have some production experience? Or would you rather hire someone who comes from a different language, is smart, can get things done, and you train them on the job?
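The functional style described here, a side-effect-free transform inside an otherwise module-structured codebase, looks like this in Rust (the function and message names are made up for the example):

```rust
/// Keep messages under a size limit and tag them, without mutating input.
fn prepare(messages: &[&str], max_len: usize) -> Vec<String> {
    messages
        .iter()
        .filter(|m| m.len() <= max_len) // drop oversized payloads
        .map(|m| format!("msg:{m}"))    // transform each survivor
        .collect()                      // gather into a fresh Vec
}

fn main() {
    let out = prepare(&["hi", "a very long message indeed"], 10);
    println!("{out:?}"); // ["msg:hi"]
}
```

The input slice is untouched and the function has no observable side effects, which is exactly what makes such pipeline stages easy to test and to parallelize later.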
Stephen
00:52:34
Yeah, it's both. Both is the answer. The pool of Rust engineers is also one of the downsides with Rust today. Having that Rust expertise is not as commonplace as say like JavaScript, for example, right? The most popular language. So we need to expand that pool by allowing folks who are interested in writing in Rust as their day job, who might be excited to learn about it, even on the job.
Matthias
00:53:04
And if they wanted to pursue a career at PubNub, or they wanted to learn more about PubNub, could you give the listeners some hints, some resources that they can go and check out? Any videos, any tutorials, or maybe also free tiers that they can check out?
Stephen
00:53:22
Oh, yeah. So the first thing to do, definitely, is to go to PubNub.com and click sign up. You can see firsthand that we are a communications company, an edge messaging technology that allows devices to communicate. We're in the top 10 apps in the App Store for communication, including areas that need extremely low latency, such as games. Games need that; when you're doing multiplayer, latency is really critical, because you're there to have fun, and any blips or stutters in that experience take away from the fun, right? And we're here to have fun. So that level of performance, capability, and scale is something we offer, and you can get that firsthand experience by signing up and trying it out.
Matthias
00:54:12
Right. And as far as I remember, you're also pretty active on LinkedIn; you post regular videos, and they're very entertaining. You explain various concepts, and I think that's one other resource I can recommend. I personally follow it. And, as I said, lastly, it has become a tradition around here to ask this final question: do you have a message for the broader Rust community? Something you want to share, an opinion, maybe a message for the future of Rust? You already touched on many different points that you could tie into, but I just want to give you the microphone, and you can say whatever you want to the Rust community now.

Yeah, so this is an easy one. We chatted a little bit about this earlier: let's have the Lego version of Rust, please. I want a really simple version of Rust, and just have it baked in.
Stephen
00:55:10
For example, maybe some sort of switch. And it should be a standard, or it needs to be sort of implied that, hey, this is the accepted approach for simple Rust. If we had something like that and the community was on board, I think it would be a lot easier to drive adoption of Rust. Why are Python and JavaScript so popular? Because the entry point is so much easier. With Rust, it is a major challenge, and it is very difficult to really dive in, especially having to deal with some of the more heavily abstracted scenarios. If we could have a simple mode, an easy mode, that would go even further in terms of adoption within organizations. I think that would be a huge bonus, but it would also make me a lot happier, because then we'd have this really simple Rust that allows our developers to jump in more easily, and other teams could deal with it too. So I think that would help make Rust more portable.
Matthias
00:56:11
I love it. It's a really great idea. And if it's not a compiler feature, it could still be some sort of course or training material, maybe a 0.5 step, or 50 percent of the way there. And yes, it has been a pleasure to talk to you, Stephen. I wish you all the best with PubNub. I see a lot of promising avenues for your future and for using Rust in production, and I really like the energy as well. That's something I also wanted to share: it's always a pleasure to talk to you, because, as I said, you're an entrepreneur and an engineer at heart, and that's a very nice combination you have going there. I guess it just shows. So thanks a lot. Yeah, it was great talking with you today.
Stephen
00:57:00
Thank you so much for the time. And we'll chat again.
Matthias
00:57:04
Amazing. See you.