Rust in Production

Matthias Endler

Tembo with Adam Hendel

About production-grade infrastructure on top of Postgres with Rust

2025-06-12 49 min

Description & Show Notes

Recently I was in need of a simple job queue for a Rust project. I already had Postgres in place and wondered if I could reuse it for this purpose. I found PGMQ, a simple job queue built by Tembo that uses Postgres as a backend. It fit the bill perfectly.

In today's episode, I talk to Adam Hendel, the founding engineer of Tembo, about their project, PGMQ, and how it came to be. We discuss the design decisions behind job queues, interfacing from Rust to Postgres, and the engineering decisions that went into building the extension.

It was delightful to hear that you could build all of this yourself, but that you would probably just waste your time doing so, only to end up with the same design decisions as Adam and the team.

About Tembo

Tembo builds developer tools that help teams build and ship software faster. Their first product, PGMQ, was created to solve the problem of job queues in a simple and efficient way, leveraging the power of Postgres. They have since pivoted to focus on AI-driven code assistance, but PGMQ can be used independently and is available as an open-source project.

About Adam Hendel

Adam Hendel is the founding engineer at Tembo, where he has been instrumental in developing PGMQ and other tools like pg_vectorize. He has since moved on to work on his own startup, but remains involved with the PGMQ project.

Transcript

This is Rust in Production, a podcast about companies who use Rust to shape the future of infrastructure. It's Matthias Endler from corrode, and today's guest is Adam Hendel, founding engineer at Tembo. We talk about production-grade infrastructure on top of Postgres with Rust. Adam, thanks for joining us today. Can you introduce yourself and tell us about Tembo? What exactly are you building there?
Adam
00:00:27
Yeah, thanks for having me. I'm Adam Hendel, one of the first engineers hired into Tembo. Tembo is a managed Postgres platform where you can click a few buttons and get a managed Postgres database pretty quickly. On the Tembo platform, one of our main products is what we call stacks, where you can get Postgres pre-configured for certain types of workloads with those same few clicks. So if you're doing OLAP or search, or doing something with AI and you need embeddings, in that example, we have a vector DB stack that gives you everything you need to go build your application.
Matthias
00:01:11
What is it about Rust companies and their love for Postgres? It seems like every other Rust startup I talk to is building on top of Postgres. What makes this 40-year-old database so appealing to modern developers?
Adam
00:01:24
I mean, I love Postgres because it just works. I learned to write SQL on Postgres. Earlier in my career, I was doing a lot of stuff in data science and machine learning, and I didn't really know what good SQL queries or efficient ways to build an application looked like. But Postgres was just there, so I abused the hell out of it, and it always kept working for me, no matter what I threw at it. Over my career, I got better and learned how to do things way more efficiently, and then Postgres began to shine even more. For me, it's just super flexible. It has always just worked for me; I don't have to hassle too much with tuning it or anything. It just works.
Matthias
00:02:20
Isn't Postgres already production-ready out of the box? What exactly are you building on top that goes beyond just hosting it?
Adam
00:02:27
As you get to having very specific types of workloads, you want to start making trade-offs to really get all the juice you can out of Postgres, and you have to start making some decisions. You don't have to touch the C code, but you might have to touch some of the configuration. For example, there are certain configs in Postgres, like shared_buffers, that decide how much of your system's memory you can allocate to keep the working set of your database in memory. That's something we can set dynamically for you. And then there are certain extensions that can really improve and change your experience with Postgres. Those can be quite a hassle to get installed, so we pre-install them into Postgres for you, and we go through the process of figuring out, for specific types of workloads, what the best extensions are to make Postgres the best it can be for that workload. So it's configuration, getting the extensions installed, and providing a really good experience around that.
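To make the shared_buffers example concrete: a common rule of thumb (an assumption for illustration, not necessarily Tembo's actual sizing policy) is to give Postgres about a quarter of system RAM, which can be applied with an `ALTER SYSTEM` statement. A minimal sketch that just renders such a statement:

```rust
// Sketch: render an ALTER SYSTEM statement for shared_buffers using the
// common "25% of RAM" rule of thumb. The percentage is an assumption for
// illustration, not a policy taken from the episode.
fn shared_buffers_stmt(total_ram_gb: u32) -> String {
    let buffers_gb = total_ram_gb / 4; // rule of thumb: a quarter of RAM
    format!("ALTER SYSTEM SET shared_buffers = '{}GB';", buffers_gb)
}

fn main() {
    // A 32 GB host would get 8 GB of shared buffers under this rule.
    println!("{}", shared_buffers_stmt(32));
}
```

The point is only that this kind of memory-proportional tuning is mechanical once you know the host size, which is why a platform can set it dynamically.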
Matthias
00:03:38
So you're dealing with configuration, extensions, performance tuning. When you started building all of this, was Rust the obvious choice, or did you consider C, C++, or maybe Python as an alternative?
Adam
00:03:53
Well, since Postgres is written in C, that was definitely something that came up. But the experience of building software in C is not super modern. For somebody like me, I think the first programming language I learned, outside of HTML and whatnot, was R, for statistical programming, and then Python. In Python you have stuff like the Python Package Index, where if you want to do something, hey, there's a library for that, and you can just pull in that library and work on solving the specific problem that you're trying to solve. That kind of exists in C, but it's not at all as easy to just get up and running building stuff. So even though Postgres is written in C, and it's super performant and very stable, it is hard to just come up on it and start adding functionality. But Rust has Cargo and crates, and if you're trying to build software, you can get up and running pretty quickly with Rust, using other libraries that are on crates.io.
Matthias
00:05:11
Okay, I get the C limitations: no modern tooling, painful package management. But you mentioned coming from Python, which has great abstractions and developer experience. Why not use Python for this? Did you at least prototype in it before jumping to Rust?
Adam
00:05:29
Well, at the time that we were getting started building extensions at Tembo, there was a framework that made it really easy to get up and running with Rust. That framework is called pgrx. Well, at the time it was just called pgx; they renamed it to pgrx at some point. But to go from nothing to a hello-world example was like an hour or less: you create an extension where you write some function in Rust and turn that function into a SQL function that you can call. That was super easy. So we were up and running really quickly, and we didn't really explore Python too seriously.
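For readers who haven't seen pgrx, the hello-world Adam describes has roughly the following shape. The `#[pg_extern]` attribute comes from the pgrx crate, so it is shown only in a comment here; this sketch sticks to the plain Rust function body, and the macro usage should be treated as illustrative rather than a complete extension:

```rust
// With pgrx, an attribute macro turns a plain Rust function into a Postgres
// SQL function. In a real extension it would look roughly like:
//
//   use pgrx::prelude::*;
//
//   #[pg_extern]
//   fn hello_tembo() -> String { "Hello, Tembo".to_string() }
//
// pgrx then generates the `CREATE FUNCTION hello_tembo() ...` DDL that points
// at the compiled shared object. The function body itself is ordinary Rust:
fn hello_tembo() -> String {
    "Hello, Tembo".to_string()
}

fn main() {
    println!("{}", hello_tembo());
}
```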
Matthias
00:06:16
Okay, that's pretty impressive. So pgrx made it super easy to get started. But I don't really understand how Postgres extensions actually work. You write a Rust function, and it somehow becomes a SQL function. What's happening under the hood? Is there some kind of plugin manager? How does your Rust code actually talk to Postgres?
Adam
00:06:39
Yeah, so you can write functions in pure SQL. There are procedural languages, like PL/pgSQL, where you can still just write, you know, kind of SQL and easily create these functions. So there's an object that lives in Postgres: a function is an object in Postgres. And that function can point to SQL code, or to one of the procedural languages, or it can point to a shared object. That shared object could be compiled from C, or it could be compiled from Rust. In the case of building extensions with pgrx, you have this function, and it really just points to a shared object, which is the code that you wrote in Rust. So when you compile a pgrx extension, you might create some functions (there are more complicated things you can do than just functions), but in the case of a function, there's some code that says: hey, I'm creating a Postgres function, and that function is pointing to your Rust shared object.
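The wiring Adam describes bottoms out in ordinary DDL. Here is a hedged sketch of what the generated `CREATE FUNCTION` for a shared-object-backed function looks like, held as a string constant; the function and symbol names are hypothetical, and pgrx generates the real statement for you:

```rust
// The SQL that binds a Postgres function to a compiled shared object.
// 'MODULE_PATHNAME' is Postgres's placeholder for the extension's shared
// library; "hello_tembo_wrapper" is a hypothetical exported symbol name.
const CREATE_FN_DDL: &str =
    "CREATE FUNCTION hello_tembo() RETURNS text
LANGUAGE c
AS 'MODULE_PATHNAME', 'hello_tembo_wrapper';";

fn main() {
    println!("{}", CREATE_FN_DDL);
}
```

The `LANGUAGE c` clause applies whether the shared object was compiled from C or from Rust; Postgres only sees the C ABI.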
Matthias
00:07:46
But wait, if extensions are just functions pointing to shared objects, how do you handle state? Functions are supposed to be stateless, but any real application needs to manage state somehow. What happens when you need to maintain data across calls or coordinate between operations?
Adam
00:08:07
Yeah, so it does get complicated to manage state. By default, when you call that function, everything happens within a transaction. So you can start to store state just in memory: as soon as you call that function, imagine you're going to read some data from another table in Postgres. You can pull that into memory, go do whatever it is that you want to do, and ultimately return some result set back to whoever is calling that function. So you could store it in memory, or you could store it in some intermediate tables back in Postgres. But in those cases, you'll be working through the Postgres SPI, the server programming interface. So you have some tools available to you to deal with your state there.
Matthias
00:09:00
The other thing that bothered me while you explained this: you're going through this SPI, this server programming interface, dealing with FFI boundaries. Don't you lose all the conveniences and ergonomics of Rust's type system? I mean, how would Postgres even know that a function is really available? All of Rust's advantages are in its type system and safety guarantees. Don't you lose all of that at the FFI interface?
Adam
00:09:25
In the memory space where your Rust code runs, you still have all of Rust's features available to you. And as you interact with FFI, pgrx mostly abstracts the FFI integrations for you. So you just write Rust code generally the same as you would if you were writing a web application or some script in Rust.
Matthias
00:09:56
Okay, you convinced me. pgrx seems like the way to go if you want to start a new extension. But did you ever hit any limitations that might make you move away from pgrx, or is everything pretty much covered?
Adam
00:10:11
A lot of stuff is covered. There are some limitations, and I feel a little bad about what I'm about to say, because I wish I could have contributed back to pgrx. But some of the SPI bindings, the server programming interface in pgrx, are just not fully optimized. If you have some SQL that you want to execute through the SPI using pgrx, the data that goes to and from Postgres through that SPI has to be serialized and deserialized, all of it, even if you're not really going to touch it. So there's a project, a message queue project that we worked on, called PGMQ. It was originally written in Rust using pgrx. And particularly for the batch operations, where we would send or read, say, 100 messages at a time, those would get really slow, because we would have to iterate over every message, deserialize it, and then reserialize it as we sent it back to the user. And in those cases, there's really not a ton of advantage over just using pure SQL.
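The overhead Adam describes can be pictured with a toy model, which is not pgrx's actual SPI API: if every message must be decoded into a Rust value and re-encoded just to pass through, a batch read does O(n) useless work compared to handing the rows straight back.

```rust
// Toy model: a "message" is just its serialized text. The pass-through path
// hands rows back untouched (what plain SQL effectively does); the round-trip
// path decodes and re-encodes each one, which is the extra per-message work
// the SPI path forced onto PGMQ's batch operations.

fn read_batch_passthrough(rows: &[String]) -> Vec<String> {
    rows.to_vec()
}

fn read_batch_roundtrip(rows: &[String]) -> Vec<String> {
    rows.iter()
        .map(|r| {
            let decoded: Vec<char> = r.chars().collect(); // "deserialize"
            decoded.into_iter().collect::<String>()       // "reserialize"
        })
        .collect()
}

fn main() {
    let rows: Vec<String> = (0..3).map(|i| format!("{{\"id\":{}}}", i)).collect();
    // Both paths return the same data; one just burns work per message.
    assert_eq!(read_batch_passthrough(&rows), read_batch_roundtrip(&rows));
    println!("batch of {} messages round-tripped unchanged", rows.len());
}
```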
Matthias
00:11:28
Does this mean the code base is now mostly SQL with a nice Rust wrapper on top? You've kind of inverted the pattern: instead of writing a pure Rust Postgres extension, now it's SQL with a nice wrapper around it, sort of. What are the boundaries between SQL and Rust in that case?
Adam
00:11:46
So PGMQ, I can tell a little bit of background about that project. At Tembo, we needed a queue between our control plane and our data plane. The data plane is where we spin up people's databases, and the control plane is the backend web applications. We were building this queue on Postgres, and all of our tech was already in Rust. So we basically had a bunch of SQL strings written in Rust code, and that code would be duplicated across the control plane, and the same service in the data plane would have the same SQL statements for sending messages and reading messages. That was all Rust. So we first pulled it out into a crate and then installed the crate in all the places it needed to be. So it was all client-side. And since it was a crate, it was already kind of packaged up, so it was maybe a couple of hours' worth of work to take that crate and use pgrx to create an extension out of it. Then there were all these shared objects for every function in Postgres, and it was great: we had this extension, we could ship that extension around to different Postgres instances and share it with the world. And then we started to run into those limitations, where really it was how we were using pgrx that made some things slow. So, with the help of the community (we even got some help from folks at Supabase), we rewrote it from pgrx Rust into PL/pgSQL. And that was just for the queue operations.

And, you know, we still want to use that extension at Tembo, and all our stuff is in Rust, and tons of people around the world have their applications in Rust and want to use this extension too. So there's a Rust client library for it, and that is to give you the idiomatic Rust experience for using PGMQ on Postgres.
Matthias
00:14:11
It's a very nice pattern: you go from plain vanilla SQL, extract it into a separate crate, and make it a Postgres extension with pgrx. I really like these code reusability stories. I hear that a lot, and I really like that about Rust, being able to move things out into separate crates. So that was the transition, if I understood correctly. But then you ran into serialization issues, because when you try to get batches of messages over the wire, it's probably an O(n) operation to go through each and every message and deserialize it, and that can be slow. And eventually you want to go back a little bit and write parts of the more performance-critical code in maybe a lower-level abstraction, or maybe not use pgrx for that case, right?
Adam
00:15:01
That's exactly it. You know, we talked to the pgrx maintainers about what we were doing, and they were like: hey, what you're doing doesn't really need to be in pgrx; you're just executing SQL statements. And yeah, there were some limitations at the time with the maturity of the API that pgrx had around just executing SQL statements. So it was kind of a, yeah, this should be SQL instead of Rust.
Matthias
00:15:33
But couldn't you have kept a hybrid approach? pgrx for most things, SQL for the performance-critical parts. What made you abandon pgrx entirely?
Adam
00:15:42
We could have, but there are some additional gotchas with extensions in Postgres, mostly around the portability of extensions. Extensions have the objects we talked about earlier, and they also have control files and migration files and all these other things, and for it to be an extension, those things have to be on the same host as Postgres. But let's say you're running on RDS, or you're running on Supabase or Google Cloud somewhere: if it's an extension, then you have to deal with all these additional things. But if it's just SQL, then you can take that SQL with you and manage it however you want.
Matthias
00:16:33
From that perspective, it makes total sense. The PGMQ client caught my attention because, unlike NATS or Kafka, it's just Postgres, so there's no need for new tech in your stack. But what impressed me the most was how Rust-like the API feels. How did you make a SQL-based queue feel so idiomatic to Rust developers?
Adam
00:16:59
Yeah. So, in my opinion (and I didn't write all the client-side code here; we had some help from the community, who built some of these things), the thing I care about in the client is establishing a connection pool to Postgres. You can let the client create that connection pool to Postgres for you. But maybe you're doing a bunch of other stuff in your application too, so we have the ability for you to create the pool yourself, provide the pool to the client, and let PGMQ's client use your pool. And within the client, with Rust, there's some generic typing we could do for how we execute the Postgres functions for the queue: as long as the objects have certain traits implemented, we can execute that SQL. In a different language, I don't know, it might be way more complex, but since we can make these assumptions, hey, pass in this thing, and as long as it implements these traits, we're good to go.
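The "bring your own pool, as long as it implements certain traits" design can be sketched with plain Rust generics. The `Execute` trait below is a made-up stand-in for the sqlx executor traits the real client bounds on, and the `pgmq.send(...)` string only illustrates the shape of the call:

```rust
// Hypothetical stand-in for an executor trait: anything that can run SQL.
trait Execute {
    fn execute(&self, sql: &str) -> u64; // returns rows affected
}

// The caller's own "pool": the client doesn't care what it is, only that it
// can execute SQL. A real application might share this pool with everything
// else it does against Postgres.
struct MyPool;
impl Execute for MyPool {
    fn execute(&self, sql: &str) -> u64 {
        println!("executing: {}", sql);
        1
    }
}

// A PGMQ-style send: generic over whatever executor the caller passes in.
fn send<E: Execute>(exec: &E, queue: &str, payload: &str) -> u64 {
    let sql = format!("SELECT pgmq.send('{}', '{}'::jsonb);", queue, payload);
    exec.execute(&sql)
}

fn main() {
    let pool = MyPool;
    let affected = send(&pool, "my_queue", "{\"hello\":\"world\"}");
    println!("rows affected: {}", affected);
}
```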
Matthias
00:18:14
You mentioned sqlx. Some people might not be familiar with it. What makes sqlx special for this kind of work?
Adam
00:18:21
Yeah, and in fact the PGMQ client uses sqlx. I describe sqlx as everything that I care about from an ORM, but it's not an ORM. All the great things and none of the bad. sqlx does a lot of things, and I don't use all of its features, but the thing I like the most is that it gives me compile-time checks on the SQL that I write. If I have some insert statement in my application, written in raw SQL, I can rest assured that if I have some struct that I'm trying to serialize and insert into a row, and the types on my struct are not in compliance with the table, sqlx will give me a compile error. Yeah, I love sqlx. It's like an ORM, but better. What I like about ORMs is that they give me this really nice interface between my application and the types in the database: if I have some struct, the struct has attributes on it and those are typed, and an ORM says, yeah, there's this connection between this object and the table, or many tables. But I actually really like to write SQL, and a lot of times I end up with SQL statements that don't fit an ORM super well, so I end up with a code base with some SQL and some ORM and it gets really messy. With sqlx I can always just write SQL, and I love that as a developer, because I enjoy writing SQL, but I still get these type checks, because sqlx will look at my SQL statements and then check against the database: is the SQL statement, and what you're trying to insert, or what you're trying to read and deserialize into some struct, going to work? Do the types match up? And that is huge. It can be a little frustrating at first, until you get the hang of it, but that piece right there is what hooked me on sqlx.
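sqlx achieves this by having its query macros check your SQL against a development database at compile time. The idea of "types checked before runtime" can be imitated with the trait system in a toy (this is only an analogy, not how sqlx works internally):

```rust
// Analogy only: associate Rust types with the SQL types they may bind to, so
// a mismatch is a compile error rather than a runtime surprise. sqlx does the
// real thing by verifying your actual SQL against a live database at build time.
trait PgType {
    const SQL_TYPE: &'static str;
}
impl PgType for i64 {
    const SQL_TYPE: &'static str = "BIGINT";
}
impl PgType for String {
    const SQL_TYPE: &'static str = "TEXT";
}

// Only types that declare a SQL mapping can be bound as parameters.
fn bind_param<T: PgType>(_value: &T) -> &'static str {
    T::SQL_TYPE
}

fn main() {
    println!("{}", bind_param(&42i64));             // BIGINT
    println!("{}", bind_param(&String::from("x"))); // TEXT
    // bind_param(&3.5f64) would not compile: f64 has no PgType impl here.
}
```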
Matthias
00:20:37
I would guess the other important part is async support, because all of these operations are I/O-bound and you don't want to block the client for concurrent queries. What did you end up using for async, and was it straightforward to integrate into the existing code?
Adam
00:20:55
You know, rewind way back to when we first started building in Rust. It was like: okay, for async, we kind of have to handle this ourselves, and there was a ton of complexity around that that, frankly, we didn't want to spend our time trying to implement. There are a few different runtimes out there, and Tokio was clearly the most popular and the most supported, with the biggest community around it, so we just used that. And to this day, when I'm building in Rust, I don't really think about which async runtime to go with. I just use Tokio.
Matthias
00:21:40
But what about really complex things, like supporting transactions? By transaction, I mean multiple statements that are executed atomically on Postgres. If you wanted to do that in an async way, I would be afraid that it would get really complex.
Adam
00:21:55
Yeah, I don't exactly know how you would do it if you had two separate threads and you wanted to span some transaction across those. I haven't actually tried that, so I don't know how it would work. But the way we implemented it: I think earlier I mentioned that you can bring your own connection pool, or really your own executor, as long as it has certain traits implemented on it, which we basically inherit from sqlx. So you can start a transaction, do whatever SQL you want, and if you pass your connection or transaction object into PGMQ, then you can do the PGMQ things through the PGMQ API, and then finally commit your transaction however you want yourself. So we lean on Tokio, and we lean on sqlx for that. And yeah, on the question about doing that across threads: it could get tricky to pass that connection across threads. I don't know how you would do that; it might be possible.
Matthias
00:23:06
You could probably share the connection across threads with something like Arc<Mutex>, and sqlx allows that kind of pattern. But at the same time, why would you? Postgres already handles that with transactions, and that guarantees atomicity. So maybe you just leave it as an implementation detail to the database.
Adam
00:23:27
Yeah, from the Postgres side, that's just the transactions feature. We don't have to reinvent the wheel on that part.
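The Arc<Mutex> pattern Matthias mentioned can be sketched with the standard library alone. The `Conn` type here is a hypothetical stand-in for a connection holding an open transaction; it only buffers statements, so this is a simulation of the sharing pattern, not real database code:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a connection with an open transaction: it just
// buffers the statements that would run inside the transaction.
struct Conn {
    statements: Vec<String>,
}

// Spawn n workers that all write through the same shared connection. The
// mutex serializes access, so every statement lands in the one "transaction".
fn queue_from_threads(n: usize) -> usize {
    let conn = Arc::new(Mutex::new(Conn { statements: Vec::new() }));

    let handles: Vec<_> = (0..n)
        .map(|i| {
            let conn = Arc::clone(&conn);
            thread::spawn(move || {
                let mut guard = conn.lock().unwrap();
                guard
                    .statements
                    .push(format!("INSERT ... /* from thread {} */", i));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let len = conn.lock().unwrap().statements.len();
    len
}

fn main() {
    println!("statements queued in one transaction: {}", queue_from_threads(4));
}
```

As discussed above, in practice you would usually let Postgres's own transaction semantics handle atomicity instead of coordinating threads around one connection.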
Matthias
00:23:35
Some people might hear this and think: why do I need PGMQ? I can just write my own abstraction on top of Postgres. How hard can it be? It can only be a couple of lines of code, right? We touched on transaction support and edge cases, but maybe you can allude to a few more things that you probably don't want to do yourself, things that people tend to forget when they build their own abstraction.
Adam
00:24:00
Yeah, I mean, with all this stuff, it's just code, so you can always just go write the code. There are these operations that you need if you're going to build a queue. You need a way to create a queue, so how do you define what a queue is? You can think through that and make it what you want; we have a function called create_queue that does it in a way that we think is right, and there's an API around it. Sending a message to a queue is basically just an insert statement, but that insert is handled for you. And if you're working in Rust, you have some message, and that message becomes an attribute, a column on a table, so you need to serialize that message to get it inserted into the table. If you're using the Rust client, there are some helpers there to do the serialization for you. So you save a lot of time because this stuff is already built and already tested; you don't have to rewrite a bunch of things. But I'm sure there are cases out there where it makes sense for somebody to write their own.
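A minimal in-memory model of what "create queue" and "send" boil down to. The column names follow PGMQ's documented table layout (msg_id, read_ct, vt, message), but this is a simulation for illustration, not the extension's code:

```rust
use std::collections::HashMap;

// One row of a PGMQ-style queue table.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Row {
    msg_id: u64,
    read_ct: u32,    // how many times this message has been read
    vt: u64,         // visibility time: readable once now >= vt (epoch secs)
    message: String, // the serialized payload (jsonb in real PGMQ)
}

// "Create queue" is just: make a table for that queue name.
#[derive(Default)]
struct Broker {
    queues: HashMap<String, Vec<Row>>,
    next_id: u64,
}

impl Broker {
    fn create_queue(&mut self, name: &str) {
        self.queues.entry(name.to_string()).or_default();
    }

    // "Send" is just an insert; serializing the payload is the part the
    // Rust client's helpers would handle for you.
    fn send(&mut self, queue: &str, payload: &str) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.queues.get_mut(queue).expect("queue exists").push(Row {
            msg_id: id,
            read_ct: 0,
            vt: 0,
            message: payload.to_string(),
        });
        id
    }
}

fn main() {
    let mut broker = Broker::default();
    broker.create_queue("jobs");
    let id = broker.send("jobs", "{\"task\":\"resize\"}");
    println!("sent message {}", id);
}
```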
Matthias
00:25:14
Even if you asked me to design a message queue table schema, I'd probably miss something important. I'd add a timestamp, maybe a byte array for the message, though I'm not sure if you support bytes or just UTF-8 strings. Then some kind of locking or ownership fields to prevent reading the same message twice. What actually goes into that table?
Adam
00:25:37
Yeah, I mean, you're pretty close there. There's a concept we borrowed from SQS, the Simple Queue Service from AWS: the visibility timeout, which is kind of that locking feature. When you read a message from a queue, you have to specify how long you want that message to be unavailable, to yourself if you were to read again, or to any other consumers of that queue. So let's say you have 10 threads all reading from the same queue. The first time the message is read, you could say: hey, make this message invisible for five minutes. During those five minutes, none of the other threads will be able to get that exact same message, until the visibility time expires and the message becomes visible again. I think that piece is super cool, because without it you'd have to have some additional worker process that would watch the queue and look for messages that need to be flipped back to available. But this way it's completely stateless; there's nothing that can really break there. There's no process that crashes and all of a sudden messages get stuck. They just automatically become visible again.
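The visibility-timeout behavior can be simulated with a fake clock. A simplified sketch (one message, no real locking) that keeps the idea Adam describes: reading hides the message until `vt`, and expiry alone makes it visible again, with no janitor process:

```rust
// One message with a visibility time, on a fake clock in seconds.
struct Msg {
    vt: u64,      // epoch seconds; message is readable once now >= vt
    read_ct: u32, // how many times it has been read
}

// A read succeeds (and hides the message for vt_secs) only if it is visible.
fn try_read(msg: &mut Msg, now: u64, vt_secs: u64) -> bool {
    if now >= msg.vt {
        msg.vt = now + vt_secs; // hide it from everyone, including ourselves
        msg.read_ct += 1;
        true
    } else {
        false // still in flight for some consumer
    }
}

fn main() {
    let mut msg = Msg { vt: 0, read_ct: 0 };
    assert!(try_read(&mut msg, 100, 300));  // first consumer gets it
    assert!(!try_read(&mut msg, 150, 300)); // others see nothing for 5 min
    assert!(try_read(&mut msg, 400, 300));  // timeout expired: visible again
    println!("read {} times", msg.read_ct);
}
```

Note that nothing ever "resets" the message explicitly; visibility comes back purely from the clock passing `vt`, which is what makes the scheme stateless.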
Matthias
00:27:09
That's clever. It's sort of a lock-free algorithm, because you just update the timestamp when you read it, and everyone knows the message is in flight.
Adam
00:27:18
Yeah, the most complicated part of PGMQ is the read statement. It is a super complicated query: it's a SELECT ... FOR UPDATE statement. It's reading, but it's also updating how many times the message has been read; there's a counter on that table that keeps track of how many times each message has been read. And there's that visibility time: anything after that time, the message can be read. So when you read a message, it's actually an update statement to the table, and it's a SELECT FOR UPDATE, which creates a lock on those rows. That makes it so that if two threads read at the exact same time, Postgres figures out who gets the record first. The FOR UPDATE guarantees that only one worker can get the message, but it does create a lock. So we say FOR UPDATE, which creates the lock, and then there's another clause, SKIP LOCKED, which means: any records that are locked, skip over them and go to the next one. That makes it so that any other workers that are reading aren't sitting there waiting for that lock to be released; they can skip it and go to the next message. And that lock is pretty quick: it only lasts for the duration of that transaction. The long-term locking is handled by the visibility timeout.
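The shape of such a read query, reconstructed from Adam's description and held here as a string constant. This is an illustrative sketch, not PGMQ's exact statement; the table name, limit, and interval are placeholders:

```rust
// The essence of a queue read in Postgres: pick visible rows, lock them,
// skip rows other workers have already locked, bump the read counter, and
// push the visibility time forward, all in one statement.
const READ_SQL: &str = "
WITH picked AS (
    SELECT msg_id
    FROM my_queue
    WHERE vt <= clock_timestamp()
    ORDER BY msg_id
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
UPDATE my_queue q
SET vt = clock_timestamp() + interval '30 seconds',
    read_ct = read_ct + 1
FROM picked
WHERE q.msg_id = picked.msg_id
RETURNING q.msg_id, q.read_ct, q.vt, q.message;";

fn main() {
    println!("{}", READ_SQL);
}
```

The row lock from `FOR UPDATE` only lives until this statement's transaction ends; after that, the updated `vt` column is what keeps the message hidden.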
Matthias
00:28:56
You probably came up with that on the first try, right?
Adam
00:29:00
No, I mean, there are a ton of resources out there. If you just search for message queue and Postgres, a lot of people have written about FOR UPDATE and SKIP LOCKED. It's kind of the standard.
Matthias
00:29:13
But still, you can make a lot of mistakes when implementing that. And also, you might not know about this research in the first place. You might just naively implement it and do it the wrong way.
Adam
00:29:25
Yeah, definitely. If you don't have the SKIP LOCKED on there, you could have 10 threads reading, one of them gets a message, and the other nine are just sitting there. They would just sit there until that initial lock was released. So yeah, you could mess it up.
Matthias
00:29:45
I really like these implementation details; that's what I live for. Because that could really hamper your performance, and in the worst case you don't realize it until you're under load, and when you really need it the most, it will fail on you.
Adam
00:30:00
Yeah, that's a good point. I'm glad you mentioned that, because that's kind of a reason to use something that's been pre-packaged. Today there are quite a few people using PGMQ, and when people run into issues, an issue gets created on the project, and then we, or somebody in the community, resolves it. There are ways that building your own queue can go wrong. So PGMQ as a project has started to become this place that people go to learn, and to just use the code: when they want to do a queue on Postgres, they can look at that project and either use the code or modify it and do it their own way.
Matthias
00:30:49
That's why I love open source in the first place: so much is out there, and you can look at the actual implementation and then decide if you really want to go down that rabbit hole and implement it yourself. And just sending in a pull request will fix a problem for every instance out there, with the latest version of course, which is so, so great.
Adam
00:31:10
Yeah.
Matthias
00:31:11
Hard would it be to abstract that to other databases for example if there was a customer that needed MariaDB support or MySQL.
Adam
00:31:24
I guess, you know, there are some SQL files in the PGMQ project, and it would be like: hey, for every one of these statements, what's the equivalent in the other database? And if you translated that, then it theoretically should work.
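Porting the SQL statement by statement is one half; the Rust client could then select the dialect at build time with Cargo feature flags. A hedged sketch with a hypothetical `mysql` feature (with no features enabled, the Postgres path compiles):

```rust
// Hypothetical Cargo feature gating: same function name, two SQL dialects.
// Building with `--features mysql` would compile the MySQL variant instead.
#[cfg(feature = "mysql")]
fn read_sql() -> &'static str {
    // MySQL 8+ also supports SKIP LOCKED, but the full read statement would
    // still need its own translation.
    "SELECT ... FOR UPDATE SKIP LOCKED /* MySQL dialect */"
}

#[cfg(not(feature = "mysql"))]
fn read_sql() -> &'static str {
    "SELECT ... FOR UPDATE SKIP LOCKED /* Postgres dialect */"
}

fn main() {
    println!("{}", read_sql());
}
```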
Matthias
00:31:40
The cool thing is that you could use Rust's feature flags, so that the project builds against the database it depends on. I was also wondering about Postgres LISTEN and NOTIFY support. Where do you draw the line between using native pub/sub versus needing a full message queue?
Adam
00:31:57
We've had some discussions about using those features from Postgres in PGMQ. But I don't think you can just replace PGMQ with LISTEN and NOTIFY, mostly because if you're going to send the message across one of those notification channels and that message doesn't make it, then it's kind of gone, and you have no record of it. In PGMQ, every message is a row in a table, and you can archive that row or completely delete it if you want, so you still have a complete audit log of every message if you want it. If we just moved it into one of those other channels, it's kind of gone, unless we build something to handle it. But I do think it would be really useful as a way to notify consumers that there are messages available in a queue. Right now, if your application is reading messages from a queue, you have to poll the queue, say, poll it once every second, or have some backoff logic: poll every second and then back off to 10 seconds or something. It would be nice to be able to subscribe to the queue and have one of those notifications come to you and say: hey, there are messages now, now poll. That could really help efficiency if you have really low latency requirements and you can't wait. A one-second poll interval is too long, you need to know right away, and you don't want to just poll every 10 milliseconds or something. If we had something implemented on those features, it could help a lot, I think.
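The poll-with-backoff pattern Adam describes can be sketched as a capped exponential backoff; the 1-second floor and 10-second ceiling come from his example, while the doubling is an assumption for illustration:

```rust
use std::time::Duration;

// Each empty poll doubles the wait, capped at a ceiling; any received message
// resets the interval back to the floor. (A real loop would sleep and then
// poll Postgres; here we only compute the schedule.)
fn next_interval(current: Duration, got_message: bool) -> Duration {
    const FLOOR: Duration = Duration::from_secs(1);
    const CEIL: Duration = Duration::from_secs(10);
    if got_message {
        FLOOR
    } else {
        (current * 2).min(CEIL)
    }
}

fn main() {
    let mut iv = Duration::from_secs(1);
    for empty_poll in 0..5 {
        iv = next_interval(iv, false);
        println!("after {} empty polls: wait {:?}", empty_poll + 1, iv);
    }
    // A message arrives: back to the 1-second floor.
    iv = next_interval(iv, true);
    println!("after a message: wait {:?}", iv);
}
```

A LISTEN/NOTIFY wakeup, as discussed above, would replace this schedule entirely for the low-latency case.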
Matthias
00:33:45
On top of this, you could use the Tokio stream abstraction and just iterate over the messages; the futures would resolve as quickly as messages come in, and it would all kind of beautifully happen under the hood: you get a future that is ready, it contains the value, and that is your message. I guess it would be kind of nice to have that. How far away are you from that reality?
Adam
00:34:11
Well, I think we would first need to implement a way to set up those channels on any given queue, and I don't think that would be super difficult. There's probably a way that this could all be implemented purely on the Rust client side, too. If I think of how librdkafka reads messages from a Kafka topic: it polls, but it is pulling messages in batch to the client, and then the consumer of librdkafka iterates over those messages. We could probably do something similar in the Rust client, where you have some poll interval, but we're handling that poll asynchronously and pulling messages back in batch. Then your Rust application could iterate over those messages asynchronously if it wanted to. I always really liked how librdkafka did it, because as a user, all that stuff is abstracted from you, but it's super efficient because it is polling for you. Unless you really get into the weeds, you don't really know that it's polling and pulling messages in batch. We could probably do the exact same thing in the Rust client.
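The batching idea Adam borrows from librdkafka, fetch a batch behind the scenes, hand out one message at a time, can be sketched with the network call abstracted into a closure. All names here are hypothetical; this is not the real PGMQ client API:

```rust
use std::collections::VecDeque;

/// A client-side batching consumer, similar in spirit to librdkafka's
/// prefetching: refill the local buffer in batch only when it runs dry,
/// then hand out one message at a time. `fetch` stands in for a round
/// trip to the queue (hypothetical; not PGMQ's real interface).
struct BatchConsumer<F>
where
    F: FnMut(usize) -> Vec<String>,
{
    buffer: VecDeque<String>,
    batch_size: usize,
    fetch: F,
}

impl<F> BatchConsumer<F>
where
    F: FnMut(usize) -> Vec<String>,
{
    fn new(batch_size: usize, fetch: F) -> Self {
        Self { buffer: VecDeque::new(), batch_size, fetch }
    }

    /// Return the next message, fetching a fresh batch when the
    /// buffer is empty. `None` means the fetch came back empty.
    fn next_message(&mut self) -> Option<String> {
        if self.buffer.is_empty() {
            self.buffer.extend((self.fetch)(self.batch_size));
        }
        self.buffer.pop_front()
    }
}

fn main() {
    // A fake "server" holding six messages, fetched two at a time.
    let mut remaining: Vec<String> = (1..=6).map(|i| format!("msg-{i}")).collect();
    let fetch = move |n: usize| {
        let take = n.min(remaining.len());
        remaining.drain(..take).collect::<Vec<_>>()
    };
    let mut consumer = BatchConsumer::new(2, fetch);
    let mut seen = Vec::new();
    while let Some(m) = consumer.next_message() {
        seen.push(m);
    }
    println!("{}", seen.len()); // 6
}
```

The caller just iterates; that it took three round trips of two messages each is invisible, which is exactly the property Adam praises in librdkafka.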
Matthias
00:35:33
What's your biggest deployment at the moment?
Adam
00:35:35
So I think I mentioned earlier this architecture with a control plane and a data plane. We have several data planes: there's a data plane in AWS, Azure, Google Cloud, and then some self-hosted ones as well. But all those public clouds read messages from a single Postgres instance, and there are multiple queues within that. At peak, I think it's something like 10,000 messages per minute, maybe per five minutes. But really, the scalability of that all comes down to what Postgres can handle: how many inserts, how many updates per second can Postgres handle. Because that's really all that PGMQ is doing, inserts and updates. Well, inserts, updates, and deletes.
Matthias
00:36:26
So if you use that for a larger deployment, it certainly won't be the bottleneck in your application, unless you are at web scale, and then you always have bigger problems.
Adam
00:36:36
You have bigger fish to fry.
Matthias
00:36:37
Okay.
Adam
00:36:38
Yeah.
Matthias
00:36:40
What's your setup for the production cluster right now?
Adam
00:36:42
It's a dedicated Postgres cluster for the queue, yeah, and those specs on it are exactly right. It never really gets over 20% CPU, and memory utilization just hovers around 25%, probably because that's what shared buffers are set to on that thing. But I mean, it is dedicated to the queue, so I think it is over-provisioned.
Matthias
00:37:12
That's at least better than being under-provisioned. What are some of the other use cases of Postgres that you see in production a lot?
Adam
00:37:19
Yeah, full-text search is built into Postgres, so if you're trying to do that, it's pretty easy to get up and running. We also have a number of people using our VectorDB stack, which is primarily pgvector, another open-source project out there that gives you the data type of a vector and then the operations to do similarity search on top of it. It is definitely the gold standard for working with embeddings within Postgres. And then we built kind of a wrapper extension around it using PGRX and Rust. It lets you, say you have a table with some text in it and you want to generate embeddings from every row in that table, be like: hey, I have this table, I want to use OpenAI, or I want to use Anthropic, or I have a self-hosted model, and I just want to get embeddings for this table. So our wrapper extension, it's called pg_vectorize. You just call a function on a table, tell it which model you want to use, and then in the background it's using PGMQ to look at the table and figure out which columns we need to get embeddings for, pull that data, call the transformer model, get the embeddings, and insert those embeddings into another table or the same table. It's configurable. But yeah, it helps you with the orchestration there.
Matthias
00:38:44
I like that you use your extensions in combination with other extensions and call into them. It's kind of cool. How do you ensure safety at the boundary between Rust and Postgres?
Adam
00:38:56
That Vectorize extension runs in a background process that Postgres manages. And we purposely don't use the SPI, the interface I mentioned previously, because it's kind of unnecessary. So we just use SQLx. Postgres spins up this background process that reads from that queue of jobs it needs to create embeddings for, and we just use SQLx for it. So it's treated as just a normal application internally; there is no interacting with FFI or anything going on there. After Postgres starts it, we treat it as a normal Rust application. Since we built it that way, we have the option on our cloud platform to run that background worker in a separate container. So instead of running it as a Postgres background worker, we just take that same Rust binary and run it in a separate container next to Postgres. That's an even better way to get around that limitation of memory management: if it's not even on the same host as Postgres, then you can scale that thing independently.
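The portability Adam describes, the same worker loop running as a Postgres background worker or in its own container, falls out naturally if the job store is behind a trait. This is a hypothetical sketch with made-up names; the real Vectorize worker talks to a PGMQ job queue via SQLx:

```rust
/// Anything that can hand out jobs and accept results. In the real
/// worker this would be SQLx queries against the job queue; here it
/// is abstracted so the loop itself is plain, host-agnostic Rust.
trait JobStore {
    fn pop_job(&mut self) -> Option<String>;
    fn complete(&mut self, job: &str, result: String);
}

/// The worker loop: identical whether the binary runs as a Postgres
/// background worker or in a separate container next to Postgres.
fn run_worker<S: JobStore>(store: &mut S) -> usize {
    let mut done = 0;
    while let Some(job) = store.pop_job() {
        // Stand-in for "call the embedding model".
        let result = format!("embedding({job})");
        store.complete(&job, result);
        done += 1;
    }
    done
}

/// In-memory stand-in for the Postgres-backed store.
struct InMemoryStore {
    pending: Vec<String>,
    finished: Vec<(String, String)>,
}

impl JobStore for InMemoryStore {
    fn pop_job(&mut self) -> Option<String> {
        self.pending.pop()
    }
    fn complete(&mut self, job: &str, result: String) {
        self.finished.push((job.to_string(), result));
    }
}

fn main() {
    let mut store = InMemoryStore {
        pending: vec!["row-1".into(), "row-2".into()],
        finished: Vec::new(),
    };
    let n = run_worker(&mut store);
    println!("processed {n} jobs"); // processed 2 jobs
}
```

Because nothing in `run_worker` touches Postgres internals or FFI, moving the binary out of the database and scaling it independently is just a deployment decision.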
Matthias
00:40:05
With Tembo, you shipped your first real Rust production application. What took you the longest to wrap your head around in Rust?
Adam
00:40:13
Yeah, so today I love working in Rust; it is a joy to build software with it. But when I was getting started, there was this huge hump, this hill to get over, and it was super frustrating early on. One of those things was just wrapping my head around null: in Rust there is no null, you have Some and None, and I looked at that and thought, this is so weird. I came from Python, so you had a None type, but it was not an enum. Until I learned that it's an enum with data, which is how Rust handles null values, it's just a wrapper around your data: is it null or is it not? I don't know why, but that was really hard for me to grasp. And in that same direction, error handling, where it's Ok or Err. I don't know why it took me so long to wrap my head around. Basically, if you have a function and it could error, well, you wrap your data in this thing, a Result, and it's an enum, the same as for None, or how Rust handles None. Today I love those things; it just makes it so clean. Is there an error or not? Did it return something, or was there nothing? It's super clean when I look at it today, but when I was learning Rust, I was like: what is this? I don't even understand. I don't know what would have made it easier for me to learn that early on, but I remember it being super frustrating, and then all of a sudden I got it: oh, this is awesome, I love it.
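What Adam describes, absence and errors as ordinary enums wrapping your data, looks like this in practice. A generic sketch, not code from any Tembo project:

```rust
// Option and Result are plain enums: the "null" or error case is just
// another variant, and the compiler forces you to handle it.

/// Absence is a value: no user means `None`, never a null pointer.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("adam"),
        _ => None,
    }
}

/// A fallible function wraps its data in `Result`: `Ok` or `Err`.
fn parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>().map_err(|e| format!("bad port {s:?}: {e}"))
}

fn main() {
    // Pattern matching makes "is it there / did it fail" explicit.
    match find_user(1) {
        Some(name) => println!("found {name}"),
        None => println!("not found"),
    }
    match parse_port("8080") {
        Ok(p) => println!("port {p}"),
        Err(e) => println!("{e}"),
    }
}
```

Coming from Python, the shift is that `None` and exceptions stop being out-of-band; they are values in the return type, which is the cleanliness Adam says he appreciates today.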
Matthias
00:42:10
How long did it take you until you had a good grasp on the language?
Adam
00:42:15
I'd say, when I started, I was basically learning full-time, and I think it was two months. Maybe that's slow for some people, but all day, five days a week at least, I was able to get up to building software, and still not quite understanding error handling and the handling of None types. So yeah, I think it was two months for me of pain. Two months of pain.
Matthias
00:42:50
Really like that framing I mean.
Adam
00:42:52
But the one thing I loved from day one, though, was having Cargo. I came from Python, and it was like: how do I get my environment set up to run my application? At the time it was: do I use virtualenv? Do I use Poetry? Do I just install everything with pip? It's a mess. There are some projects that are making it better in Python today; uv is doing a really great job cleaning that up, solving that problem. But when I was learning Rust, it was: hey, how do I run this application? Oh, cargo run, that's it. How do I test it? cargo test. It was just so intuitive. How do I add a library from crates.io? cargo add. It was all built into the toolchain, and I didn't have to go search and read a bunch of blogs to figure out how to just get started. So that, from day one, was probably what kept me there through those two months of pain.
Matthias
00:43:55
Oh yeah, day one in Python is strange, because on one side the language is great, but on the other side the tooling is not so great, and you really get those mixed feelings. On that note, shout-out to Charlie Marsh from Astral, who was a guest in episode three of season four; you might want to check out that one. They do great work, but the experience before that was subpar; the tooling was a bit all over the place and very difficult to use.
Adam
00:44:25
Yeah, I have nightmares from that. When I was working at Shipt, all of a sudden they started issuing employees MacBooks with Apple Silicon, so they all had the ARM architecture. And not every project out there had Python wheels for ARM. So all of a sudden, new employees coming in, their local environments would just not work with our internal libraries, because we weren't building wheels for ARM. We were using Kafka, and the Python Kafka library built on librdkafka didn't have a wheel for ARM for the longest time. It was like: okay, we have to compile this stuff from source all of a sudden. I had nightmares from that; everything fell apart.
Matthias
00:45:16
Well, at least they did a great job on the migration from Python 2 to 3. That was extremely painless, of course. Just kidding, it was a nightmare.
Adam
00:45:26
Yeah, the most obvious one there is just trying to print something: the print API changed, you know; print went from a statement to a function.
Matthias
00:45:33
Yeah, thanks for bringing back that memory. By the way, do you have to touch a lot of Python nowadays?
Adam
00:45:41
Not too much, you know. There's a lot of stuff in machine learning that's still in Python; there are lots of libraries out there for machine learning things, and in the machine learning space, Python is mostly used as a wrapper around some C library that's super well optimized. Rust is, I'd say, catching up, and Hugging Face has built some Rust libraries that kind of do the same thing as the Python equivalents. But yeah, most of the time, if I'm building a web server today, I'll just grab Actix and run with it. If I'm working with data and Postgres, I'm going to grab SQLx. Ten years ago it would have been Flask or FastAPI, well, maybe not ten years ago for FastAPI. But I have the equivalent of everything that five years ago I would have gone to Python for. For me, I'd just pick Rust. It's just easier; it's a better experience.
Matthias
00:46:41
And on that very positive note, what's your final message to the Rust community?
Adam
00:46:47
Yeah. I mean, keep contributing to the project. The project doesn't live unless people are working on it and making it better. Today, there's really no such thing as finished software or a finished programming language. It's not like software is distributed in the mail, where you burn stuff to a disc, mail it out, and it just runs that way forever. Software is a living, breathing organism now, so things have to be constantly fed. For Rust, if people stop contributing, the project won't go on. So I guess my biggest message would be: thank you for building awesome stuff, and please keep doing it.
Matthias
00:47:36
Adam, thanks a lot for the interview.
Adam
00:47:38
Thank you.
Matthias
00:47:40
Rust in Production is a podcast by corrode. It is hosted by me, Matthias Endler, and produced by Simon Brüggen. For show notes, transcripts, and to learn more about how we can help your company make the most of Rust, visit corrode.dev. Thanks for listening to Rust in Production.