Tembo with Adam Hendel
About production-grade infrastructure on top of Postgres with Rust
2025-06-12 49 min
Description & Show Notes
Recently I was in need of a simple job queue for a Rust project. I already had Postgres in place and wondered if I could reuse it for this purpose. I found PGMQ, a simple job queue from the team at Tembo, written in Rust, that uses Postgres as a backend. It fit the bill perfectly.
In today's episode, I talk to Adam Hendel, the founding engineer of Tembo, about their project, PGMQ, and how it came to be. We discuss the design decisions behind job queues, interfacing from Rust to Postgres, and the engineering decisions that went into building the extension.
It was delightful to hear that you could build all of this yourself, but that you would probably just waste your time doing so and would come up with the same design decisions as Adam and the team.
About Tembo
Tembo builds developer tools that help teams build and ship software faster. Their first product, PGMQ, was created to solve the problem of job queues in a simple and efficient way, leveraging the power of Postgres. They have since pivoted to focus on AI-driven code assistance, but PGMQ can be used independently and is available as an open-source project.
About Adam Hendel
Adam Hendel is the founding engineer at Tembo, where he has been instrumental in developing PGMQ and other tools like pg_vectorize. He has since moved on to work on his own startup, but remains involved with the PGMQ project.
Links From The Episode
- PostgreSQL - Super flexible ~40 year old relational database that just works
- R - Statistical Programming Language
- pgrx - Extend Postgres with Rust
- Postgres Docs: PL/pgSQL - Scripting with Procedural Language in PostgreSQL
- Postgres Docs: SPI - The Postgres Server Programming Interface
- pgmq - A lightweight message queue extension, initially written in Rust
- Tembo Blog: Introducing PGMQ - a blog post about the project
- sqlx - All of the great things of an ORM, without all of the bad things of an ORM
- tokio - The de facto standard async runtime for Rust
- AWS SQS - Amazon Web Services Simple Queue Service
- Postgres Docs: LISTEN - The native Postgres "sub" half of pub/sub
- Postgres Docs: NOTIFY - The native Postgres "pub" half of pub/sub
- tokio-stream - Tokio utility for asynchronous series of values
- Postgres Docs: Full Text Search - Postgres' built-in FTS capabilities
- pgvector - The standard extension for vector/AI workloads in Postgres
- pg_vectorize - Automatically create embeddings for use with pgvector
- Python Standard Library: None - A type, but not an enum
- Rust in Production: Astral with Charlie Marsh - Massively improving Python day 1 experience
- Hugging Face candle - Use ML models in Rust
Transcript
This is Rust in Production, a podcast about companies who use Rust to shape
the future of infrastructure.
It's Matthias Endler from corrode, and today's guest is Adam Hendel,
founding engineer at Tembo.
We talk about production-grade infrastructure on top of Postgres with Rust.
Adam, thanks for joining us today. Can you introduce yourself and tell us about
Tembo? What exactly are you building there?
Yeah, thanks for having me. I'm Adam Hendel. I was one of the first engineers hired into Tembo, and Tembo is a managed Postgres platform where you can click a few buttons and get a managed Postgres database pretty quickly. On the Tembo platform, one of our main products is what we call stacks: Postgres pre-configured for certain types of workloads, with those same few clicks. So if you're doing OLAP or search, or doing something with AI and you need embeddings, in that example we have a VectorDB stack that gives you everything you need to go build your application.
What is it about Rust companies and their love for Postgres?
It seems like every other Rust startup I talk to is building on top of Postgres.
What makes this 40-year-old database so appealing to modern developers?
I mean, I love Postgres because it just works. I learned to write SQL on Postgres. Earlier in my career, I was doing a lot of stuff in data science and machine learning, and I didn't really know what right looked like in terms of good SQL queries or efficient ways to build an application. And Postgres was just there, so I just abused the hell out of it, and it always kept working for me, no matter what I threw at it. Over my career I got better, learned how to do things way more efficiently, and then Postgres began to shine even more. So for me, it's just super flexible. It has always been able to just work for me. I don't have to hassle too much with tuning it or anything. It just works.
Isn't Postgres already production-ready out of the box? What exactly are you building on top that goes beyond just hosting it?
Once you get into very specific types of workloads, and you want to start making some trade-offs to really get all the juice you can out of Postgres, you have to start making some decisions. And that's where you don't have to touch the C code, but you might have to touch some of the configuration. For example, there are certain configs in Postgres, like shared buffers, that decide how much of your system's memory you can allocate to keep the working set from your database in memory. So that's something that we can set dynamically for you. And then there are certain extensions that can really improve and change your experience with Postgres, but those can be quite a hassle to get installed, so we pre-install them into Postgres for you. We go through the process of figuring out, for specific types of workloads, what the best extensions are to make Postgres the best it can be for that workload. So it's kind of configuration, getting the extensions installed, and providing a really good experience around that.
So you're dealing with configuration, extensions, performance tuning. When you started building all of this, was Rust the obvious choice, or did you consider C, C++, or maybe Python as an alternative?
Well, since Postgres is written in C, that was definitely something that came up. But the experience of building software in C is not super modern. For somebody like me, the first programming language I learned outside of HTML and whatnot was R, for statistical programming, and then Python. And in Python you have things like the Python Package Index, where if you want to do something, hey, there's a library for that, and you can just pull in that library and work on solving the specific problem you're trying to solve. That kind of exists in C, but it's not nearly as easy to get up and running building stuff in C. So even though Postgres is written in C, and it's super performant and very stable, it is hard to just come up on it and start adding functionality. But Rust has Cargo and crates, and if you're trying to build software, you can get up and running pretty quickly with Rust, using other libraries from crates.io and whatnot.
Okay. I get the C limitations: no modern tooling, painful package management. But you mentioned coming from Python, which has great abstractions and developer experience. Why not use Python for this? Did you at least prototype in it before jumping to Rust?
Well, at the time that we were getting started building extensions at Tembo, there was a framework that made it really easy to get up and running with Rust. That framework is called pgrx; at the time it was just called pgx, and they renamed it to pgrx at some point. But to go from nothing to a hello-world example, it was an hour or less to create an extension: you write some function in Rust and turn that function into a SQL function that you can call. That was super easy, so we were up and running really quickly. So we didn't really explore Python too seriously.
Okay, that's pretty impressive. So pgrx made it super easy to get started. But I don't really understand how Postgres extensions actually work. You write a Rust function, and it somehow becomes a SQL function. What's happening under the hood? Is there some kind of plugin manager? How does your Rust code actually talk to Postgres?
Yeah, so you can write functions in pure SQL. There are procedural languages like PL/pgSQL where you can still write, you know, more or less SQL, and easily create these functions. So there's an object that lives in Postgres: a function is an object in Postgres. And that function can point to SQL code, or to one of the procedural languages, or it can point to a shared object. That shared object could be compiled from C or it could be compiled from Rust. So in the case of building extensions with pgrx, you have this function and it really just points to a shared object, which is the code that you wrote in Rust. When you compile a pgrx extension, you might create some functions. There are more complicated things you can do than just functions, but in the case of a function, there's some code that says: hey, I'm creating a Postgres function, and that function points to your Rust shared object.
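As a minimal sketch (not code from the episode), a pgrx function looks roughly like this; the #[pg_extern] attribute is what generates the SQL function that points at the compiled shared object:

```rust
use pgrx::prelude::*;

// Required once per extension: emits the marker symbol Postgres checks for
// when it loads the shared object.
pg_module_magic!();

// #[pg_extern] generates the CREATE FUNCTION wiring, so after
// `CREATE EXTENSION my_extension;` you can run `SELECT hello_tembo();`.
#[pg_extern]
fn hello_tembo() -> &'static str {
    "Hello from Rust inside Postgres"
}
```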
But wait, if extensions are just functions pointing to shared objects, how do you handle state?
Functions are supposed to be stateless, but any real application needs to manage state somehow.
What happens when you need to maintain data across calls or coordinate between operations?
Yeah, so it does get complicated to manage state. By default, when you call that function, everything happens within a transaction. So you can start to store state just in memory: as soon as you call that function, imagine you're going to read some data from another table in Postgres; you can pull that into memory, go do whatever it is you want to do, and ultimately return some result set back to whoever is calling that function. So you can store it in memory, or you can store it in some intermediate tables back in Postgres. But in those cases you'll always be working through the Postgres SPI, the Server Programming Interface, so you have some tools available to deal with your state there.
And the other thing that bothered me while you explained this: you're going through this SPI, this Server Programming Interface, dealing with FFI boundaries. Don't you lose all the conveniences and ergonomics of Rust's type system? I mean, how would Postgres even know that a function is really available? All of Rust's advantages are in its type system and safety guarantees. Don't you lose all of that at the FFI interface?
In the memory space where your Rust code runs, you still have all of Rust's features available to you. And as for interacting with FFI: if you're using pgrx, pgrx mostly abstracts that away for you. So you just write Rust code, generally the same as you would if you were writing a web application or some script in Rust.
Okay, you convinced me. pgrx seems like the way to go if you want to start a new extension. But did you ever hit any limitations that might make you move away from pgrx, or is everything pretty much covered?
A lot of stuff is covered. There are some limitations, and I feel a little bad about what I'm about to say, because I wish I could have contributed back to pgrx. But parts of the SPI, the server programming interface bindings in pgrx, are just not fully optimized. If you have some SQL that you want to execute through the SPI using pgrx, the data that goes to and from Postgres through that SPI has to be serialized and deserialized, all of it, even if you're not really going to touch it. So there's a project, a message queue project that we worked on, called PGMQ. It was originally written in Rust using pgrx. And particularly for the batch operations, where we would send or read, say, 100 messages at a time, those would get really slow, because we would have to iterate over every message, deserialize it, and then reserialize it as we sent it back to the user. And really, in those cases, there's not a ton of advantage over just using pure SQL.
Does this mean the code base is now mostly SQL with a nice Rust wrapper on top? You've kind of inverted the pattern: instead of writing a pure Rust Postgres extension, now it's SQL with a nice wrapper around it, sort of. What are the boundaries between SQL and Rust in that case?
So PGMQ, I can give a little bit of background about that project. At Tembo, we needed a queue between our control plane and our data plane. The data plane is where we spin up people's databases, and the control plane is the backend web applications. So we were building this queue on Postgres, and all of our tech was already in Rust. We basically had a bunch of SQL strings written in Rust code, and that code would be duplicated across the control plane, and the same service in the data plane would have the same SQL statements for sending messages and reading messages. And that was all Rust. So we first pulled it out into a crate and installed the crate in all the places it needed to be. So it was all client side. And since it was a crate, it was already kind of packaged up, so turning the crate into a Postgres extension using pgrx was, I don't know, a couple of hours' worth of work. So then there were all these shared objects for every function in Postgres, and it was great: we had this extension, we could ship it around to different Postgres instances and share it with the world. And then we started to run into those limitations, where really it was how we were using pgrx that made some things slow. So with the help of the community, and we even got some help from folks at Supabase, we rewrote it from pgrx Rust into PL/pgSQL. And that was just for the queue operations. We still want to use that extension at Tembo, and all our stuff is in Rust, and tons of people around the world have their applications in Rust and want to use this extension too. So there's a Rust client library for it, and that is there to give you the idiomatic Rust experience for using PGMQ on Postgres.
It's a very nice pattern: you go from plain vanilla SQL, extract it into a separate crate, and make it a Postgres extension with pgrx. I really like these code reusability stories; I hear that a lot, and I really like that about Rust, being able to move things out into a separate crate. So that was the transition, if I understood correctly. But then you ran into serialization issues, because when you try to get batches of messages over the wire, it's probably an O(n) operation to go through each and every message and deserialize it with serde, and that can be slow. And eventually you want to go back a little bit and write the more performance-critical parts at maybe a lower level of abstraction, or maybe not use pgrx for that case, right?
Yeah, that's exactly it. We talked to the pgrx maintainers about what we were doing, and they were like, hey, what you're doing doesn't really need to be in pgrx; you're just executing SQL statements. And yeah, there are some limitations right now with the maturity of the API that pgrx has around just executing SQL statements. So it was kind of, yeah, this should be SQL instead of Rust.
But couldn't you have kept a hybrid approach? pgrx for most things, SQL for the performance-critical parts. What made you abandon pgrx entirely?
We could have, but there are some additional gotchas with extensions in Postgres, mostly around the portability of extensions. Extensions have the objects that we talked about earlier, and they also have control files and migration files and all these other things, and for it to be an extension, those things have to be on the same host as Postgres. But let's say you're running on RDS, or you're running on Supabase or Google Cloud somewhere: if it's an extension, you have to deal with all these additional things, but if it's just SQL, then you can take that SQL with you and manage it however you want.
From that perspective, it makes total sense. The PGMQ client caught my attention because, unlike NATS or Kafka, it's just Postgres, so there's no need for new tech in your stack. But what impressed me the most was how Rust-like the API feels. How did you make a SQL-based queue feel so idiomatic to Rust developers?
Yeah. So, you know, this is my opinion, and I didn't write all the client-side code here; we had some help from the community, who built some of these things. But the thing that I care about in the client is establishing a connection pool to Postgres. You can let the client do that for you, create a connection pool to Postgres. But maybe you're doing a bunch of other stuff in your application too, so we have the ability for you to create the pool yourself, provide the pool to the client, and let PGMQ's client use your pool. And within the client, with Rust, there's some generic typing that we can do for how we execute the Postgres functions for the queue: as long as the objects have certain traits implemented, we can execute that SQL. In a different language, I don't know, it might be way more complex, but since we can just make these assumptions, like, hey, pass in this thing, and as long as it implements these traits, we're good to go.
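As a hedged sketch of the "bring your own pool" idea (this is not the pgmq crate's exact API; here the queue is driven through the extension's SQL functions via sqlx, the queue name is made up, and binding JSON assumes sqlx's json feature; check function names against the PGMQ docs for your version):

```rust
use serde_json::json;
use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    // The application owns the pool; the queue code just borrows it.
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect("postgres://postgres:postgres@localhost:5432/postgres")
        .await?;

    // Create a queue and send a message through the pgmq extension's SQL API.
    sqlx::query("SELECT pgmq.create('tasks')").execute(&pool).await?;
    sqlx::query("SELECT pgmq.send('tasks', $1)")
        .bind(json!({ "kind": "send_welcome_email", "user_id": 42 }))
        .execute(&pool)
        .await?;

    Ok(())
}
```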
You mentioned sqlx. Some people might not be familiar with it. What makes sqlx special for this kind of work?
Yeah, and in fact the PGMQ client uses sqlx. So, sqlx: I describe it as everything that I care about from an ORM, but it's not an ORM. All the great things and none of the bad. sqlx does a lot of things, and I don't use all of its features, but the thing I like the most is that it gives me compile-time checks on the SQL that I write. So if I have some insert statement in my application, and it's just written in raw SQL, I can rest assured that if I have some struct that I'm trying to serialize and insert into a row, and the types on my struct are not in compliance with the table, sqlx will give me a compile error. Yeah, I love sqlx.
It's like an ORM, but better. What I like about ORMs is that they give me this really nice interface between my application and the types in the database. So if I have some struct, the struct has attributes on it and those are typed, and an ORM would say, yeah, there's this connection between this object and the table, or many tables. But I actually really like to write SQL, and a lot of times I end up with SQL statements that don't fit an ORM super well, so I end up with a code base with some SQL and some ORM, and it gets really messy. With sqlx I can just always write SQL, and I love that as a developer because I enjoy writing SQL, but I still get these type checks, because sqlx will look at my SQL statements and then check against the database: does the SQL statement, and what you're trying to insert, or what you're trying to read and deserialize into some struct, actually work? Do the types match up? And that is huge. It can be a little frustrating at first, until you get the hang of it, but that piece right there is what hooked me on sqlx.
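A small sketch of the compile-time checking being described (the users table and its columns are hypothetical, and the query! macro needs DATABASE_URL or sqlx's offline cache available at build time):

```rust
use sqlx::PgPool;

// If the `users` table doesn't have these columns, or the Rust types don't
// match the column types, this fails at compile time rather than at runtime.
async fn insert_user(pool: &PgPool, name: &str, age: i32) -> Result<(), sqlx::Error> {
    sqlx::query!(
        "INSERT INTO users (name, age) VALUES ($1, $2)",
        name,
        age
    )
    .execute(pool)
    .await?;
    Ok(())
}
```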
Now, I would guess the other important part is async support, because all of these operations are I/O-bound and you don't want to block the client for concurrent queries. What did you end up using for async, and was it straightforward to integrate into the existing code?
You know, rewind way back to when we first started building in Rust, and it was like, okay, for async we kind of have to handle this ourselves, and there's a ton of complexity around that which, frankly, we didn't want to spend our time trying to implement. So then there are a few different runtimes out there, and Tokio was clearly the one that was the most popular, the most supported, with the biggest community around it, so we just used that. And to this day, when I'm building in Rust, I don't really think about which async runtime to go with; I just use Tokio.
Right. But what about more complex things, like supporting transactions? And by transaction I mean multiple statements that are executed atomically on Postgres. If you wanted to do that in an async way, I would be afraid that it would get really complex.
Yeah, I don't exactly know how you would do it if you had two separate threads and you wanted to span some transaction across them. I don't know; I haven't actually tried that, so I don't know how it would work. But the way we implemented it: I think earlier I mentioned that you can bring your own connection pool, or really your own executor, as long as it has certain traits implemented on it, which we basically inherit from sqlx. So you can start a transaction, do whatever SQL you want, take the connection or transaction object that you've created, and pass that into PGMQ; then you can go do the PGMQ things through the PGMQ API and finally commit your transaction however you want yourself. So we lean on Tokio and we lean on sqlx for that. Oh yeah, the question about doing that across threads: that could get tricky, passing that connection across threads. I don't know how you would do that. It might be possible.
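Here is a hedged sketch of that "you own the transaction" pattern, again going through the extension's SQL functions with sqlx rather than the client crate (queue and table names are made up):

```rust
use sqlx::PgPool;

// Enqueue a job and write an application row in the same transaction:
// either both are committed or neither is.
async fn enqueue_with_audit(pool: &PgPool) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    sqlx::query("SELECT pgmq.send('tasks', '{\"kind\":\"resize_image\"}'::jsonb)")
        .execute(&mut *tx)
        .await?;

    // `audit_log` is a hypothetical application table.
    sqlx::query("INSERT INTO audit_log (event) VALUES ('enqueued resize_image')")
        .execute(&mut *tx)
        .await?;

    tx.commit().await?;
    Ok(())
}
```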
You could probably share the connection across threads with something like Arc<Mutex>, and sqlx allows that kind of pattern. But at the same time, why would you? Postgres already handles that with transactions, and that guarantees atomicity. So maybe you just leave it as an implementation detail to the database.
Yeah, from the Postgres side, that's just the feature: transactions. We don't have to reinvent the wheel on that part.
Some people might hear this and think: why do I need PGMQ? I can just write my own abstraction on top of Postgres. How hard can it be? It can only be a couple of lines of code, right? We touched on transaction support and edge cases, but maybe you can allude to a few more things that you probably don't want to do yourself, things that people tend to forget when they build their own abstractions.
Yeah, I mean, with all this stuff, it's just code, so you can always go write the code yourself. There are these operations that you need if you're going to build a queue. You need a way to create a queue, so how do you define what a queue is? You can think through that and make it what you want; we have a create-queue function that does it in a way we think is right, and there's an API around it. Sending a message to a queue is basically just an insert statement, but that insert is handled for you. And if you're working in Rust, you have some message, and that message becomes an attribute, a column on a table, so you need to serialize that message and get it inserted into the table. If you're using the Rust client, there are some helpers there to do the serialization for you. So you save a lot of time, because this stuff is already built, it's already tested, and you don't have to rewrite a bunch of things. But I'm sure there are cases out there where it makes sense for somebody to write their own.
Even if you asked me to design a message queue table schema,
I'd probably miss something important.
I'd add a timestamp, maybe a byte array for the message, though I'm not sure
if you support bytes or just UTF-8 strings.
Then some kind of locking or ownership fields to prevent reading the same message twice.
What actually goes into that table?
Yeah, I mean, you're pretty close there. There's a concept that we borrowed from SQS, the Simple Queue Service from AWS: the visibility timeout, which is kind of that locking feature. When you read a message from a queue, you specify how long you want that message to be unavailable, to yourself if you try to read again, or to any other consumers of that queue. So let's say you have 10 threads all reading from the same queue. The first time the message is read, you could say, hey, make this message invisible for five minutes. And during those five minutes, none of the other threads will be able to get that exact same message, until the visibility time expires and the message becomes visible again. That piece I think is super cool, because without it you'd have to have some additional worker process that watches the queue and looks for messages that need to be flipped back to available. But this way it's completely stateless. There's nothing that can really break there; there's no process that crashes and all of a sudden messages get stuck. They just automatically become visible again.
That's clever. It's sort of a lock-free algorithm, because you just update the timestamp when you read it, and everyone knows the message is in flight.
Yeah, the most complicated part of PGMQ is the read statement. It is a super complicated query; it's a SELECT ... FOR UPDATE statement. It's reading, but it's also updating how many times the message has been read: there's a counter on that table that keeps track of how many times each message has been read, and there's the visibility time, so any time after that, the message can be read again. So when you read a message, it's actually an update statement to the table, and it's a SELECT FOR UPDATE, which creates a lock on the selected rows. That makes it so that if two threads read at the exact same time, Postgres figures out who gets the record first. So the FOR UPDATE guarantees that only one worker can get the message, but it does create a lock. And then there's another clause in there, SKIP LOCKED, which means any records that are locked get skipped, and you move on to the next one. That makes it so that other workers that are reading aren't sitting there waiting for that lock to be released; they can skip it and go to the next message. That lock only lasts for the duration of the transaction, which is pretty quick, and the long-term locking is handled by the visibility timeout.
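This is not PGMQ's exact SQL, just a sketch of the general shape of the read being described, wrapped in a sqlx call (table and column names are illustrative, and decoding JSONB assumes sqlx's json feature):

```rust
use sqlx::{PgPool, Row};

// Claim one visible message: lock it, bump its read count, push its
// visibility timeout 30 seconds into the future, and return it.
async fn read_one(pool: &PgPool) -> Result<Option<(i64, serde_json::Value)>, sqlx::Error> {
    let row = sqlx::query(
        r#"
        WITH next AS (
            SELECT msg_id
            FROM my_queue
            WHERE vt <= now()          -- only messages that are visible again
            ORDER BY msg_id
            LIMIT 1
            FOR UPDATE SKIP LOCKED     -- don't wait on rows other workers hold
        )
        UPDATE my_queue q
        SET vt = now() + interval '30 seconds',
            read_ct = read_ct + 1
        FROM next
        WHERE q.msg_id = next.msg_id
        RETURNING q.msg_id, q.message
        "#,
    )
    .fetch_optional(pool)
    .await?;

    Ok(row.map(|r| (r.get("msg_id"), r.get("message"))))
}
```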
You probably came up with that on the first try, right?
No, I mean, there are a ton of resources out there. If you just search for "message queue Postgres", a lot of people have written about FOR UPDATE and SKIP LOCKED. It's kind of the standard.
But still, you can make a lot of mistakes when implementing that.
And also, you might not know about this research in the first place.
You might just naively implement it and do it the wrong way.
Yeah, definitely. If you don't have SKIP LOCKED on there, you could have 10 threads reading, one of them gets a message, and the other nine are just sitting there, waiting until that initial lock is released. So yeah, you could mess it up.
I really like these implementation details; that's what I live for. Because that could really hamper your performance, and in the worst case you don't realize it until you're under load, and when you really need it the most, it fails on you.
Yeah, that's a good point. I'm glad you mentioned that, because that's kind of a reason to use something that's been pre-packaged. Today there are quite a few people using PGMQ, and when people run into issues, an issue gets created on the project, and then we, or somebody in the community, resolves it. There are ways that building your own queue can go wrong. So PGMQ as a project has kind of become this place that people go to learn from, or to just use the code: when they want to do a queue on Postgres, they can look at that project and either use the code or modify it and do it their own way.
That's why I love open source in the first place: so much is out there, and you can look at the actual implementation and then decide whether you really want to go down that rabbit hole and implement it yourself. Because just sending in a pull request will fix the problem for every instance out there, that has the latest version of course, which is so, so great.
Yeah.
How hard would it be to abstract that to other databases? For example, if there was a customer that needed MariaDB or MySQL support, how hard would that be?
I guess, you know, there are some SQL files in the PGMQ project, and it would be: hey, for every one of these statements, what's the equivalent in the other database? If you translated that, then theoretically it should work. The cool thing is that you could use Rust's feature flags, so the project builds against the database that it depends on.
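A small sketch of that feature-flag idea; the feature names and SQL are hypothetical, not from the PGMQ project:

```rust
// Select backend-specific SQL at compile time with Cargo features.
#[cfg(feature = "postgres")]
const READ_ONE_SQL: &str =
    "SELECT msg_id FROM my_queue WHERE vt <= now() LIMIT 1 FOR UPDATE SKIP LOCKED";

// MySQL 8+ also understands SKIP LOCKED, but the surrounding SQL would differ.
#[cfg(feature = "mysql")]
const READ_ONE_SQL: &str =
    "SELECT msg_id FROM my_queue WHERE vt <= NOW() LIMIT 1 FOR UPDATE SKIP LOCKED";
```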
I was wondering about Postgres LISTEN and NOTIFY support. Where do you draw the line between using native pub/sub versus needing a full message queue?
We've had some discussions about using those Postgres features in PGMQ. But I don't think you can just replace PGMQ with LISTEN and NOTIFY, mostly because if you send a message across one of those notification channels and that message doesn't make it, then it's kind of gone, and you have no record of it. In PGMQ, every message is a row in a table, and you can archive that row or completely delete it if you want, so you still have a complete audit log of every message if you want it. If we just moved it into one of those other channels, it's kind of gone, unless we build something to handle it. But I do think it would be really useful as a way to notify consumers that there are messages available in a queue. Right now, if your application is reading messages from a queue, you have to poll the queue: poll it once every second, or have some backoff logic, poll every second and then back off to 10 seconds or something. It would be nice to be able to subscribe to the queue and have one of those notifications come to you and say, hey, there are messages now, go poll. That could really help efficiency if you have really low latency requirements and you can't wait: a one-second poll interval is too long, you need to know right away, and you don't want to just poll every 10 milliseconds or something. If we implemented something on top of those features, it could help a lot, I think.
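As a hedged sketch of what that notification-based wake-up could look like on the client side (this is not a PGMQ feature today; the channel name is made up, and sqlx's PgListener is used for LISTEN):

```rust
use sqlx::postgres::PgListener;

// Wait for a NOTIFY instead of polling on a fixed interval. Something on the
// sending side would have to run `NOTIFY tasks_ready;` after inserting.
async fn wait_for_work(database_url: &str) -> Result<(), sqlx::Error> {
    let mut listener = PgListener::connect(database_url).await?;
    listener.listen("tasks_ready").await?;

    loop {
        // Blocks until a notification arrives on the channel.
        let notification = listener.recv().await?;
        println!("woken up: {}", notification.payload());
        // ...now read a batch of messages from the queue...
    }
}
```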
Also, on top of this, you could use the Tokio stream abstraction, and you could just iterate over the messages, and the futures would resolve as quickly as messages come in. It would all kind of beautifully happen under the hood: you get a future that's ready, it contains the value, and that is your message. I guess it would be kind of nice to have that. How far away are you from that reality?
Well, I think we would first need to implement a way to set up those channels on any given queue, and I don't think that would be super difficult. There's probably a way that this could all be implemented purely on the Rust client side, too. If I think of how librdkafka reads messages from a Kafka topic: it polls, but it is pulling messages in batches to the client, and then the consumer of librdkafka iterates over those messages. We could probably do something similar in the Rust client, where you have some poll interval, but we handle that poll asynchronously and pull messages back in batches, and then your Rust application could iterate over those messages asynchronously if it wanted to. I always really liked how librdkafka did it, because as a user, all that stuff is abstracted from you, but it's super efficient because it is polling for you, and unless you really get into the weeds, you don't really know that it's polling and pulling messages in batches. We could probably do the exact same thing in the Rust client.
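A sketch of that client-side batching idea, again through the extension's SQL read function rather than the client crate (the queue name, batch size, and the pgmq.read argument order are assumptions to check against the PGMQ docs):

```rust
use std::time::Duration;
use sqlx::{PgPool, Row};

// Poll in batches: fetch up to 10 messages with a 30-second visibility
// timeout, hand each one to the application, and back off when the queue
// is empty.
async fn consume(pool: &PgPool) -> Result<(), sqlx::Error> {
    loop {
        let rows = sqlx::query("SELECT msg_id, message FROM pgmq.read('tasks', 30, 10)")
            .fetch_all(pool)
            .await?;

        if rows.is_empty() {
            tokio::time::sleep(Duration::from_secs(1)).await;
            continue;
        }

        for row in rows {
            let msg_id: i64 = row.get("msg_id");
            let message: serde_json::Value = row.get("message");
            println!("processing {msg_id}: {message}");

            // Delete (or archive) once the work is done so it isn't redelivered.
            sqlx::query("SELECT pgmq.delete('tasks', $1)")
                .bind(msg_id)
                .execute(pool)
                .await?;
        }
    }
}
```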
What's your biggest deployment at the moment?
So I think I mentioned earlier this architecture with a control plane and a data plane. We have several data planes. There's a data plane in AWS, Azure, and Google Cloud, and then some self-hosted ones as well. But all those public clouds read messages from a single Postgres instance, and there are multiple queues within that. And I think at peak it's something like 10,000 messages per minute, or maybe per five minutes, something like that. But really, the scalability of it all comes down to what Postgres can handle: how many inserts and updates per second can Postgres do? Because that's really all PGMQ is doing: inserts and updates. Well, inserts, updates, and deletes.
Okay. So if you use that for a larger deployment, it certainly won't be the bottleneck in your application, unless you are at web scale, and then you always have bigger problems.
You have bigger fish to fry.
Okay. What's your setup for the production cluster right now?
It's a dedicated Postgres cluster for the queue, yeah, and those specs on it are exactly right. It never really gets over 20% CPU, and memory utilization just kind of hovers around 25%, probably because that's what shared buffers are set to on that thing. But I mean, it is dedicated to the queue, so I think it is over-provisioned.
Well, that's at least better than being under-provisioned. What are some of the other use cases of Postgres that you see in production a lot?
Yeah, full-text search is built into Postgres, so if you're trying to do that, it's pretty easy to get up and running. We also have a number of people using our VectorDB stack, which is primarily pgvector. That's another open source project that gives you the vector data type and the operations to do similarity search on top of it. It is definitely the gold standard for working with embeddings within Postgres. And then we built kind of a wrapper extension around it using pgrx and Rust. It lets you, say you have a table with some text in it and you want to generate embeddings for every row in that table, be like: hey, I have this table, I want to use OpenAI, or I want to use Anthropic, or I have a self-hosted model, and I just want to get embeddings for this table. That wrapper extension is called pg_vectorize. You just call a function on a table, tell it which model you want to use, and then in the background it uses PGMQ to look at the table and figure out which columns need embeddings, pull that data, call the transformer model, get the embeddings, and insert those embeddings into another table or the same table; it's configurable. But yeah, it helps you with the orchestration there.
I like that you use your extensions in combination with other extensions and call into them. It's kind of cool. How do you ensure safety at the boundary between Rust and Postgres?
That Vectorize extension runs in a background process that Postgres manages. And we purposely don't use the SPI, that interface from earlier, because it's kind of unnecessary. So we just use sqlx. Postgres spins up this background process that reads from that queue of jobs it needs to create embeddings for, and we just use sqlx for it. So internally it's treated just like a normal application; there's no interacting with FFI or anything going on there. We treat it as: after Postgres starts it, it's a normal Rust application. Since we built it that way, we have the option on our cloud platform to run that background worker in a separate container. So instead of running it as a Postgres background worker, we just take that same Rust binary and run it in a separate container next to Postgres. That's an even better way to get around that limitation of memory management: if it's not even on the same host as Postgres, then you can scale that thing independently.
With Tembo you shipped your first real Rust production application. What took you the longest to wrap your head around in Rust?
Yeah, so today I love working in Rust. It is a joy to build software using Rust. But when I was getting started, there was this huge hump, this hill to get over, and it was super frustrating early on. One of those things was just wrapping my head around null: there is no null, you have Some and None, and I looked at that and thought, this is so weird. I came from Python, so you had a None type, but it was not an enum. Until I learned that it's an enum that wraps data, which is how Rust handles null values, so it's just a wrapper around your data: is it null or is it not? That was, I don't know why, but it was really hard for me to grasp.
And in that same direction, error handling, where it's Ok or Err. I don't know why it took me so long to wrap my head around. If you have a function and it could error, you wrap your data in this thing, a Result, and it's an enum. And it's the same for None, or how Rust handles none. Today I love those things; they just make everything so clean: is there an error or not, did it return something or was there nothing? It's super clean when I look at it today, but when I was learning Rust I was like, what is this? I don't even understand. So I don't know what would have made it easier for me to learn that early on, but I remember it being super frustrating, and then all of a sudden I got it: oh, this is awesome, I love it.
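For readers coming from Python's None, a tiny illustration of the two enums being discussed (standard-library Rust, nothing project-specific):

```rust
// Option<T> replaces null: a value is either Some(data) or None.
fn find_even(numbers: &[i32]) -> Option<i32> {
    numbers.iter().copied().find(|n| n % 2 == 0)
}

// Result<T, E> replaces "this might fail": either Ok(data) or Err(error).
fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.parse::<u16>()
}

fn main() {
    // Both enums force the caller to handle the "nothing" and "error" cases,
    // which is the cleanliness Adam describes.
    match find_even(&[1, 3, 4]) {
        Some(n) => println!("found {n}"),
        None => println!("no even number"),
    }

    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => println!("bad port: {e}"),
    }
}
```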
How long did it take you until you had a good grasp of the language?
I'd say, when I started, I was basically learning full-time, and I think it was two months. Maybe that's slow for some people, but it was all day, five days a week, at least. I was able to get up to building software while still not quite understanding error handling and the handling of None types. But yeah, I think it was two months for me. Two months of pain.
I really like that framing.
I mean, the one thing I loved from day one, though, was having Cargo. I came from Python, and it was like, how do I get my environment set up to run my application? At the time it was: do I use virtualenv? Do I use Poetry? Do I just install everything with pip? It's a mess. There are some projects making it better in Python today; uv is doing a really great job cleaning that up, solving that problem. But when I was learning Rust, it was: hey, how do I run this application? Oh, cargo run, that's it. How do I test it? cargo test. How do I add a library from crates? cargo add. It was just so intuitive, all built into the toolchain, and I didn't have to go search and read a bunch of blogs to figure out how to just get started. So that, from day one, was probably what kept me there through those two months of pain.
Oh yeah, day one in Python is strange, because on one side the language is great, but on the other side the tooling is not so great, and you really get those mixed feelings. On that note, shout out to Charlie Marsh from Astral, who was a guest in episode three of season four; you might want to check that one out. They do great work, but the experience before that was subpar; the tooling was a bit all over the place and very difficult to use.
Yeah, I have nightmares about that. When I was at a previous company, they all of a sudden started issuing employees MacBooks with Apple silicon, so they all had the ARM architecture. And not every project out there had Python wheels for ARM. So all of a sudden, new employees coming in, their local environments would just not work with our internal libraries, because we weren't building wheels for ARM. We were using Kafka, and librdkafka, or rather Python's Kafka library, didn't have a wheel for ARM for the longest time. It was like, okay, we have to compile this stuff from source all of a sudden. I had nightmares from that; everything fell apart.
Well, at least they did a great job on the migration from Python 2 to 3. That was extremely painless, of course. Just kidding, it was a nightmare.
Yeah, the most obvious one there is just trying to print something; the print API changed.
Yeah, thanks for bringing back that memory. By the way, do you have to touch a lot of Python nowadays?
Not too much. A lot of stuff in machine learning is still in Python; there are lots of libraries out there for machine learning things in Python, and in the machine learning space, Python is mostly used as a wrapper around some C library that's super well optimized. Rust is, I'd say, catching up; Hugging Face has built some Rust libraries that do much the same thing as the Python equivalents. But most of the time, if I'm building a web server today, I'll just grab Actix and run with it. If I'm working with data and Postgres, I'm going to grab sqlx. Ten years ago it would have been Flask or FastAPI, or maybe not ten years ago for FastAPI, but I have the equivalent of everything that five years ago I would have gone to Python for. For me, I'd just pick Rust. It's just easier; it's a better experience.
And on that very positive note, what's your final message to the Rust community?
Yeah. I mean, keep contributing to the project. The project doesn't live unless people are working on it and making it better. Today, there's really no such thing as finished software, or a finished programming language. It's not like software is distributed in the mail, where you burn stuff to a disc, mail it out, and it just runs that way forever. Software is a living, breathing organism now, so it has to be constantly fed. For Rust, if people stop contributing, the project won't go on. So I guess my biggest message would be: thank you for building awesome stuff, and please keep doing it.
Adam, thanks a lot for the interview.
Thank you.
Rust in Production is a podcast by corrode. It is hosted by me,
Matthias Endler, and produced by Simon Brüggen.
For show notes, transcripts, and to learn more about how we can help your company
make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.