WEBVTT

00:00:01.630 --> 00:00:06.230
<v Matthias>This is Rust in Production, a podcast about companies who use Rust to shape

00:00:06.230 --> 00:00:07.310
<v Matthias>the future of infrastructure.

00:00:07.610 --> 00:00:10.890
<v Matthias>It's Matthias Endler from corrode, and today's guest is Adam Hendel,

00:00:11.190 --> 00:00:12.610
<v Matthias>founding engineer at Tembo.

00:00:12.730 --> 00:00:16.950
<v Matthias>We talk about production-grade infrastructure on top of Postgres with Rust.

00:00:19.150 --> 00:00:24.150
<v Matthias>Adam, thanks for joining us today. Can you introduce yourself and tell us about

00:00:24.150 --> 00:00:26.470
<v Matthias>Tembo? What exactly are you building there?

00:00:27.530 --> 00:00:30.510
<v Adam>Yeah, thanks for having me. I'm Adam Hendel.

00:00:30.510 --> 00:00:33.550
<v Adam>I'm one of the first engineers hired

00:00:33.550 --> 00:00:36.330
<v Adam>at Tembo, and Tembo is a

00:00:36.330 --> 00:00:39.010
<v Adam>managed Postgres platform, you know,

00:00:39.010 --> 00:00:42.570
<v Adam>where you can click a few buttons and get a managed Postgres database pretty

00:00:42.570 --> 00:00:49.230
<v Adam>quickly. On the Tembo platform, one of our main products is what

00:00:49.230 --> 00:00:53.310
<v Adam>we call Stacks, and that's where you can get Postgres preconfigured for certain

00:00:53.310 --> 00:00:57.950
<v Adam>types of workloads for you with those same few clicks.

00:00:58.150 --> 00:01:03.970
<v Adam>So if you're doing OLAP or search or doing something with AI and you need embeddings,

00:01:04.230 --> 00:01:08.750
<v Adam>for that, we have a vector DB stack that gives you everything you need

00:01:08.750 --> 00:01:10.410
<v Adam>to go build your application.

00:01:11.350 --> 00:01:14.550
<v Matthias>What is it about Rust companies and their love for Postgres?

00:01:14.770 --> 00:01:19.490
<v Matthias>It seems like every other Rust startup I talk to is building on top of Postgres.

00:01:19.670 --> 00:01:23.550
<v Matthias>What makes this 40-year-old database so appealing to modern developers?

00:01:24.430 --> 00:01:31.730
<v Adam>I mean, I love Postgres because it just works. I learned to write SQL on Postgres.

00:01:32.290 --> 00:01:36.710
<v Adam>And earlier in my career, I was doing a lot of stuff in data science and machine

00:01:36.710 --> 00:01:42.750
<v Adam>learning, and I didn't really know what right looks like in terms of good SQL

00:01:42.750 --> 00:01:46.710
<v Adam>queries or efficient ways to build an application.

00:01:47.470 --> 00:01:50.370
<v Adam>And Postgres was just there, so I just abused the

00:01:50.370 --> 00:01:54.030
<v Adam>hell out of it, and it always just kept working

00:01:54.030 --> 00:01:57.490
<v Adam>for me, no matter what I threw at it. And over

00:01:57.490 --> 00:02:00.910
<v Adam>my career, you know, I got better and then

00:02:00.910 --> 00:02:07.330
<v Adam>learned how to do things way more efficiently, and then Postgres began

00:02:07.330 --> 00:02:12.670
<v Adam>to shine even more. So for me, it's just super flexible. It's always been able

00:02:12.670 --> 00:02:17.950
<v Adam>to just work for me. I don't have to hassle too much with tuning it or anything.

00:02:17.950 --> 00:02:20.050
<v Adam>It just works.

00:02:20.050 --> 00:02:24.270
<v Matthias>Isn't Postgres already production-ready out of the box? What exactly are you building

00:02:24.270 --> 00:02:27.070
<v Matthias>on top that goes beyond just hosting it?

00:02:27.070 --> 00:02:31.490
<v Adam>As you get to very specific types of workloads

00:02:32.500 --> 00:02:36.500
<v Adam>and you want to make some trade-offs to really get all the juice you

00:02:36.500 --> 00:02:39.920
<v Adam>can out of Postgres, you have to start making some decisions.

00:02:40.180 --> 00:02:44.200
<v Adam>And that's where you don't have to touch the C code, but you might have to touch

00:02:44.200 --> 00:02:45.200
<v Adam>some of the configuration.

00:02:45.480 --> 00:02:51.920
<v Adam>For example, there are certain configs in Postgres, like shared_buffers, that

00:02:51.920 --> 00:02:58.340
<v Adam>decide how much of your system's memory you can allocate to keeping the

00:02:58.340 --> 00:03:00.860
<v Adam>working set of your database in memory.

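NOTE A minimal sketch of deriving shared_buffers from instance memory, assuming the rule of thumb in the PostgreSQL documentation (roughly 25% of RAM on a dedicated server with 1 GB or more) and the usual 128MB default. Tembo's actual formula is not described in this conversation, and the function names are hypothetical.
```rust
/// Hypothetical helper: suggested shared_buffers in MB for `total_mem_mb` of RAM.
fn suggest_shared_buffers_mb(total_mem_mb: u64) -> u64 {
    const DEFAULT_MB: u64 = 128; // the usual PostgreSQL default
    if total_mem_mb < 1024 {
        // Small instances: keep the default, but never exceed a quarter of RAM.
        return DEFAULT_MB.min(total_mem_mb / 4);
    }
    // Dedicated server with 1 GB or more: start at roughly 25% of RAM.
    total_mem_mb / 4
}
/// Render the suggested value as a postgresql.conf line.
fn shared_buffers_line(total_mem_mb: u64) -> String {
    format!("shared_buffers = {}MB", suggest_shared_buffers_mb(total_mem_mb))
}
```
For an 8 GB instance this suggests 2048MB. A platform would still have to write the value into the server configuration and restart Postgres, because shared_buffers only takes effect at server start.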
00:03:01.760 --> 00:03:05.540
<v Adam>So that's something that we can set dynamically

00:03:05.540 --> 00:03:08.940
<v Adam>for you. And then certain extensions

00:03:08.940 --> 00:03:12.000
<v Adam>that can really improve and change your experience

00:03:12.000 --> 00:03:14.780
<v Adam>with Postgres, those can be quite a hassle to

00:03:14.780 --> 00:03:17.820
<v Adam>get installed, so we pre-install those

00:03:17.820 --> 00:03:20.640
<v Adam>into Postgres for you and kind of

00:03:20.640 --> 00:03:25.920
<v Adam>go through the process of figuring out, for specific types of

00:03:25.920 --> 00:03:29.900
<v Adam>workloads, what the best extensions are to make Postgres the best that it can

00:03:29.900 --> 00:03:35.840
<v Adam>be for that workload. So it's configuration, getting the extensions

00:03:35.840 --> 00:03:38.760
<v Adam>installed and providing a really good experience around that.

00:03:38.760 --> 00:03:43.940
<v Matthias>So you're dealing with configuration, extensions, performance tuning. When you

00:03:43.940 --> 00:03:49.340
<v Matthias>started building all of this, was Rust the obvious choice, or did you consider

00:03:49.340 --> 00:03:53.240
<v Matthias>C, C++, or maybe Python as an alternative?

00:03:53.240 --> 00:03:56.020
<v Adam>Well, since Postgres is

00:03:56.020 --> 00:03:59.000
<v Adam>written in C, that was definitely something that

00:03:59.000 --> 00:04:02.280
<v Adam>came up. The experience of

00:04:02.280 --> 00:04:06.020
<v Adam>building software in C is not,

00:04:06.020 --> 00:04:09.160
<v Adam>you know, it's not super modern.

00:04:09.160 --> 00:04:12.800
<v Adam>So for somebody like me, I think

00:04:12.800 --> 00:04:15.540
<v Adam>the first programming language I learned outside of

00:04:15.540 --> 00:04:19.060
<v Adam>HTML and whatnot was R,

00:04:19.060 --> 00:04:25.500
<v Adam>for statistical programming, and then Python. And in Python you have things

00:04:25.500 --> 00:04:30.440
<v Adam>like the Python Package Index, where if you want to do something, well, hey, there's

00:04:30.440 --> 00:04:35.320
<v Adam>a library for that, and you can just pull in that library and work on solving

00:04:35.320 --> 00:04:37.420
<v Adam>the specific problem that you're trying to solve.

00:04:37.620 --> 00:04:42.940
<v Adam>That kind of exists in C, but it's not nearly as easy to just get up

00:04:42.940 --> 00:04:45.500
<v Adam>and running building stuff in C. So even though

00:04:46.190 --> 00:04:49.110
<v Adam>Postgres is written in C and it's super performant and very stable,

00:04:49.410 --> 00:04:53.910
<v Adam>it is hard to just get into it and start adding functionality.

00:04:54.750 --> 00:04:59.090
<v Adam>But Rust does have Cargo and crates.

00:05:00.210 --> 00:05:04.950
<v Adam>And if you are trying to build software, you can get up and running pretty quickly

00:05:04.950 --> 00:05:11.450
<v Adam>with Rust, using other libraries that are on crates.io and whatnot.

00:05:11.450 --> 00:05:17.910
<v Matthias>Okay, I get the C limitations, like no modern tooling and painful package management, but

00:05:17.910 --> 00:05:22.510
<v Matthias>you mentioned coming from Python, which has great abstractions and developer

00:05:22.510 --> 00:05:29.610
<v Matthias>experience. Why not use Python for this? Did you at least prototype in it before jumping to Rust?

00:05:29.610 --> 00:05:33.170
<v Adam>Well, at the time that we at Tembo

00:05:33.170 --> 00:05:36.010
<v Adam>were getting started building extensions, there was

00:05:36.010 --> 00:05:38.730
<v Adam>a framework that made it really easy to get

00:05:38.730 --> 00:05:42.510
<v Adam>up and running with Rust. That framework's called PGRX.

00:05:42.510 --> 00:05:45.730
<v Adam>Well, at the time it was just called PGX; they renamed it

00:05:45.730 --> 00:05:49.430
<v Adam>to PGRX at some point. But

00:05:49.430 --> 00:05:56.970
<v Adam>to go from nothing to a hello world example, it was like an hour or less

00:05:56.970 --> 00:06:02.530
<v Adam>to create an extension where you write some function in Rust and turn that

00:06:02.530 --> 00:06:07.990
<v Adam>function into a SQL function that you can call. That was super easy.

00:06:08.890 --> 00:06:12.930
<v Adam>We were up and running really quickly,

00:06:12.930 --> 00:06:15.010
<v Adam>so we didn't really explore Python too seriously.

00:06:16.270 --> 00:06:22.010
<v Matthias>Okay, that's pretty impressive. So PGRX made it super easy to get started.

00:06:22.070 --> 00:06:26.870
<v Matthias>But I don't really understand how Postgres extensions actually work.

00:06:27.370 --> 00:06:31.870
<v Matthias>You write a Rust function, and it somehow becomes a SQL function.

00:06:32.410 --> 00:06:35.910
<v Matthias>What's happening under the hood? Is there some kind of plugin manager?

00:06:35.910 --> 00:06:39.270
<v Matthias>How does your Rust code actually talk to Postgres?

00:06:39.950 --> 00:06:43.370
<v Adam>Yeah, so you can write functions in pure SQL.

00:06:43.790 --> 00:06:50.090
<v Adam>There are procedural languages like PL/pgSQL where you can still basically write

00:06:50.770 --> 00:06:55.010
<v Adam>SQL and easily create these functions.

00:06:55.310 --> 00:07:02.350
<v Adam>So a function is an object that lives in Postgres.

00:07:02.650 --> 00:07:07.770
<v Adam>And that function can point to SQL code, or to code in one of the procedural languages,

00:07:07.770 --> 00:07:10.550
<v Adam>or it could point to a shared object.

00:07:11.130 --> 00:07:17.170
<v Adam>So now that shared object could be compiled in C or it could be compiled in Rust.

00:07:17.410 --> 00:07:22.070
<v Adam>So in the case of building extensions with PGRX, you have this function and

00:07:22.070 --> 00:07:27.470
<v Adam>it really just points to a shared object that's the code that you wrote in Rust.

00:07:27.950 --> 00:07:32.110
<v Adam>So when you compile a PGRX extension, you might create some functions.

00:07:32.310 --> 00:07:35.490
<v Adam>There are more complicated things you can do than just functions.

00:07:35.890 --> 00:07:40.430
<v Adam>But in the case of a function, there's some generated SQL that says,

00:07:40.450 --> 00:07:45.470
<v Adam>"I'm creating a Postgres function, and that function points to your Rust shared object."

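NOTE A toy model of the mechanism described above, not pgrx's generated code. For C-language functions, PostgreSQL records the object file and link symbol given in CREATE FUNCTION ... AS 'obj_file', 'link_symbol' LANGUAGE C, and resolves that symbol in the loaded shared object when the function is first called in a session. Here two HashMaps stand in for the catalog and the dynamic loader; ToyCatalog and the library name are hypothetical, and real functions use PostgreSQL's fmgr calling convention rather than a plain i64 argument.
```rust
use std::collections::HashMap;
// Native functions with the C ABI: the kind of symbols a shared object exports.
extern "C" fn add_one(x: i64) -> i64 {
    x + 1
}
extern "C" fn double_it(x: i64) -> i64 {
    x * 2
}
type NativeFn = extern "C" fn(i64) -> i64;
// Toy catalog row: a SQL-visible name points at (object file, link symbol).
struct FnEntry {
    library: &'static str,
    symbol: &'static str,
}
struct ToyCatalog {
    functions: HashMap<&'static str, FnEntry>,
    // Stand-in for dlopen + dlsym: (object file, symbol) resolves to a function pointer.
    symbols: HashMap<(&'static str, &'static str), NativeFn>,
}
impl ToyCatalog {
    // Mirrors two hypothetical CREATE FUNCTION ... LANGUAGE C statements.
    fn demo() -> Self {
        let lib = "my_ext.so"; // hypothetical shared object
        let mut functions = HashMap::new();
        functions.insert("add_one", FnEntry { library: lib, symbol: "add_one" });
        functions.insert("double_it", FnEntry { library: lib, symbol: "double_it" });
        let mut symbols: HashMap<(&'static str, &'static str), NativeFn> = HashMap::new();
        symbols.insert((lib, "add_one"), add_one);
        symbols.insert((lib, "double_it"), double_it);
        ToyCatalog { functions, symbols }
    }
    // Look up the SQL name, resolve its symbol, and call the native code.
    fn call(&self, sql_name: &str, arg: i64) -> Option<i64> {
        let entry = self.functions.get(sql_name)?;
        let f = self.symbols.get(&(entry.library, entry.symbol)).copied()?;
        Some(f(arg))
    }
}
```
The point of the indirection is that the SQL side only stores names; the Rust code can be rebuilt and reinstalled as long as the exported symbol stays the same.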
00:07:46.050 --> 00:07:53.410
<v Matthias>But wait, if extensions are just functions pointing to shared objects, how do you handle state?

00:07:54.490 --> 00:08:00.470
<v Matthias>Functions are supposed to be stateless, but any real application needs to manage state somehow.

00:08:00.870 --> 00:08:06.890
<v Matthias>What happens when you need to maintain data across calls or coordinate between operations?

00:08:07.430 --> 00:08:10.390
<v Adam>Yeah, so it does get complicated to manage state.

00:08:10.610 --> 00:08:13.590
<v Adam>I think by default, when you call that function,

00:08:13.590 --> 00:08:17.430
<v Adam>everything happens within a transaction, so you

00:08:17.430 --> 00:08:21.670
<v Adam>can start to store state just

00:08:21.670 --> 00:08:24.410
<v Adam>in memory. As soon as you call that function,

00:08:24.410 --> 00:08:27.370
<v Adam>imagine you're going to read some data from

00:08:27.370 --> 00:08:31.270
<v Adam>another table in Postgres, and you

00:08:31.270 --> 00:08:34.030
<v Adam>can pull that into memory and then go do whatever it is that

00:08:34.030 --> 00:08:37.250
<v Adam>you want to do, and ultimately

00:08:37.250 --> 00:08:40.270
<v Adam>return some result set back to whoever is calling

00:08:40.270 --> 00:08:44.490
<v Adam>that function. So you could store it in memory, or you could store it in some

00:08:44.490 --> 00:08:49.150
<v Adam>intermediate tables back in Postgres, but in those cases you'll always

00:08:49.150 --> 00:08:54.070
<v Adam>be working through the Postgres SPI, the Server Programming Interface,

00:08:54.070 --> 00:09:00.190
<v Adam>so you have some tools available to deal with your state there.

00:09:00.190 --> 00:09:03.530
<v Matthias>And the other thing that bothered me while you explained this: you're going

00:09:03.530 --> 00:09:07.350
<v Matthias>through this SPI, this server programming interface, dealing

00:09:07.350 --> 00:09:12.810
<v Matthias>with FFI boundaries. Don't you lose all the conveniences and ergonomics of Rust's

00:09:12.810 --> 00:09:17.630
<v Matthias>type system? I mean, how would Postgres even know that a function is really available?

00:09:17.630 --> 00:09:22.650
<v Matthias>All of Rust's advantages are in its type system and safety guarantees. Don't

00:09:22.650 --> 00:09:25.750
<v Matthias>you lose all of that at the FFI boundary?

00:09:25.750 --> 00:09:32.110
<v Adam>In the memory space where your Rust code runs, you still have all

00:09:32.110 --> 00:09:38.850
<v Adam>of Rust's features available to you. As you interact with FFI,

00:09:39.530 --> 00:09:46.330
<v Adam>if you're using PGRX, PGRX mostly abstracts that away for you.

00:09:46.870 --> 00:09:51.230
<v Adam>So you write Rust code generally the same way you would if

00:09:51.230 --> 00:09:55.690
<v Adam>you were writing a web application or some script in Rust.

00:09:56.270 --> 00:10:02.270
<v Matthias>Okay, you convinced me. PGRX seems like the way to go if you want to start a new extension.

00:10:02.730 --> 00:10:08.810
<v Matthias>But did you ever hit any limitations which might make you move away from PGRX

00:10:08.810 --> 00:10:10.910
<v Matthias>or is everything pretty much covered?

00:10:11.950 --> 00:10:17.290
<v Adam>A lot of stuff is covered. There are some limitations and I feel a little bad

00:10:17.290 --> 00:10:22.470
<v Adam>in what I'm about to say because I wish I could have contributed back to PGRX.

00:10:22.690 --> 00:10:30.210
<v Adam>But some of the SPI, server programming interface, functionality in PGRX is just not fully optimized.

00:10:30.770 --> 00:10:37.870
<v Adam>So if you have some SQL that you want to execute through the SPI using PGRX,

00:10:38.810 --> 00:10:44.170
<v Adam>the data that goes to and from Postgres through that SPI,

00:10:44.310 --> 00:10:49.170
<v Adam>you have to serialize and deserialize all of it, even if you're not really going to touch it.

00:10:49.930 --> 00:10:55.450
<v Adam>So there's a project, a message queue project that we worked on called PGMQ.

00:10:55.730 --> 00:11:01.530
<v Adam>It was originally written in Rust using PGRX. And particularly for the batch

00:11:01.530 --> 00:11:07.710
<v Adam>operations where we would, you know, send or read, you know, 100 messages at a time,

00:11:07.890 --> 00:11:11.790
<v Adam>those would get real slow because we would have to iterate over everything,

00:11:12.650 --> 00:11:16.890
<v Adam>every message, deserialize it, and then reserialize it as we would,

00:11:17.030 --> 00:11:18.510
<v Adam>you know, send it back to the user.

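NOTE A toy, std-only illustration of the cost described above, not pgrx's SPI API: if every row coming back through the SPI is decoded into a Rust struct and re-encoded on the way out, a batch of n messages costs n decode/encode round trips even though the bytes the caller receives are unchanged. The id|body row format and all names here are hypothetical.
```rust
#[derive(Debug, PartialEq)]
struct Message {
    id: i64,
    body: String,
}
// Hypothetical row format "id|body", decoded into a typed struct.
fn decode(row: &str) -> Option<Message> {
    let (id, body) = row.split_once('|')?;
    Some(Message { id: id.parse().ok()?, body: body.to_string() })
}
fn encode(msg: &Message) -> String {
    format!("{}|{}", msg.id, msg.body)
}
// Per-message path: every row is decoded and re-encoded, n conversions for n rows,
// even though the caller only wanted the rows passed along.
fn roundtrip_batch(rows: &[&str]) -> Option<Vec<String>> {
    rows.iter().map(|row| decode(row).map(|m| encode(&m))).collect()
}
// Plain-SQL path: the rows are handed back without being interpreted.
fn passthrough_batch(rows: &[&str]) -> Vec<String> {
    rows.iter().map(|row| row.to_string()).collect()
}
```
Both functions return the same rows; only the first pays for a full conversion of every message, which is the work a pure SQL function avoids.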
00:11:19.250 --> 00:11:23.790
<v Adam>But really, in those cases, there's not much advantage over

00:11:24.050 --> 00:11:27.850
<v Adam>just using pure SQL.

00:11:28.470 --> 00:11:33.350
<v Matthias>Does this mean the code base is now mostly SQL with a nice Rust wrapper on top?

00:11:33.350 --> 00:11:35.650
<v Matthias>You've kind of inverted the pattern.

00:11:35.910 --> 00:11:40.630
<v Matthias>Instead of writing a pure Rust Postgres extension, now it's SQL with a nice

00:11:40.630 --> 00:11:42.210
<v Matthias>wrapper around it, sort of.

00:11:42.510 --> 00:11:46.030
<v Matthias>Like, what are the boundaries between SQL and Rust in that case?

00:11:46.870 --> 00:11:51.310
<v Adam>So for PGMQ, I can give a little bit of background about that project.

00:11:51.950 --> 00:11:58.670
<v Adam>So at Tembo, we needed a queue between our control plane and our data plane.

00:11:58.870 --> 00:12:03.490
<v Adam>The data plane is where we spin up people's databases, and the control plane is,

00:12:03.510 --> 00:12:05.290
<v Adam>you know, backend web applications.

00:12:06.030 --> 00:12:12.070
<v Adam>So we were building this queue on Postgres and all of our tech was already in Rust.

00:12:12.290 --> 00:12:19.270
<v Adam>So we basically had a bunch of SQL strings written in Rust code.

00:12:20.250 --> 00:12:24.870
<v Adam>And that code would be duplicated across the control plane, and the same

00:12:24.870 --> 00:12:29.450
<v Adam>service in the data plane would have the same SQL statements for sending

00:12:29.450 --> 00:12:31.050
<v Adam>messages and reading messages.

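NOTE A toy in-memory model of the send and read operations such a queue needs, not PGMQ's implementation. PGMQ's SQL API (pgmq.send, pgmq.read, pgmq.delete) follows an SQS-style visibility-timeout design: a read hides a message for vt seconds instead of removing it, and the consumer deletes it once processed. An integer clock stands in for now(), and ToyQueue is hypothetical.
```rust
// One queued message; `visible_at` plays the role of a visibility-timeout column.
#[derive(Debug, Clone)]
struct Msg {
    id: u64,
    body: String,
    visible_at: u64,
    read_ct: u32,
}
#[derive(Default)]
struct ToyQueue {
    next_id: u64,
    msgs: Vec<Msg>,
}
impl ToyQueue {
    // Like an INSERT: the new message is visible immediately.
    fn send(&mut self, body: &str, now: u64) -> u64 {
        self.next_id += 1;
        self.msgs.push(Msg { id: self.next_id, body: body.to_string(), visible_at: now, read_ct: 0 });
        self.next_id
    }
    // Return up to `qty` visible messages and hide each one for `vt` time units.
    fn read(&mut self, vt: u64, qty: usize, now: u64) -> Vec<Msg> {
        let mut out = Vec::new();
        for m in self.msgs.iter_mut().filter(|m| m.visible_at <= now).take(qty) {
            m.visible_at = now + vt;
            m.read_ct += 1;
            out.push(m.clone());
        }
        out
    }
    // Acknowledge a processed message by removing it for good.
    fn delete(&mut self, id: u64) -> bool {
        let before = self.msgs.len();
        self.msgs.retain(|m| m.id != id);
        self.msgs.len() < before
    }
}
```
If a consumer crashes before deleting a message, the timeout expires and a later read hands the same message out again with a higher read count.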
00:12:31.590 --> 00:12:38.050
<v Adam>And that was all Rust. So we first pulled it out into a crate and then installed

00:12:38.050 --> 00:12:39.990
<v Adam>the crate in all the places that it needed to be.

00:12:40.950 --> 00:12:42.510
<v Adam>So it was all client side.

00:12:43.510 --> 00:12:48.310
<v Adam>And, you know, since it was a crate, it was already kind of packaged up.

00:12:48.310 --> 00:12:53.230
<v Adam>We could turn the crate into a Postgres extension using PGRX;

00:12:53.430 --> 00:12:58.610
<v Adam>it was, I don't know, a couple hours' worth of work to take that crate and

00:12:58.610 --> 00:13:01.030
<v Adam>use PGRX and create an extension out of it.

00:13:01.390 --> 00:13:05.610
<v Adam>So in that case, there were shared objects for every function

00:13:05.610 --> 00:13:08.410
<v Adam>in Postgres and, you know, it was great.

00:13:08.450 --> 00:13:12.830
<v Adam>We had this extension and we could ship that extension around to different Postgres

00:13:12.830 --> 00:13:15.510
<v Adam>instances and share it with the world.

00:13:15.710 --> 00:13:19.670
<v Adam>And, you know, it was great. And then we started to run into those limitations

00:13:19.670 --> 00:13:25.310
<v Adam>where, really, the way we were using PGRX made some things slow.

00:13:25.710 --> 00:13:32.950
<v Adam>So, with the help of the community, including some help

00:13:32.950 --> 00:13:39.670
<v Adam>from folks at Supabase, we rewrote it from PGRX Rust into PL/pgSQL.

00:13:39.930 --> 00:13:42.610
<v Adam>And that was just for like the queue operations.

00:13:43.630 --> 00:13:46.950
<v Adam>And we still want

00:13:46.950 --> 00:13:50.070
<v Adam>to use that extension at Tembo,

00:13:50.070 --> 00:13:52.650
<v Adam>and all our stuff is in Rust, and

00:13:52.650 --> 00:13:56.390
<v Adam>tons of people around the world have their applications in Rust and want

00:13:56.390 --> 00:14:04.710
<v Adam>to use this extension too. So there's the Rust client library for it,

00:14:04.710 --> 00:14:11.370
<v Adam>and that's there to give you the idiomatic Rust experience for using PGMQ on Postgres.

00:14:11.370 --> 00:14:17.010
<v Matthias>It's a very nice pattern. You go from plain vanilla SQL, extract it into a separate

00:14:17.010 --> 00:14:23.090
<v Matthias>crate, and make it a Postgres extension with PGRX. I really like these code

00:14:23.090 --> 00:14:28.370
<v Matthias>reusability stories. I hear that a lot, and I really like that about Rust: being

00:14:28.370 --> 00:14:30.590
<v Matthias>able to move things out into a separate crate.

00:14:31.350 --> 00:14:34.830
<v Matthias>So that was the transition, if I understood correctly. But

00:14:34.830 --> 00:14:37.790
<v Matthias>then you ran into serialization issues, because when

00:14:37.790 --> 00:14:40.870
<v Matthias>you try to get batches of messages over the wire, it's probably

00:14:40.870 --> 00:14:43.770
<v Matthias>an O(n) operation to go through each and

00:14:43.770 --> 00:14:49.030
<v Matthias>every message and deserialize it with Serde, and that can be slow. So eventually

00:14:49.030 --> 00:14:54.030
<v Matthias>you want to go back a little bit and write the performance-critical parts

00:14:54.030 --> 00:15:01.030
<v Matthias>at a lower level of abstraction, or maybe not use PGRX for that case. Right?

00:15:01.030 --> 00:15:06.970
<v Adam>Yeah, that's exactly it. We talked to the PGRX maintainers about what

00:15:06.970 --> 00:15:09.050
<v Adam>we were doing, and they said, hey,

00:15:10.020 --> 00:15:13.100
<v Adam>what you're doing doesn't really need to be in PGRX.

00:15:13.200 --> 00:15:14.740
<v Adam>You're just executing SQL statements.

00:15:15.060 --> 00:15:20.180
<v Adam>And yeah, there are some limitations right now with the maturity of the API

00:15:20.180 --> 00:15:24.060
<v Adam>that PGRX had around just executing SQL statements.

00:15:24.560 --> 00:15:32.520
<v Adam>So it was like, yeah, this should be SQL instead of Rust.

00:15:33.580 --> 00:15:38.900
<v Matthias>But couldn't you have kept a hybrid approach? PGRX for most things, SQL for the

00:15:38.900 --> 00:15:42.800
<v Matthias>performance-critical parts? What made you abandon PGRX entirely?

00:15:42.800 --> 00:15:48.700
<v Adam>We could have, but there are some additional gotchas with extensions in Postgres,

00:15:49.860 --> 00:15:53.480
<v Adam>and they're mostly around the portability of

00:15:53.480 --> 00:15:56.760
<v Adam>extensions. So if you

00:15:56.760 --> 00:16:00.620
<v Adam>have an extension in Postgres, extensions

00:16:00.620 --> 00:16:03.580
<v Adam>have the objects that we talked about earlier, and

00:16:03.580 --> 00:16:06.780
<v Adam>they also have control files and migration

00:16:06.780 --> 00:16:09.740
<v Adam>files and all these other things. And for

00:16:09.740 --> 00:16:12.500
<v Adam>it to be an extension, those things have to be on the

00:16:12.500 --> 00:16:15.840
<v Adam>same host as Postgres. But

00:16:15.840 --> 00:16:21.340
<v Adam>let's say you're running on RDS, or you're running in Supabase or Google Cloud

00:16:21.340 --> 00:16:25.860
<v Adam>somewhere. If it's an extension, then you have to deal with all these additional

00:16:25.860 --> 00:16:31.100
<v Adam>things. But if it's just SQL, then you can just take that SQL with you and manage

00:16:31.100 --> 00:16:33.420
<v Adam>it however you want to.

00:16:33.420 --> 00:16:39.180
<v Matthias>From that perspective, it makes total sense. The PGMQ client caught my attention because,

00:16:39.180 --> 00:16:46.300
<v Matthias>unlike NATS or Kafka, it's just Postgres, so there's no need for new tech in your

00:16:46.300 --> 00:16:52.440
<v Matthias>stack. But what impressed me the most was how Rust-like the API feels.

00:16:52.440 --> 00:16:53.820
<v Matthias>How did

00:16:53.820 --> 00:16:58.460
<v Matthias>you make a SQL-based queue feel so idiomatic to Rust developers?

00:16:59.880 --> 00:17:07.900
<v Adam>Yeah. So, in my opinion, and I didn't write all the client-side code

00:17:07.900 --> 00:17:11.560
<v Adam>here; we had some help from the community, who built some of these things.

00:17:11.720 --> 00:17:18.280
<v Adam>But the things that I care about in the client are establishing a connection pool to Postgres,

00:17:19.340 --> 00:17:26.000
<v Adam>And you can let the client do that for you, create a pool to Postgres, a connection pool.

00:17:26.460 --> 00:17:29.200
<v Adam>But maybe you're doing a bunch of other stuff in your application too.

00:17:29.380 --> 00:17:34.100
<v Adam>So we have the ability for you to create the pool yourself and like provide

00:17:34.100 --> 00:17:39.280
<v Adam>the pool to the client and let PGMQ's client use your pool.

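NOTE A std-only sketch of the bring-your-own-pool idea: queue operations are generic over a trait, so a pool the client creates and a pool your application already owns are interchangeable. In the real client that role is played by SQLx's Executor trait; the Execute trait, ClientPool, AppPool, and send below are hypothetical stand-ins that only record the statement instead of running it, and the SQL merely resembles a call to pgmq.send.
```rust
// Hypothetical stand-in for sqlx::Executor: anything that can run a statement.
trait Execute {
    fn execute(&mut self, sql: &str, params: &[&str]) -> Result<u64, String>;
}
// A pool the queue client could create on your behalf.
struct ClientPool {
    log: Vec<String>,
}
// A pool your application already created for its own queries.
struct AppPool {
    log: Vec<String>,
}
impl Execute for ClientPool {
    fn execute(&mut self, sql: &str, params: &[&str]) -> Result<u64, String> {
        self.log.push(format!("client pool: {sql} {params:?}"));
        Ok(1)
    }
}
impl Execute for AppPool {
    fn execute(&mut self, sql: &str, params: &[&str]) -> Result<u64, String> {
        self.log.push(format!("app pool: {sql} {params:?}"));
        Ok(1)
    }
}
// Hypothetical queue operation: generic over whatever executor the caller passes in.
fn send<E: Execute>(exec: &mut E, queue: &str, msg_json: &str) -> Result<u64, String> {
    exec.execute("SELECT * FROM pgmq.send($1, $2::jsonb)", &[queue, msg_json])
}
```
Because the bound is a trait rather than a concrete pool type, a transaction handle implementing the same trait could be passed in exactly the same way.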
00:17:39.280 --> 00:17:41.940
<v Adam>And so in the

00:17:41.940 --> 00:17:45.080
<v Adam>client, with Rust, there's some generic typing that

00:17:45.080 --> 00:17:47.740
<v Adam>we could do around

00:17:47.740 --> 00:17:50.600
<v Adam>how we execute the Postgres

00:17:50.600 --> 00:17:53.460
<v Adam>functions for the queue.

00:17:53.460 --> 00:17:56.860
<v Adam>We just take in anything, as long

00:17:56.860 --> 00:18:00.880
<v Adam>as the object has certain traits implemented, and execute that SQL through it.

00:18:00.880 --> 00:18:05.060
<v Adam>If it were a different language, I don't

00:18:05.060 --> 00:18:09.920
<v Adam>know, it might be way more complex, but since we can just make these

00:18:09.920 --> 00:18:14.980
<v Adam>assumptions, like, hey, pass in this thing, and as long as it implements these traits, we're good to go.

00:18:14.980 --> 00:18:17.100
<v Matthias>You mentioned SQLx. Some people might

00:18:17.100 --> 00:18:21.640
<v Matthias>not be familiar with it. What makes SQLx special for this kind of work?

00:18:21.640 --> 00:18:28.920
<v Adam>Yeah, and in fact the PGMQ client uses SQLx. So I describe SQLx as

00:18:28.920 --> 00:18:34.300
<v Adam>everything that I care about from an ORM, but it's not an ORM.

00:18:34.700 --> 00:18:36.960
<v Adam>All the great things and none of the bad.

00:18:38.480 --> 00:18:41.860
<v Adam>So SQLx does a lot of things, and I don't use all of its features,

00:18:41.860 --> 00:18:48.560
<v Adam>but the thing I like the most is it gives me the compile time checks on the SQL that I write.

00:18:48.620 --> 00:18:54.160
<v Adam>So if I have some insert statement in my application and it's just written in raw SQL,

00:18:55.340 --> 00:19:01.640
<v Adam>I can rest assured that if I have some struct that I'm trying to

00:19:02.040 --> 00:19:08.180
<v Adam>serialize and insert into a row, and the types on my struct don't match

00:19:08.180 --> 00:19:11.320
<v Adam>the table, SQLx will give me a compile error.

00:19:11.660 --> 00:19:13.960
<v Adam>Yeah, I love SQLx.

00:19:14.820 --> 00:19:22.460
<v Adam>It's like an ORM, but better. So what I like about ORMs is that they give me

00:19:22.460 --> 00:19:27.660
<v Adam>this really nice interface between my application and the types in the database.

00:19:27.880 --> 00:19:32.540
<v Adam>So if I have some struct, and the struct has fields on it, and those are typed,

00:19:33.020 --> 00:19:37.660
<v Adam>an ORM would say, yeah, there's this mapping between this object and

00:19:37.660 --> 00:19:39.360
<v Adam>the table or many tables.

00:19:40.000 --> 00:19:43.400
<v Adam>But I actually really like to write SQL and

00:19:43.400 --> 00:19:47.180
<v Adam>a lot of times I end up with SQL statements that don't

00:19:47.180 --> 00:19:53.400
<v Adam>really fit an ORM super well, so I end up with this code base with some

00:19:53.400 --> 00:20:00.460
<v Adam>raw SQL and some ORM code, and it gets really messy. But with SQLx I can always write

00:20:00.460 --> 00:20:04.800
<v Adam>SQL and I love that as a developer because I enjoy writing SQL,

00:20:05.640 --> 00:20:11.820
<v Adam>but I still get these type checks, because SQLx will look at my SQL statements

00:20:11.820 --> 00:20:13.900
<v Adam>and then check against the database:

00:20:14.740 --> 00:20:18.540
<v Adam>does the SQL statement and what you're trying to insert or what you're trying

00:20:18.540 --> 00:20:21.400
<v Adam>to read and deserialize into some struct,

00:20:22.590 --> 00:20:27.890
<v Adam>is it going to work? Do the types match up? And that is huge. It can

00:20:27.890 --> 00:20:32.490
<v Adam>be a little frustrating at first until you get the hang of it, but that

00:20:32.490 --> 00:20:37.610
<v Adam>piece right there is what hooked me on SQLx.

00:20:37.610 --> 00:20:42.030
<v Matthias>Now, I would guess the other important part is async support, because all of these

00:20:42.030 --> 00:20:47.990
<v Matthias>operations are I/O-bound and you don't want to block the client on concurrent queries.

00:20:48.730 --> 00:20:55.730
<v Matthias>What did you end up using for async, and was it straightforward to integrate into the existing code?

00:20:55.730 --> 00:20:59.490
<v Adam>Rewind way back to when

00:20:59.490 --> 00:21:02.470
<v Adam>we first started building in

00:21:02.470 --> 00:21:05.370
<v Adam>Rust, and it was like, okay, so for

00:21:05.370 --> 00:21:08.270
<v Adam>async, we kind of have to handle this

00:21:08.270 --> 00:21:11.290
<v Adam>ourselves, and there's a

00:21:11.290 --> 00:21:15.130
<v Adam>ton of complexity around that that, frankly, we

00:21:15.130 --> 00:21:18.750
<v Adam>didn't want to spend our time trying to implement. And

00:21:18.750 --> 00:21:21.930
<v Adam>there are a few different runtimes out there, and Tokio was

00:21:21.930 --> 00:21:26.730
<v Adam>clearly the most popular, the most supported, with the biggest

00:21:26.730 --> 00:21:32.250
<v Adam>community around it, so we just used that. And to this day, when I'm

00:21:32.250 --> 00:21:37.650
<v Adam>building in Rust, I don't really think about which async runtime

00:21:37.650 --> 00:21:40.210
<v Adam>to go with. I just use Tokio.

00:21:40.210 --> 00:21:45.150
<v Matthias>Right. But what about very complex things like supporting transactions? And by transactions

00:21:45.150 --> 00:21:50.990
<v Matthias>I mean multiple statements that are executed atomically on Postgres. If you wanted

00:21:50.990 --> 00:21:55.630
<v Matthias>to do that in an async way, I would be afraid that it would get really complex.

00:21:55.630 --> 00:22:01.710
<v Adam>Yeah, I don't exactly know how you would do it if you had two separate threads

00:22:01.710 --> 00:22:05.810
<v Adam>and you wanted to span a transaction across those.

00:22:06.970 --> 00:22:09.830
<v Adam>I haven't actually tried that, so I don't know how it would work.

00:22:11.210 --> 00:22:16.330
<v Adam>But kind of the way that we implemented it: I think earlier I mentioned,

00:22:16.530 --> 00:22:19.110
<v Adam>yeah, you know, you can bring your own connection pool, or really your

00:22:19.110 --> 00:22:24.390
<v Adam>own executor, as long as it has certain traits implemented on it, which,

00:22:24.650 --> 00:22:26.710
<v Adam>you know, we inherit from SQLx.

00:22:27.330 --> 00:22:33.090
<v Adam>So you can, you know, start a transaction and do whatever SQL you want.

00:22:35.570 --> 00:22:38.630
<v Adam>Then you take your connection, or your

00:22:38.630 --> 00:22:42.370
<v Adam>transaction object that you've created, and if you pass that into pgmq,

00:22:42.370 --> 00:22:45.270
<v Adam>then you can go do the pgmq things through the

00:22:45.270 --> 00:22:48.350
<v Adam>pgmq API and then finally commit your transaction

00:22:48.350 --> 00:22:53.010
<v Adam>however you want to yourself. You know, so we lean on

00:22:53.010 --> 00:22:58.350
<v Adam>Tokio, we lean on SQLx for that. Oh yeah, the question about doing that across

00:22:58.350 --> 00:23:03.250
<v Adam>threads: I don't... that could get tricky, to pass that connection across threads.

00:23:03.250 --> 00:23:06.230
<v Adam>I don't know how you would do that. It might be possible.

00:23:06.230 --> 00:23:09.290
<v Matthias>You could probably share the connections across threads with

00:23:09.290 --> 00:23:15.970
<v Matthias>something like an Arc around a Mutex, and SQLx allows that kind of pattern. But at the same

00:23:15.970 --> 00:23:20.630
<v Matthias>time, why would you? Postgres already handles that with transactions, and

00:23:20.630 --> 00:23:27.370
<v Matthias>that guarantees atomicity. So maybe you just leave it as an implementation detail to the database.
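
NOTE
A minimal, std-only sketch of the two patterns discussed above. The first is queue code that accepts whatever executor the caller passes in, so the caller owns BEGIN and COMMIT. The second is one connection shared across threads behind an `Arc` and a `Mutex`. The `Executor` trait and `FakeConn` type are hypothetical stand-ins that only record statements. They are not SQLx's `Executor` trait or the pgmq Rust client, and the SQL strings (including the `orders` table) are illustrative.
```rust
use std::sync::{Arc, Mutex};
use std::thread;
// Hypothetical stand-in for "anything that can run SQL": a pool, a connection, or an open transaction.
trait Executor {
    fn execute(&mut self, sql: &str, params: &[&str]);
}
// Fake connection that just records each statement and its bound parameters.
#[derive(Default)]
struct FakeConn {
    log: Vec<(String, Vec<String>)>,
}
impl Executor for FakeConn {
    fn execute(&mut self, sql: &str, params: &[&str]) {
        let params: Vec<String> = params.iter().map(|p| p.to_string()).collect();
        self.log.push((sql.to_string(), params));
    }
}
// Queue code borrows the caller's executor and never begins or commits on its own.
fn send<E: Executor>(exec: &mut E, queue: &str, msg_json: &str) {
    exec.execute("SELECT pgmq.send($1, $2::jsonb)", &[queue, msg_json]);
}
// Caller-owned transaction: the business write and the enqueue commit (or roll back) together.
fn enqueue_in_transaction() -> Vec<(String, Vec<String>)> {
    let mut tx = FakeConn::default();
    tx.execute("BEGIN", &[]);
    tx.execute("UPDATE orders SET status = 'paid' WHERE id = $1", &["42"]);
    send(&mut tx, "jobs", r#"{"order":42}"#);
    tx.execute("COMMIT", &[]);
    tx.log
}
// One connection shared across threads: each thread locks it, runs its statement, then releases it.
fn send_from_threads(conn: Arc<Mutex<FakeConn>>, n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let conn = Arc::clone(&conn);
            thread::spawn(move || {
                let mut guard = conn.lock().unwrap();
                let msg = format!("{{\"n\":{i}}}");
                send(&mut *guard, "jobs", &msg);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```
`enqueue_in_transaction()` returns the recorded statements in order, with BEGIN first and COMMIT last. Threads that share one connection this way run their statements one at a time.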

00:23:27.370 --> 00:23:33.130
<v Adam>Yeah, from the Postgres side, you know, that's just the feature: transactions. You

00:23:33.130 --> 00:23:35.930
<v Adam>know, we don't have to reinvent the wheel on that part.

00:23:35.930 --> 00:23:39.310
<v Matthias>Some people might hear this and think, why do I need pgmq?

00:23:39.310 --> 00:23:42.110
<v Matthias>I can just write my own abstraction on top

00:23:42.110 --> 00:23:46.950
<v Matthias>of Postgres. How hard can it be? It can only be a couple lines of code, right? And

00:23:46.950 --> 00:23:52.430
<v Matthias>we touched on transaction support and edge cases, but maybe you can allude to

00:23:52.430 --> 00:23:57.750
<v Matthias>a few more things that you probably don't want to do yourself, things that people

00:23:57.750 --> 00:24:00.730
<v Matthias>tend to forget when they build their own abstractions.

00:24:00.730 --> 00:24:03.430
<v Adam>Yeah, I mean, with all this stuff, you know, it's just

00:24:03.430 --> 00:24:08.870
<v Adam>code, so you can always just go write the code. You know, there are

00:24:08.870 --> 00:24:12.110
<v Adam>these operations that you need to do. If you're going to build a queue, you need

00:24:12.110 --> 00:24:16.150
<v Adam>to have a way to create a queue. So how do you define what a queue

00:24:16.150 --> 00:24:20.770
<v Adam>is? You can think through that and make it what you want. We have a

00:24:20.770 --> 00:24:22.590
<v Adam>function called create queue that

00:24:23.340 --> 00:24:29.320
<v Adam>does it in a way that we think is right, and there's an API around it. You know, sending

00:24:29.320 --> 00:24:33.140
<v Adam>a message to a queue is basically just an insert statement, you know, but that

00:24:33.140 --> 00:24:35.460
<v Adam>insert is handled for you.

00:24:36.120 --> 00:24:41.940
<v Adam>And if you're working in Rust, you have some message, and that

00:24:41.940 --> 00:24:47.340
<v Adam>message becomes, like, an attribute, a column on a table. So you need to

00:24:47.340 --> 00:24:52.080
<v Adam>serialize that message and get it inserted into the table.

00:24:52.560 --> 00:24:55.500
<v Adam>So if you're using the Rust client, you know, there are some helpers there to

00:24:55.500 --> 00:24:57.900
<v Adam>do the serialization for you.

00:24:58.780 --> 00:25:03.260
<v Adam>So, you know, you save a lot of time by having this stuff, like, it's

00:25:03.260 --> 00:25:07.760
<v Adam>already built, it's already tested, you don't have to rewrite a bunch of things.

00:25:08.180 --> 00:25:12.520
<v Adam>But, you know, I'm sure there are cases out there where it makes

00:25:12.520 --> 00:25:13.640
<v Adam>sense for somebody to write their own.
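
NOTE
A sketch of what "sending is basically an insert" can look like, assuming a JSON message column. A real client would typically derive serialization with serde. To stay dependency-free, this version hand-rolls the JSON for one hypothetical `EmailJob` type. The `q_<queue>` table name and the `vt` and `message` columns are assumptions for illustration, not pgmq's exact schema. The payload is bound as a parameter rather than spliced into the SQL. The queue name is validated because identifiers cannot be bound.
```rust
// Hypothetical job payload an application might enqueue.
struct EmailJob {
    to: String,
    attempt: u32,
}
// Minimal JSON string escaping (quotes, backslashes, control characters); serde_json would do this for you.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
impl EmailJob {
    // Serialize the message into the text that ends up in the JSON column.
    fn to_json(&self) -> String {
        format!("{{\"to\":{},\"attempt\":{}}}", json_escape(&self.to), self.attempt)
    }
}
// Returns (statement, bound parameter), or None if the queue name is not a safe identifier.
fn send_statement(queue: &str, job: &EmailJob) -> Option<(String, String)> {
    if queue.is_empty() || !queue.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return None;
    }
    let sql = format!("INSERT INTO q_{queue} (vt, message) VALUES (now(), $1::jsonb)");
    Some((sql, job.to_json()))
}
```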

00:25:14.340 --> 00:25:17.360
<v Matthias>Even if you asked me to design a message queue table schema,

00:25:17.680 --> 00:25:19.800
<v Matthias>I'd probably miss something important.

00:25:20.080 --> 00:25:25.980
<v Matthias>I'd add a timestamp, maybe a byte array for the message, though I'm not sure

00:25:25.980 --> 00:25:29.060
<v Matthias>if you support bytes or just UTF-8 strings.

00:25:29.280 --> 00:25:34.700
<v Matthias>Then some kind of locking or ownership fields to prevent reading the same message twice.

00:25:35.260 --> 00:25:37.480
<v Matthias>What actually goes into that table?

00:25:37.940 --> 00:25:42.780
<v Adam>Yeah, I mean, you're pretty close there. So there's a concept that,

00:25:43.000 --> 00:25:46.500
<v Adam>you know, we were inspired by, from SQS, the Simple

00:25:46.500 --> 00:25:49.840
<v Adam>Queue Service from AWS: the visibility

00:25:49.840 --> 00:25:53.000
<v Adam>timeout, which is kind of that locking

00:25:53.000 --> 00:25:55.960
<v Adam>feature, a little bit. So when you read a message

00:25:55.960 --> 00:25:59.180
<v Adam>from a queue, you have to specify how long

00:25:59.180 --> 00:26:05.060
<v Adam>you want that message to be unavailable, to yourself if you were to

00:26:05.060 --> 00:26:10.080
<v Adam>try to read it again, or to any other consumers of that queue. You know, so let's

00:26:10.080 --> 00:26:16.540
<v Adam>say you have, like, 10 threads all reading from the same queue. The first

00:26:16.540 --> 00:26:17.700
<v Adam>time the message is read,

00:26:18.460 --> 00:26:22.760
<v Adam>you could set it to say, hey, make this message invisible for five minutes.

00:26:23.610 --> 00:26:26.770
<v Adam>And then during those five minutes, any of those other threads,

00:26:27.030 --> 00:26:31.630
<v Adam>none of them will be able to get that exact same message until that message's

00:26:32.010 --> 00:26:38.050
<v Adam>visibility timeout expires, and then that message becomes visible again.
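
NOTE
A minimal in-memory model of the visibility timeout, using a simulated clock in seconds. The row fields (`msg_id`, `read_ct`, `enqueued_at`, `vt`, `message`) follow what is described in this conversation, but they are assumptions here, not pgmq's actual code. Reading a message pushes its `vt` into the future. Once the clock passes `vt`, the message is readable again without any cleanup process.
```rust
// One queued message; `vt` is the time from which it is visible to readers.
#[derive(Debug, Clone)]
struct Row {
    msg_id: u64,
    read_ct: u32,
    enqueued_at: u64,
    vt: u64,
    message: String,
}
#[derive(Default)]
struct Queue {
    rows: Vec<Row>,
    next_id: u64,
}
impl Queue {
    // Enqueue a message that is visible immediately.
    fn send(&mut self, now: u64, message: &str) -> u64 {
        self.next_id += 1;
        self.rows.push(Row { msg_id: self.next_id, read_ct: 0, enqueued_at: now, vt: now, message: message.to_string() });
        self.next_id
    }
    // Return the first visible message and hide it for `vt_secs`; no background process is needed,
    // because the row simply becomes visible again once `now` reaches its new `vt`.
    fn read(&mut self, now: u64, vt_secs: u64) -> Option<Row> {
        let row = self.rows.iter_mut().find(|r| r.vt <= now)?;
        row.vt = now + vt_secs;
        row.read_ct += 1;
        Some(row.clone())
    }
    // Deleting after successful processing is what prevents redelivery.
    fn delete(&mut self, msg_id: u64) -> bool {
        let before = self.rows.len();
        self.rows.retain(|r| r.msg_id != msg_id);
        self.rows.len() != before
    }
}
```
With a 300-second timeout (the five minutes from the example), a message read at t=10 stays hidden until t=310. At that point it comes back with `read_ct` incremented, unless it was deleted first.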

00:26:38.050 --> 00:26:41.070
<v Adam>So that piece,

00:26:41.070 --> 00:26:44.930
<v Adam>I think it's super cool, because without that

00:26:44.930 --> 00:26:48.230
<v Adam>you'd have to have some additional worker process that

00:26:48.230 --> 00:26:51.150
<v Adam>would watch the queue and look for messages that need

00:26:51.150 --> 00:26:56.490
<v Adam>to be flipped back to available again. But this way, it's

00:26:56.490 --> 00:27:00.790
<v Adam>completely stateless. There's nothing that can really

00:27:00.790 --> 00:27:04.230
<v Adam>break there, you know. There's no process that crashes and all of a sudden

00:27:04.230 --> 00:27:09.390
<v Adam>messages get stuck. They just automatically become visible again.

00:27:09.390 --> 00:27:14.910
<v Matthias>That's clever. It's sort of a lock-free algorithm, because you just update the timestamp

00:27:14.910 --> 00:27:18.330
<v Matthias>when you read it, and everyone knows the message is in flight.

00:27:18.330 --> 00:27:21.490
<v Adam>Yeah, the most complicated part

00:27:21.490 --> 00:27:25.170
<v Adam>of pgmq is the read statement. It

00:27:25.170 --> 00:27:28.270
<v Adam>is a super complicated query. It's

00:27:28.270 --> 00:27:31.930
<v Adam>a FOR UPDATE statement, so it's

00:27:31.930 --> 00:27:35.130
<v Adam>reading, but it's also updating how

00:27:35.130 --> 00:27:38.530
<v Adam>many times it's been read. So there's, like, a counter on

00:27:38.530 --> 00:27:41.290
<v Adam>that table that keeps track of how many times this

00:27:41.290 --> 00:27:45.410
<v Adam>message has been read. There's that visibility time, which, you know, is some

00:27:45.410 --> 00:27:51.130
<v Adam>time after which that message can be read. Yeah, so

00:27:51.130 --> 00:27:54.250
<v Adam>when you read the message, it's actually an update statement

00:27:54.250 --> 00:28:00.590
<v Adam>to the table, and it's a SELECT FOR UPDATE, which actually creates a lock on that table.

00:28:01.580 --> 00:28:06.940
<v Adam>And that makes it so that if two threads read at the exact same time,

00:28:07.360 --> 00:28:10.840
<v Adam>Postgres would figure out, hey, who gets this record first?

00:28:11.220 --> 00:28:18.220
<v Adam>So the FOR UPDATE just guarantees that only one person, one worker, can get the message.

00:28:18.960 --> 00:28:21.820
<v Adam>But that does create a lock on the table.

00:28:21.980 --> 00:28:27.180
<v Adam>So we say FOR UPDATE, and that creates a lock.

00:28:27.180 --> 00:28:29.840
<v Adam>And then there's another clause in there,

00:28:29.840 --> 00:28:33.680
<v Adam>SKIP LOCKED, which means any records

00:28:33.680 --> 00:28:36.960
<v Adam>that are locked, skip over them and go to the next one. And that

00:28:36.960 --> 00:28:40.680
<v Adam>makes it so that any other workers that are reading aren't sitting there waiting

00:28:40.680 --> 00:28:45.320
<v Adam>for that lock to be released. They can skip it and go to the next message. And that

00:28:45.320 --> 00:28:49.800
<v Adam>lock is pretty quick. It only lasts for the duration of, you know,

00:28:49.800 --> 00:28:54.900
<v Adam>that transaction, which is pretty quick, and the long-term locking is

00:28:54.900 --> 00:28:56.520
<v Adam>handled by the visibility timeout.
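
NOTE
A sketch of the general shape of such a read, built as a Rust string so the placeholders are visible. It is not pgmq's verbatim SQL. The table and column names and the `picked` CTE are assumptions. The inner `SELECT ... FOR UPDATE SKIP LOCKED` takes row-level locks on the rows it picks and skips rows that another transaction has already locked. The outer `UPDATE` pushes the visibility time forward and bumps the read counter in the same statement, so the row locks only last for that short transaction.
```rust
// Builds an illustrative read statement: $1 = visibility timeout in seconds, $2 = batch size.
// Returns None for queue names that are not plain identifiers, since identifiers cannot be bound.
fn read_statement(queue: &str) -> Option<String> {
    if queue.is_empty() || !queue.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return None;
    }
    Some(format!(
        "WITH picked AS ( \
         SELECT msg_id FROM q_{queue} \
         WHERE vt <= clock_timestamp() \
         ORDER BY msg_id \
         LIMIT $2 \
         FOR UPDATE SKIP LOCKED \
         ) \
         UPDATE q_{queue} AS m \
         SET vt = clock_timestamp() + make_interval(secs => $1), \
         read_ct = m.read_ct + 1 \
         FROM picked \
         WHERE m.msg_id = picked.msg_id \
         RETURNING m.msg_id, m.read_ct, m.vt, m.message"
    ))
}
```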

00:28:56.920 --> 00:28:59.580
<v Matthias>You probably came up with that on the first try, right?

00:29:00.100 --> 00:29:05.060
<v Adam>No, I mean, there's a ton of resources out there.

00:29:05.200 --> 00:29:10.320
<v Adam>If you just search for, you know, message queue Postgres, a lot of people

00:29:10.320 --> 00:29:13.360
<v Adam>have written about FOR UPDATE and SKIP LOCKED. It's kind of the standard.

00:29:13.740 --> 00:29:17.820
<v Matthias>But still, you can make a lot of mistakes when implementing that.

00:29:18.140 --> 00:29:21.540
<v Matthias>And also, you might not know about this research in the first place.

00:29:21.800 --> 00:29:24.980
<v Matthias>You might just naively implement it and do it the wrong way.

00:29:25.160 --> 00:29:29.480
<v Adam>Yeah, definitely. Like, you know, if you don't have the SKIP LOCKED on there,

00:29:29.480 --> 00:29:32.180
<v Adam>you could have, you know, 10 threads reading,

00:29:33.020 --> 00:29:37.380
<v Adam>one of them got a message, and the other nine are just sitting there. They

00:29:37.380 --> 00:29:42.540
<v Adam>would just sit there, you know, until that initial lock was released.

00:29:42.540 --> 00:29:45.360
<v Adam>So yeah, you could mess it up.

00:29:45.360 --> 00:29:51.240
<v Matthias>I really like these implementation details. That's what I live for, because that

00:29:51.240 --> 00:29:55.020
<v Matthias>could really hamper your performance, and in the worst case you don't realize it

00:29:55.020 --> 00:30:00.280
<v Matthias>until you're under load, and when you really need it the most, it will kind of fail on you.
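
NOTE
An in-process analogy (not database code) for the failure mode just described. Each `Mutex` stands in for a row lock. Waiting with `lock()` behaves like `FOR UPDATE` without `SKIP LOCKED`. Calling `try_lock()` and moving on to the next row behaves like `SKIP LOCKED`. The function name is hypothetical.
```rust
use std::sync::Mutex;
// Each Mutex guards one "row" (here just its read counter). Returns the index of the first row
// this worker could claim, or None if every row is currently held by someone else.
fn claim_skip_locked(rows: &[Mutex<u64>]) -> Option<usize> {
    for (i, row) in rows.iter().enumerate() {
        if let Ok(mut read_ct) = row.try_lock() {
            *read_ct += 1; // "update" the claimed row while holding its lock
            return Some(i);
        }
        // Err means the lock is held (or poisoned); either way, skip the row instead of waiting.
    }
    None
}
```
While row 0 is held elsewhere, a worker claims row 1 instead of blocking until row 0 is released.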

00:30:00.280 --> 00:30:03.400
<v Adam>Yeah, yeah, that's a

00:30:03.400 --> 00:30:07.080
<v Adam>good... you know, I'm glad you mentioned that, because that's kind of a, you

00:30:07.080 --> 00:30:10.020
<v Adam>know, a reason to use something that's

00:30:10.020 --> 00:30:13.440
<v Adam>been pre-packaged. You know, because

00:30:13.440 --> 00:30:16.900
<v Adam>today there are quite a few people using pgmq, and

00:30:16.900 --> 00:30:19.620
<v Adam>you know, when people run into

00:30:19.620 --> 00:30:22.440
<v Adam>issues, an issue gets created on the

00:30:22.440 --> 00:30:25.220
<v Adam>project, and then, you know, we or somebody in

00:30:25.220 --> 00:30:27.900
<v Adam>the community resolves it. But there are ways,

00:30:27.900 --> 00:30:30.540
<v Adam>when you try to build your own queue, that it can

00:30:30.540 --> 00:30:36.160
<v Adam>go wrong. So, you know, pgmq as a project has kind of started to be this place

00:30:36.160 --> 00:30:41.580
<v Adam>that people go to learn from and just use the code of. You know, when they want to

00:30:41.580 --> 00:30:45.500
<v Adam>do a queue on Postgres, they can look at that project and either use

00:30:45.500 --> 00:30:49.460
<v Adam>the code or modify it and do it their own way.

00:30:49.460 --> 00:30:52.780
<v Matthias>That's why I love open source in the first place, because so much

00:30:52.780 --> 00:30:57.880
<v Matthias>is out there, and you can look at the actual implementation and then decide if

00:30:57.880 --> 00:31:01.880
<v Matthias>you really want to go down that rabbit hole and implement it yourself. And

00:31:01.880 --> 00:31:05.840
<v Matthias>just sending in a pull request will fix a problem for every instance out there

00:31:05.840 --> 00:31:10.320
<v Matthias>that has the latest version, of course, which is so, so great.

00:31:10.320 --> 00:31:11.480
<v Adam>Yeah.

00:31:11.480 --> 00:31:16.720
<v Matthias>How hard would it be to abstract that to other databases? For example, if there was

00:31:16.720 --> 00:31:19.680
<v Matthias>a customer that needed MariaDB support or MySQL

00:31:19.680 --> 00:31:20.240
<v Matthias>support,

00:31:21.370 --> 00:31:23.770
<v Matthias>How hard would that be if you had to support that?

00:31:24.310 --> 00:31:29.390
<v Adam>I guess, you know, it would be... there are some SQL files in the pgmq project,

00:31:29.390 --> 00:31:32.570
<v Adam>and it would be like, hey, for every one of these statements,

00:31:32.570 --> 00:31:35.330
<v Adam>what's the equivalent in the other database?

00:31:35.610 --> 00:31:39.730
<v Adam>And if you translated those, then, you know, it should theoretically work.

00:31:40.530 --> 00:31:43.590
<v Matthias>The cool thing is that you could use Rust's feature flags,

00:31:43.770 --> 00:31:47.110
<v Matthias>so the project builds against the database that it depends on.
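
NOTE
A hypothetical sketch of the feature-flag idea. A crate could declare a Cargo feature (for example `mysql = []` under `[features]` in Cargo.toml) and select a SQL dialect at compile time with `#[cfg(feature = ...)]`. The feature name and the `Dialect` trait are illustrative. pgmq is not known to ship such a feature. Bind-parameter syntax is one concrete difference a port would have to handle: `$1` in Postgres versus `?` in MySQL and MariaDB.
```rust
// Per-backend SQL differences a port would need to cover.
trait Dialect {
    fn name(&self) -> &'static str;
    // Bind-parameter syntax: numbered in Postgres, positional `?` in MySQL/MariaDB.
    fn placeholder(&self, n: usize) -> String;
}
struct Postgres;
struct MySql;
impl Dialect for Postgres {
    fn name(&self) -> &'static str { "postgres" }
    fn placeholder(&self, n: usize) -> String { format!("${n}") }
}
impl Dialect for MySql {
    fn name(&self) -> &'static str { "mysql" }
    fn placeholder(&self, _n: usize) -> String { "?".to_string() }
}
// Exactly one of these is compiled, depending on whether the (hypothetical) `mysql` feature is enabled.
#[cfg(not(feature = "mysql"))]
fn dialect() -> Box<dyn Dialect> { Box::new(Postgres) }
#[cfg(feature = "mysql")]
fn dialect() -> Box<dyn Dialect> { Box::new(MySql) }
```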

00:31:47.110 --> 00:31:51.450
<v Matthias>I was wondering about Postgres LISTEN and NOTIFY support.

00:31:51.670 --> 00:31:56.810
<v Matthias>Where do you draw the line between using native pub/sub versus needing a full message queue?

00:31:57.010 --> 00:32:03.090
<v Adam>We've had some discussions about using those features from Postgres in pgmq.

00:32:03.890 --> 00:32:11.030
<v Adam>But I don't think you can just replace pgmq with LISTEN and NOTIFY.

00:32:12.510 --> 00:32:15.250
<v Adam>Mostly because, you know, if you're going to send the

00:32:15.250 --> 00:32:18.670
<v Adam>message across one of those notification channels,

00:32:18.670 --> 00:32:21.370
<v Adam>if that message doesn't make it, then it's kind

00:32:21.370 --> 00:32:26.610
<v Adam>of gone, you know, and you have no record of it. In pgmq, every message

00:32:26.610 --> 00:32:33.310
<v Adam>is a row in a table, and you can archive that row or just completely delete

00:32:33.310 --> 00:32:38.930
<v Adam>it if you want, so you still have, like, a complete audit log of every message if you want it.

00:32:39.210 --> 00:32:44.150
<v Adam>If we were to just move it into one of those other channels, it's kind of gone,

00:32:44.350 --> 00:32:46.210
<v Adam>you know, unless we build something to handle it.
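
NOTE
A simplified, in-process model of the distinction Adam draws, not Postgres code. `Notifier` behaves like fire-and-forget pub/sub: only listeners registered at send time receive the payload, and nothing is stored. `TableQueue` keeps each message until a consumer takes it, the way a row in a queue table persists until it is deleted or archived. Real `NOTIFY` has further semantics that this sketch ignores. For example, notifications are delivered when the sending transaction commits.
```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};
// Fire-and-forget delivery: nothing is kept for listeners that are absent or gone.
#[derive(Default)]
struct Notifier {
    listeners: Vec<Sender<String>>,
}
impl Notifier {
    fn listen(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.listeners.push(tx);
        rx
    }
    // Sends to every current listener, drops listeners whose receiver is gone,
    // and returns how many listeners received the payload.
    fn notify(&mut self, payload: &str) -> usize {
        self.listeners.retain(|tx| tx.send(payload.to_string()).is_ok());
        self.listeners.len()
    }
}
// Durable delivery: the message stays stored until a consumer removes it.
#[derive(Default)]
struct TableQueue {
    rows: VecDeque<String>,
}
impl TableQueue {
    fn send(&mut self, msg: &str) { self.rows.push_back(msg.to_string()); }
    fn pop(&mut self) -> Option<String> { self.rows.pop_front() }
}
```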

00:32:47.010 --> 00:32:53.550
<v Adam>But I do think it would be really useful to use it as a way to notify consumers

00:32:53.550 --> 00:32:55.910
<v Adam>that there are messages available in a queue.

00:32:57.140 --> 00:33:03.500
<v Adam>So right now, you know, if your application is reading messages from a queue,

00:33:03.680 --> 00:33:05.340
<v Adam>you have to poll the queue,

00:33:05.860 --> 00:33:09.980
<v Adam>you know, poll it once every second, or have some backoff logic,

00:33:10.200 --> 00:33:13.380
<v Adam>you know, poll it every second and then back off to 10 seconds or something.
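
NOTE
A sketch of the poll-with-backoff logic just described, assuming a doubling strategy. Polling starts at one second, doubles after each empty poll up to a ten-second cap, and resets once a poll returns messages. The numbers come from the example above. The `Backoff` type is hypothetical and not part of any pgmq client.
```rust
use std::time::Duration;
// Tracks the current delay between polls.
struct Backoff {
    min: Duration,
    max: Duration,
    current: Duration,
}
impl Backoff {
    fn new(min: Duration, max: Duration) -> Self {
        Backoff { min, max, current: min }
    }
    // Call after every poll; returns how long to wait before the next one.
    fn next_delay(&mut self, got_messages: bool) -> Duration {
        if got_messages {
            self.current = self.min;
        } else {
            self.current = (self.current * 2).min(self.max);
        }
        self.current
    }
}
```
Starting from one second, five empty polls in a row give delays of 2, 4, 8, 10, and 10 seconds. In an async worker, the returned duration would typically be passed to something like `tokio::time::sleep` between reads.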

00:33:13.380 --> 00:33:18.480
<v Adam>But it would be nice to be able to subscribe to the queue and have one of those

00:33:18.480 --> 00:33:22.180
<v Adam>notifications come to you and say, hey, there are messages now, now poll.

00:33:22.560 --> 00:33:25.480
<v Adam>And that could really help

00:33:25.480 --> 00:33:29.920
<v Adam>efficiency if you have really low latency requirements and you can't wait,

00:33:30.140 --> 00:33:33.680
<v Adam>if a one-second poll interval is too long and you need to know right away,

00:33:33.680 --> 00:33:38.040
<v Adam>you know, and you don't want to just poll every 10 milliseconds

00:33:38.040 --> 00:33:43.260
<v Adam>or something. If we had something implemented on those features, that could

00:33:43.260 --> 00:33:45.380
<v Adam>help a lot, I think.

00:33:45.380 --> 00:33:49.560
<v Matthias>Also, on top of this, you could use the Tokio stream abstraction, and you could just

00:33:49.560 --> 00:33:55.600
<v Matthias>iterate over the messages, and the futures would resolve as quickly as messages come in, and it

00:33:55.600 --> 00:33:56.160
<v Matthias>would all

00:33:56.160 --> 00:34:02.000
<v Matthias>kind of beautifully happen under the hood, where you get a future that's

00:34:02.000 --> 00:34:06.580
<v Matthias>ready, and it contains the value, and that is your message. I guess it would

00:34:06.580 --> 00:34:11.500
<v Matthias>be kind of nice to have that. How far away are you from that reality?

00:34:11.500 --> 00:34:21.280
<v Adam>Well, I think we would first need to, you know, implement a way to

00:34:21.280 --> 00:34:25.760
<v Adam>set up those channels on any given queue, and I don't think that would be super difficult.

00:34:26.420 --> 00:34:29.620
<v Adam>There's probably a way that this could all be implemented purely

00:34:29.620 --> 00:34:34.080
<v Adam>on the Rust client side, too. So if

00:34:34.080 --> 00:34:37.080
<v Adam>I think of how, like, librdkafka

00:34:37.080 --> 00:34:40.460
<v Adam>reads messages from a

00:34:40.460 --> 00:34:43.880
<v Adam>Kafka topic, you know, it polls, but it

00:34:43.880 --> 00:34:49.260
<v Adam>is pulling messages in batch to the client, and then the consumer of librdkafka,

00:34:49.260 --> 00:34:53.380
<v Adam>you know, iterates over those messages. And we could probably do something

00:34:53.380 --> 00:34:58.180
<v Adam>similar in the Rust client, you know, where you have some poll interval, but

00:34:58.180 --> 00:35:02.740
<v Adam>we're handling that poll asynchronously and pulling messages back in batch.

00:35:03.540 --> 00:35:10.180
<v Adam>And then your Rust application could iterate over those messages asynchronously if it wanted to.

00:35:11.060 --> 00:35:16.760
<v Adam>Yeah, I always really liked how librdkafka did it, because as a user there,

00:35:16.940 --> 00:35:24.040
<v Adam>all that stuff is abstracted from you, but it's super efficient because it is polling for you.

00:35:24.540 --> 00:35:28.280
<v Adam>And unless you really get into the weeds, you don't really know that it's polling

00:35:28.280 --> 00:35:29.760
<v Adam>and it's pulling messages in batch.

00:35:30.040 --> 00:35:32.920
<v Adam>But we could probably do the exact same thing in the Rust client.
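
NOTE
A std-only sketch of the librdkafka-style pattern Adam describes. Messages are fetched in batches behind the scenes and handed to the caller one at a time through an ordinary `Iterator`. The `fetch` closure stands in for a batched queue read and is hypothetical. This version is synchronous and ends when a fetch returns nothing. An async client would more likely expose a `Stream` and keep polling.
```rust
use std::collections::VecDeque;
// Buffers one batch at a time; callers only ever see individual messages.
struct BatchConsumer<F: FnMut(usize) -> Vec<String>> {
    fetch: F,
    batch_size: usize,
    buffer: VecDeque<String>,
}
impl<F: FnMut(usize) -> Vec<String>> BatchConsumer<F> {
    fn new(fetch: F, batch_size: usize) -> Self {
        BatchConsumer { fetch, batch_size, buffer: VecDeque::new() }
    }
}
impl<F: FnMut(usize) -> Vec<String>> Iterator for BatchConsumer<F> {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        if self.buffer.is_empty() {
            // One round trip refills the buffer; an empty batch ends this simple iterator.
            self.buffer.extend((self.fetch)(self.batch_size));
        }
        self.buffer.pop_front()
    }
}
```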

00:35:33.240 --> 00:35:35.340
<v Matthias>What's your biggest deployment at the moment?

00:35:35.840 --> 00:35:40.160
<v Adam>So I think I mentioned earlier this architecture with a control plane and a

00:35:40.160 --> 00:35:42.720
<v Adam>data plane. We have several data planes.

00:35:43.280 --> 00:35:50.880
<v Adam>There's a data plane in AWS, Azure, Google Cloud, and then some self-hosted ones as well.

00:35:51.180 --> 00:35:57.620
<v Adam>But all those public clouds, they all read messages from a single Postgres instance,

00:35:57.620 --> 00:35:59.880
<v Adam>and there are multiple queues within that.

00:36:00.300 --> 00:36:06.880
<v Adam>And at peak, I think there's like 10,000 messages per minute,

00:36:07.140 --> 00:36:08.940
<v Adam>maybe five minutes or something.
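
NOTE
A back-of-the-envelope conversion of the peak figure above (roughly 10,000 messages per minute) into per-second database work. It assumes each message costs one insert (send), one update (read), and one delete. Archiving instead of deleting, or re-reads after a visibility timeout expires, would add writes.
```rust
// Messages per second from a per-minute rate.
fn messages_per_sec(messages_per_min: f64) -> f64 {
    messages_per_min / 60.0
}
// Row writes per second, given how many writes each message costs (3 = insert + update + delete).
fn row_writes_per_sec(messages_per_min: f64, writes_per_message: f64) -> f64 {
    messages_per_sec(messages_per_min) * writes_per_message
}
```
For 10,000 messages per minute, this gives about 167 messages per second and about 500 row writes per second.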

00:36:08.940 --> 00:36:14.900
<v Adam>But really, the scalability of that all comes down to what Postgres can

00:36:14.900 --> 00:36:20.300
<v Adam>handle: how many inserts, how many updates per second Postgres can handle,

00:36:20.300 --> 00:36:24.300
<v Adam>because that's really all that pgmq is doing, inserts and updates. Well, inserts,

00:36:24.300 --> 00:36:26.300
<v Adam>updates, and deletes.

00:36:26.300 --> 00:36:30.620
<v Matthias>Okay, so if you use that for a larger deployment, it certainly won't be the bottleneck

00:36:30.620 --> 00:36:36.440
<v Matthias>in your application, unless you are at web scale, and then you always have bigger problems.

00:36:36.440 --> 00:36:37.060
<v Adam>You have bigger fish

00:36:37.060 --> 00:36:38.580
<v Matthias>to fry. Okay.

00:36:38.580 --> 00:36:40.280
<v Adam>Yeah.

00:36:40.280 --> 00:36:42.920
<v Matthias>What's your setup for the production cluster right now?

00:36:42.920 --> 00:36:46.120
<v Adam>It's a dedicated Postgres cluster for

00:36:46.120 --> 00:36:49.260
<v Adam>the queue, yeah. And

00:36:49.260 --> 00:36:52.280
<v Adam>those specs on it are exactly right, yeah. And

00:36:52.280 --> 00:36:58.080
<v Adam>it, like, never really gets over 20% CPU. Memory utilization kind

00:36:58.080 --> 00:37:02.140
<v Adam>of just hovers around 25%, probably because that's what shared buffers are set

00:37:02.140 --> 00:37:08.880
<v Adam>to on that thing. But I mean, it is dedicated to the queue, so, you know, it's kind

00:37:08.880 --> 00:37:12.000
<v Adam>of... it is, I think, overprovisioned.

00:37:12.000 --> 00:37:16.760
<v Matthias>Well, that's at least better than being under-provisioned. What are some of the other

00:37:16.760 --> 00:37:19.620
<v Matthias>use cases of Postgres that you see in production a lot?

00:37:19.620 --> 00:37:24.640
<v Adam>Yeah, you know, full-text search is built into Postgres, so if you're

00:37:24.640 --> 00:37:28.380
<v Adam>trying to do that, it's pretty easy to get up and running. We also have a

00:37:28.380 --> 00:37:31.120
<v Adam>number of people using our VectorDB stack,

00:37:31.420 --> 00:37:36.440
<v Adam>which is primarily pgvector, another open source project out there

00:37:36.440 --> 00:37:41.500
<v Adam>that gives you the data type of a vector and then the operations

00:37:41.500 --> 00:37:43.420
<v Adam>to do similarity search on top of it.

00:37:43.600 --> 00:37:46.180
<v Adam>It is definitely the gold standard for,

00:37:47.170 --> 00:37:51.530
<v Adam>working with embeddings within Postgres. And then we built kind of a wrapper

00:37:51.530 --> 00:37:55.070
<v Adam>extension around it using pgrx and Rust.

00:37:55.310 --> 00:38:00.630
<v Adam>Say you have a table with some text in it, and you want to generate

00:38:00.630 --> 00:38:02.910
<v Adam>embeddings from every row in that table.

00:38:03.550 --> 00:38:08.690
<v Adam>And you say, hey, I have this table. I want to use OpenAI, or I want to use Anthropic,

00:38:08.690 --> 00:38:12.130
<v Adam>or I have a self-hosted model, and I just want to get embeddings for this table.

00:38:12.450 --> 00:38:16.570
<v Adam>So our wrapper extension, it's called pg_vectorize:

00:38:17.270 --> 00:38:21.950
<v Adam>you just call a function on a table, tell it which model you want to use,

00:38:22.050 --> 00:38:28.610
<v Adam>and then in the background, it's using pgmq to look at the table and say,

00:38:28.670 --> 00:38:30.950
<v Adam>hey, which columns do we need to get embeddings for?

00:38:31.350 --> 00:38:35.350
<v Adam>Pull that data, call the transformer model, get the embeddings,

00:38:35.570 --> 00:38:40.910
<v Adam>and insert those embeddings to another table or the same table. It's configurable.

00:38:41.250 --> 00:38:44.310
<v Adam>But yeah, it helps you with the orchestration there.
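
NOTE
A hypothetical sketch of the orchestration loop described above. It takes a job off a queue, looks up the rows the job refers to, calls an embedding model, and writes the vectors back. The `Job` type, the in-memory maps, and the `embed` closure are stand-ins. This is not pg_vectorize's code. In the real extension, the queue would be pgmq and the model call would go to a hosted or self-hosted model.
```rust
use std::collections::{HashMap, VecDeque};
// A queued job naming the source rows that need embeddings.
struct Job {
    row_ids: Vec<u32>,
}
// Drains the job queue and returns how many embeddings were written.
fn process_jobs(
    jobs: &mut VecDeque<Job>,
    source: &HashMap<u32, String>,
    embeddings: &mut HashMap<u32, Vec<f32>>,
    embed: impl Fn(&str) -> Vec<f32>,
) -> usize {
    let mut written = 0;
    while let Some(job) = jobs.pop_front() {
        for id in job.row_ids {
            // Skip ids whose source row no longer exists.
            if let Some(text) = source.get(&id) {
                embeddings.insert(id, embed(text.as_str()));
                written += 1;
            }
        }
    }
    written
}
```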

00:38:44.310 --> 00:38:48.390
<v Matthias>I like that you use your extensions in combination with other extensions and

00:38:48.390 --> 00:38:50.010
<v Matthias>calling into them. It's kind of cool.

00:38:50.950 --> 00:38:56.110
<v Matthias>How do you ensure safety between the boundary of Rust and Postgres?

00:38:56.550 --> 00:39:03.630
<v Adam>That Vectorize extension runs in a background process that Postgres manages.

00:39:04.610 --> 00:39:10.290
<v Adam>And we purposely don't use the SPI, the interface I mentioned previously,

00:39:10.530 --> 00:39:11.710
<v Adam>because it's kind of unnecessary.

00:39:12.310 --> 00:39:16.810
<v Adam>We just use SQLx. So Postgres spins up this background process that

00:39:17.510 --> 00:39:22.130
<v Adam>reads from that queue of jobs that it needs to create embeddings for.

00:39:22.710 --> 00:39:27.570
<v Adam>And we just use SQLx for it, so internally it's treated just like a normal application.

00:39:27.590 --> 00:39:32.670
<v Adam>So there's no interacting with FFI or anything going on there.

00:39:32.790 --> 00:39:37.470
<v Adam>After Postgres starts it, we treat it as just a normal Rust application.

00:39:37.650 --> 00:39:43.010
<v Adam>Since we built it that way, we have the option on our cloud platform to run

00:39:43.010 --> 00:39:45.250
<v Adam>that background worker in a separate container.

00:39:45.290 --> 00:39:48.950
<v Adam>So instead of running it as a Postgres background worker, we

00:39:48.950 --> 00:39:53.490
<v Adam>just take that same Rust binary and run it in a separate container next to Postgres.

00:39:53.490 --> 00:39:59.190
<v Adam>So that's an even better way to get around that limitation of, like, memory

00:39:59.190 --> 00:40:03.090
<v Adam>management. Well, if it's not even on the same host as Postgres, then, you know,

00:40:03.090 --> 00:40:05.730
<v Adam>you can scale that thing independently.

00:40:05.730 --> 00:40:10.630
<v Matthias>With Tembo, you shipped your first real Rust production application. What took you

00:40:10.630 --> 00:40:13.630
<v Matthias>the longest to wrap your head around in Rust?

00:40:13.630 --> 00:40:16.930
<v Adam>Yeah, so today I love

00:40:16.930 --> 00:40:19.590
<v Adam>working in Rust. It is a

00:40:19.590 --> 00:40:23.610
<v Adam>joy to build software using Rust, but

00:40:23.610 --> 00:40:27.730
<v Adam>when I was getting started, there was this huge hump,

00:40:27.730 --> 00:40:30.450
<v Adam>this hill to get over, and it was

00:40:30.450 --> 00:40:33.190
<v Adam>super frustrating early on. One of those

00:40:33.190 --> 00:40:36.570
<v Adam>things was just wrapping my head around

00:40:36.570 --> 00:40:39.310
<v Adam>null. Like, you

00:40:39.310 --> 00:40:42.190
<v Adam>know, there is no null. You have, like, Some and None, and I

00:40:42.190 --> 00:40:45.930
<v Adam>looked at that and I'm like, this is so weird. I

00:40:45.930 --> 00:40:51.350
<v Adam>came from Python, so, you know, you had a None type, but it was not

00:40:51.350 --> 00:40:58.570
<v Adam>an enum. Until I learned that an enum with data is

00:40:58.570 --> 00:41:04.290
<v Adam>how Rust handles null values. So it's just, like, a wrapper around your

00:41:05.790 --> 00:41:07.710
<v Adam>data: is it null or is it not?

00:41:08.390 --> 00:41:11.650
<v Adam>That was, I don't know why, but it was really hard for me to grasp that.

00:41:12.380 --> 00:41:18.280
<v Adam>And in that same direction, error handling, where it's like, Ok or Err.

00:41:18.840 --> 00:41:22.340
<v Adam>I don't know why, it just took me so long to wrap my head around.

00:41:23.020 --> 00:41:27.640
<v Adam>Like, yeah, you basically wrap your data. If you have a function

00:41:27.820 --> 00:41:30.900
<v Adam>and it could error, well, you wrap it in this thing, a Result,

00:41:31.120 --> 00:41:31.940
<v Adam>you know, and it's an enum.

00:41:32.520 --> 00:41:36.780
<v Adam>And it's the same for None, or, like, how Rust handles None.

00:41:38.160 --> 00:41:43.700
<v Adam>And I don't know. Today I love those things; it just makes it so clean.

00:41:44.460 --> 00:41:47.800
<v Adam>Like, is there an error or not? Did

00:41:47.800 --> 00:41:50.600
<v Adam>it return something, or was there nothing? You know,

00:41:50.600 --> 00:41:55.840
<v Adam>it's super clean when I look at it today, but when I was learning Rust, I was like,

00:41:55.840 --> 00:42:01.300
<v Adam>what is this? I don't even understand. So I don't know what would have

00:42:01.300 --> 00:42:05.440
<v Adam>made it easier for me to learn that early on, but I remember it being super frustrating,

00:42:05.440 --> 00:42:10.040
<v Adam>and then all of a sudden I got it, like, oh, this is awesome, I love it.
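
NOTE
A small example of the two enums Adam describes. Python code might return `None` and leave the check to the caller. Rust instead puts the possibility in the type: `Option<T>` is `Some(value)` or `None`, and `Result<T, E>` is `Ok(value)` or `Err(error)`. The config-lookup functions are made up for illustration. The `?` operator passes an `Err` straight back to the caller. After `ok_or`, a missing value is also turned into such an error.
```rust
use std::num::ParseIntError;
// Option: the lookup either finds a value (Some) or not (None); there is no null to forget about.
fn find_port<'a>(config: &[(&'a str, &'a str)]) -> Option<&'a str> {
    config.iter().find(|(k, _)| *k == "port").map(|(_, v)| *v)
}
// Result: "this can fail" is part of the signature; `?` returns early with the error.
fn parse_port(config: &[(&str, &str)]) -> Result<u16, String> {
    let raw = find_port(config).ok_or("missing port")?;
    let port: u16 = raw
        .parse()
        .map_err(|e: ParseIntError| format!("bad port {raw:?}: {e}"))?;
    Ok(port)
}
```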

00:42:10.040 --> 00:42:14.540
<v Matthias>How long did it take you until you had a good grasp on the language?

00:42:15.060 --> 00:42:20.980
<v Adam>I'd say... so when I started, I was learning full-time, basically, when

00:42:20.980 --> 00:42:26.480
<v Adam>I was learning Rust, and I think it was two months. You know, maybe that's slow

00:42:26.480 --> 00:42:28.820
<v Adam>for some people, but that's, like, all day,

00:42:29.620 --> 00:42:35.540
<v Adam>five days a week. At least, you know, I was able to get up and running building software

00:42:36.200 --> 00:42:42.120
<v Adam>while still not quite understanding error handling and, you know, handling

00:42:42.120 --> 00:42:50.440
<v Adam>of None types. So, yeah, I think it was two months of pain for me. Two months of pain.

00:42:50.440 --> 00:42:51.820
<v Matthias>I really like that framing.

00:42:52.000 --> 00:42:55.980
<v Adam>I mean, the one thing I loved, like, day one, though,

00:42:56.220 --> 00:42:58.920
<v Adam>was having Cargo, like, you know.

00:42:59.620 --> 00:43:05.240
<v Adam>I came from Python, and it was like, how do I get my environment set up to run my application?

00:43:05.600 --> 00:43:09.420
<v Adam>At the time, it was like, do I use virtualenv? Do I use Poetry?

00:43:09.940 --> 00:43:11.620
<v Adam>Do I just install everything with pip?

00:43:12.200 --> 00:43:16.700
<v Adam>You know, it's just, it's a mess. Like, there's some projects that are making

00:43:16.700 --> 00:43:18.060
<v Adam>it better in Python today.

00:43:18.320 --> 00:43:22.760
<v Adam>Like, uv is doing a really great job cleaning that up, solving that problem.

00:43:23.500 --> 00:43:26.640
<v Adam>But when I was learning Rust, it was like, hey, how do I run this?

00:43:26.740 --> 00:43:27.700
<v Adam>How do I run this application?

00:43:28.100 --> 00:43:31.180
<v Adam>Oh, cargo run. That's it. You know, how do

00:43:31.180 --> 00:43:37.160
<v Adam>I test it? cargo test. You know, it was just so intuitive. How do I

00:43:37.160 --> 00:43:43.100
<v Adam>add a library from crates.io? cargo add. You know, it was just all built into the

00:43:43.100 --> 00:43:46.980
<v Adam>toolchain, and I didn't have to go search and read a bunch of blogs to figure

00:43:46.980 --> 00:43:49.080
<v Adam>out how to just get started,

00:43:49.660 --> 00:43:55.620
<v Adam>so that, from day one, was probably what kept me there through those two months of pain.

00:43:55.620 --> 00:43:58.520
<v Matthias>Oh yeah, day one in Python is strange, because on one

00:43:58.520 --> 00:44:01.460
<v Matthias>side the language is great, but on the other side the tooling is

00:44:01.460 --> 00:44:04.480
<v Matthias>not so great, and you really get

00:44:04.480 --> 00:44:07.400
<v Matthias>those mixed feelings. On that note,

00:44:07.400 --> 00:44:12.900
<v Matthias>shout-out to Charlie Marsh from Astral, who was a guest in episode three of season

00:44:12.900 --> 00:44:18.020
<v Matthias>four. You might want to check that one out. They do great work, but the experience

00:44:18.020 --> 00:44:25.020
<v Matthias>before that was subpar. The tooling was a bit all over the place and very difficult to use.

00:44:25.020 --> 00:44:35.240
<v Adam>Yeah i have like nightmares of so when i was working at shipped i all of a sudden

00:44:35.240 --> 00:44:41.380
<v Adam>they started issuing employees the macbooks with apple silicon so they all had arm architecture,

00:44:42.240 --> 00:44:48.540
<v Adam>And all of our, like not every project out there had Python wheels for ARM.

00:44:48.900 --> 00:44:53.920
<v Adam>So now like all of a sudden new employees coming in, like their local environments

00:44:53.920 --> 00:44:58.240
<v Adam>would just not work with our internal libraries because we weren't building wheels for ARM.

00:44:58.580 --> 00:45:05.020
<v Adam>Like we were using Kafka and librd or Python's Kafka library didn't have a

00:45:05.020 --> 00:45:06.960
<v Adam>wheel for ARM for the longest time.

00:45:07.140 --> 00:45:10.060
<v Adam>It was like, okay, we have to compile this stuff from source all of a sudden.

00:45:10.060 --> 00:45:15.300
<v Adam>And it was I had nightmares from that everything fell apart.

00:45:16.000 --> 00:45:21.020
<v Matthias>Well, at least they did a great job on the migration from Python 2 to 3. That

00:45:21.020 --> 00:45:22.640
<v Matthias>was extremely painless, of course.

00:45:22.780 --> 00:45:23.220
<v Matthias>Just

00:45:23.220 --> 00:45:24.200
<v Matthias>kidding, it was a nightmare.

00:45:26.100 --> 00:45:30.640
<v Adam>Yeah, the most obvious one there is just trying to print something.

00:45:30.700 --> 00:45:33.600
<v Adam>The print API changed, you know?

00:45:33.620 --> 00:45:38.600
<v Matthias>Yeah, thanks for bringing back that memory. By the way,

00:45:38.720 --> 00:45:41.080
<v Matthias>do you have to touch a lot of Python nowadays?

00:45:41.080 --> 00:45:45.520
<v Adam>Not too much. You know, a lot of stuff in machine learning is still

00:45:45.520 --> 00:45:50.540
<v Adam>in Python. There are lots of libraries out there for machine learning in Python, and

00:45:51.240 --> 00:45:55.160
<v Adam>in the machine learning space, Python's mostly

00:45:55.160 --> 00:45:58.420
<v Adam>used as a wrapper around some C library

00:45:58.420 --> 00:46:01.580
<v Adam>that's super well optimized. Rust is, I'd

00:46:01.580 --> 00:46:04.360
<v Adam>say, catching up, and Hugging Face has

00:46:04.360 --> 00:46:07.100
<v Adam>built some Rust libraries that do the same thing

00:46:07.100 --> 00:46:10.120
<v Adam>as the Python equivalents. But yeah,

00:46:10.120 --> 00:46:13.360
<v Adam>most of the time, if I'm building a web server today,

00:46:13.360 --> 00:46:19.540
<v Adam>I'll just grab Actix and run with it. You know, if I'm working with data and

00:46:19.540 --> 00:46:24.500
<v Adam>Postgres, I'm going to grab SQLx. Ten years ago, it would have been Flask or

00:46:24.500 --> 00:46:30.340
<v Adam>FastAPI, or maybe not ten years ago for FastAPI, but, you know, I have the equivalent

00:46:30.340 --> 00:46:31.480
<v Adam>of everything that,

00:46:31.720 --> 00:46:35.460
<v Adam>five years ago, I would have reached for in Python.

00:46:36.120 --> 00:46:40.540
<v Adam>For me, I just pick Rust. It's easier. It's a better experience.

00:46:41.520 --> 00:46:47.040
<v Matthias>And on that very positive note, what's your final message to the Rust community?

00:46:47.820 --> 00:46:53.280
<v Adam>Yeah. I mean, keep contributing to the project. The project doesn't live unless

00:46:53.280 --> 00:46:55.640
<v Adam>people are working on it and making it better.

00:46:57.000 --> 00:47:02.480
<v Adam>Today, there's really no such thing as finished software or a finished programming language.

00:47:02.480 --> 00:47:05.880
<v Adam>It's not like software is distributed in the

00:47:05.880 --> 00:47:08.880
<v Adam>mail, where you burn stuff to a disc and mail it out, and

00:47:08.880 --> 00:47:15.700
<v Adam>it just runs that way forever. Software is a living, breathing organism now, so

00:47:15.700 --> 00:47:22.600
<v Adam>things have to be constantly fed. And for Rust, if people stop contributing

00:47:22.600 --> 00:47:26.700
<v Adam>to Rust, the project won't go on.

00:47:26.920 --> 00:47:32.960
<v Adam>So I guess my biggest message would be: thank you for building awesome stuff,

00:47:32.960 --> 00:47:35.160
<v Adam>and please keep doing it.

00:47:36.240 --> 00:47:38.220
<v Matthias>Adam, thanks a lot for the interview.

00:47:38.960 --> 00:47:39.560
<v Adam>Thank you.

00:47:40.280 --> 00:47:43.940
<v Matthias>Rust in Production is a podcast by corrode. It is hosted by me,

00:47:44.240 --> 00:47:47.000
<v Matthias>Matthias Endler, and produced by Simon Brüggen.

00:47:47.200 --> 00:47:51.420
<v Matthias>For show notes, transcripts, and to learn more about how we can help your company

00:47:51.420 --> 00:47:54.340
<v Matthias>make the most of Rust, visit corrode.dev.

00:47:54.560 --> 00:47:56.940
<v Matthias>Thanks for listening to Rust in Production.