Arroyo with Micah Wylde
This episode of Rust in Production explores Arroyo, a real-time data processing engine built in Rust. Micah Wylde from Arroyo shares insights on benefits, challenges, and future potential. Visit Arroyo's website for more.
2024-01-25 56 min
Description & Show Notes
In this episode, we have Micah Wylde from Arroyo as our guest. Micah introduces us to Arroyo, a real-time data processing engine that simplifies stream processing for data engineers using Rust. He explains how Arroyo enables users to write SQL queries with Rust user-defined functions on top of streaming data, highlighting the advantages of real-time data processing and discussing the challenges posed by competitors like Apache Flink.

Moving on, we dive into the use of Rust in Arroyo and its benefits in terms of performance and memory safety. We explore how workflow engines and stream processors complement each other and examine Arroyo's approach to real-time SQL and its compatibility with Postgres. Micah delves into memory and lifetime concerns and elaborates on how Arroyo manages them in its storage layer. Shifting gears, we explore the use of the Tokio framework in the Arroyo system and how it has enhanced speed and efficiency. Micah shares insights into the challenges and advantages of utilizing Rust, drawing from his experience with Arroyo.

Looking ahead, we discuss the future of the Rust ecosystem, addressing the current state of the Rust core and standard library, as well as the challenges of interacting with other languages using FFI or dynamically loading code. We touch upon Rust's limitations regarding a stable ABI and explore potential solutions like WebAssembly. We also touch upon industry perceptions of Rust, investor perspectives, and the hiring process for Rust engineers. The conversation takes us through the crates used in the Arroyo system, our wishlist for Rust ecosystem improvements, and the cost-conscious nature of companies that makes Rust an attractive choice in the current macroeconomic environment.

As we wrap up, we discuss how hard it is for slower Java systems to compete with Rust and ponder the potential for new languages to disrupt the trend in the future. We touch upon efficiency challenges in application software and the potential for a new language to emerge in this space. We delve into the increasing interest in using Rust in data science and the promising prospects of combining Rust with higher-level languages. Finally, we discuss the importance of fostering a welcoming and drama-free Rust community.

I would like to thank Micah for joining us today and sharing his insights. To find more resources related to today's discussion, please refer to the show notes. Stay tuned for our next episode, and thank you for listening!
About Arroyo
Arroyo was founded in 2022 by Micah Wylde and is based in San Francisco, CA. It is backed by Y Combinator (https://www.ycombinator.com/) (YC W23). The company's mission is to accelerate the transition from batch-processing to a streaming-first world.
About Micah Wylde
Micah was previously tech lead for streaming compute at Splunk and Lyft, where he built real-time data infra powering Lyft's dynamic pricing, ETA, and safety features. He spends his time rock climbing, playing music, and bringing real-time data to companies that can't hire a streaming infra team.
Tools and Services Mentioned
- Apache Flink: https://flink.apache.org/
- Tokio Discord: https://discord.gg/tokio
- Clippy: https://github.com/rust-lang/rust-clippy
- Zero to Production in Rust by Luca Palmieri: https://www.zero2prod.com/
- Apache DataFusion: https://github.com/apache/arrow-datafusion
- Axum web framework: https://github.com/tokio-rs/axum
- `sqlx` crate: https://github.com/launchbadge/sqlx
- `log` crate: https://github.com/rust-lang/log
- `tokio tracing` crate: https://github.com/tokio-rs/tracing
- wasmtime - A standalone runtime for WebAssembly: https://github.com/bytecodealliance/wasmtime
References To Other Episodes
- Rust in Production Season 1 Episode 1: InfluxData: https://corrode.dev/podcast/s01e01-influxdata
Official Links
- Arroyo Homepage: https://www.arroyo.dev/
- Arroyo Streaming Engine: https://github.com/ArroyoSystems/arroyo
- Blog Post: Rust Is The Best Language For Data Infra: https://www.arroyo.dev/blog/rust-for-data-infra
- Micah Wylde on LinkedIn: https://www.linkedin.com/in/wylde/
- Micah Wylde on GitHub: https://github.com/mwylde
- Micah Wylde's Personal Homepage: https://www.micahw.com/
Transcript
This is Rust in Production, a podcast about companies who use Rust to shape
the future of infrastructure.
My name is Matthias Endler from corrode, and today we are talking to Micah Wylde
from Arroyo about how they simplified stream processing for data engineers with Rust.
Micah, welcome to the show. Can you tell us a few words about yourself and Arroyo,
the company you founded?
Thanks so much for having me. Yeah. So I am a Rust engineer and the creator
of the Arroyo streaming engine.
So Arroyo is a real-time data processing engine that allows you to write SQL
queries with Rust user-defined functions on top of streaming data.
For example, data you might have in Kafka or another streaming system.
And I come to that problem and company after spending five years leading streaming
teams at companies like Splunk and Lyft, which is a rideshare company in the US.
And more broadly, I've been in the big data space working on data systems for
pretty much my entire career starting out in ad tech, working on real-time ad bidding systems,
and then leading data teams and building data systems. So yeah, that's a brief background about me.
At Splunk, you were a principal engineer and the team lead of the streaming compute team, so that makes you somewhat of an expert in stream processing, I would say.
Maybe for the uninitiated, could you give us a just very quick,
very brief introduction of what stream processing is in your own words?
Yeah. So traditionally, when people have wanted to process data,
we do it in what's called batch mode, which means you take all the data in through
whatever data sources those are, whether it's coming from logs you're reading,
from API requests that are ending up somewhere, or wherever that data is coming,
it all kind of filters through your system and eventually lands in traditionally
a database or today maybe like a data lake or a data warehouse.
And then once all that data is there, you run a really big data processing job
on top of all of that rest, that data at rest.
Often this means you wait, you know, an hour or a day for all the data to land
before you can kind of analyze it or learn anything about it.
Stream processing, in contrast, does this data processing as the data actually
arrives in your system. So in real time.
And the advantages there, obviously, latency is much better.
You can process the data within milliseconds or seconds instead of waiting hours or days.
But it also can give you a much kind of easier way to build these like end-to-end
data systems where you need to consider like different properties around like
timeliness and completeness in order to kind of build your higher level analytics or data products.
And for kind of real-time companies like at Lyft, this becomes really crucial
to be able to basically know things about your world really quickly.
In rideshare, you need to understand kind of where your users are, where your drivers are.
You need to understand traffic speeds in order to do routing.
You need to be able to do dynamic pricing based on supply and demand.
And all of this stuff really demands that you be able to do complex analysis
on data really quickly instead of waiting, you know, a day for it all to land in your data warehouse.
So that's kind of a high-level view of the problem stream processing is solving, and how it fits in.
Yeah, so stream processing has existed before.
There were other companies that did a lot of groundwork. You mentioned at some
point Hadoop, you mentioned BigQuery in your seminal article that we will get to in a second.
But I think maybe you can just quickly explain what makes Arroyo special in
this case and also what the competitors are lacking right now that maybe is
a nice niche for Arroyo.
Yeah. So BigQuery and Hadoop are both kind of in that batch paradigm,
where you let all the data collect, and then you do a big data processing job over that data at rest.
In the streaming world, traditionally, the most popular system has been one called Apache Flink.
This is about a decade old, but it was really the first system that found a good programming model for streaming and, I would say, made it work at a level of correctness that allowed it to be applied to a lot of these problems. Before Flink, we really had very simple systems that couldn't guarantee anything about correctness or completeness and were sort of just orchestration systems around your own logic.
So I spent my career in streaming working on Flink.
And I think that's true of most of the other people who are kind of doing new things now.
And for all of us, we kind of have this perspective on Flink that it solved
this problem really well for people who are able to invest a ton of energy into
becoming experts in Flink.
So, at the companies I've worked for, that meant staffing up teams that were 10 to 30 people,
full of people working on Flink, building infrastructure and tooling around
it, and then especially supporting end users who were actually building these streaming pipelines.
And I think while Flink was really successful allowing sophisticated companies
to roll out this technology in a way that would have been dramatically harder a few years earlier,
it never really got to that point of ease of use where you could hand Flink off to a data scientist,
to a data engineer, or a product engineer, and allow them to be successful building
these real-time pipelines on their own.
We always needed a lot of hands-on support from the Flink experts of the company.
And that's really what we're trying to innovate around in Arroyo.
We're trying to build a system that is easy enough for any engineer or data
scientist at your company to kind of pick up and build these correct,
reliable, performant real-time data pipelines.
So how do you see the relationship between stream processing on one end and
these new workflow engines that pop up nowadays like Windmill,
which is coincidentally also written in Rust? Do you see an overlap?
Do you see the industry converge to something that maybe encompasses both?
Or would you say these are fundamentally different areas of expertise?
Yeah, I think they're very different systems and they are good at different kinds of problems.
So workflow engines are really excellent at these very long-running tasks.
We have a bunch of things we need to do based on fairly simple criteria over the course of a day.
For example, a user signs up, we need to send them this email.
Depending on what they do in response to that, we need to do this other sequence of events.
And that's the sort of thing that streaming engines like Flink or Arroyo are actually pretty bad at.
It's hard to specify that type of logic, that kind of conditional logic over
all of these different states.
And they also architecturally are kind of way overpowered to do that kind of stuff.
I think these systems actually work together quite well because streaming systems,
stream processors are really good at data oriented problems.
So often, this will mean you put your like really big feed of data,
your millions of events per second feed into your streaming engine.
And that produces features or
events that can then be consumed by the much lower scale workflow system.
So that that's actually a pretty common pattern for these to kind of work together.
But at least in the near future, I don't see them as being kind of in the same space at all.
Mm-hmm. On your website you have a very nice example where you take a Kafka stream and then you write some, I think it's SQL, or maybe some other syntax that's similar to SQL, to pipe events through your system and then see the results in real time. This was a pretty impressive demo.
So is Arroyo's language SQL-like, or is it more than that? Is it different? If so, in what sense?
So the main way you program Arroyo is through SQL.
We have a slightly customized dialect of our own, but we aim to be pretty Postgres-compatible.
To do real-time SQL, you do need to extend it in some way. There's different approaches for this.
But SQL, as originally defined, was really designed for these batch computations. To do like a group-by or an aggregate or a join, you really need all of the data to be available. Otherwise, you know, in a join, there might be more data coming in on one side or the other in the future, so you can't ever return that result.
So different streaming systems that use SQL have come up with different answers
for basically how can we decide that we're done, we're able to return a result for these expressions.
In Arroyo, that looks like this: we introduce these time-oriented window functions, like a tumbling window and a sliding window and a session window.
And these rely on a notion of what's called watermarking, which is this concept
of basically estimated completeness.
A watermark is a special value that flows through the data flow of the pipeline
and tells all of the operators that we have seen all of the data,
or we believe we've seen all the data from before a certain time.
And that tells us if we have like a window that closes at time t and we get
a watermark that is after t, it tells us that we can close that window,
that we've seen all the data that will be in that window and we're able to process
it and return the results to the user.
So this is a common pattern in certain types of stream processors, like Flink and Arroyo.
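To make the watermark mechanics concrete, here is a minimal, self-contained sketch of watermark-driven window closing. It is illustrative only, not Arroyo's actual implementation; the names and the simple count aggregate are made up:

```rust
use std::collections::BTreeMap;

/// Illustrative tumbling-window state. Event times and watermarks are
/// milliseconds since some epoch.
struct TumblingWindows {
    width_ms: u64,
    // window end time -> accumulated count for that window
    windows: BTreeMap<u64, u64>,
}

impl TumblingWindows {
    fn new(width_ms: u64) -> Self {
        Self { width_ms, windows: BTreeMap::new() }
    }

    /// Assign the event to its window and update the aggregate.
    fn on_event(&mut self, event_time: u64) {
        let window_end = (event_time / self.width_ms + 1) * self.width_ms;
        *self.windows.entry(window_end).or_insert(0) += 1;
    }

    /// A watermark `w` promises we've seen all data with time < w, so any
    /// window ending at or before `w` can be closed and its result emitted.
    fn on_watermark(&mut self, watermark: u64) -> Vec<(u64, u64)> {
        let still_open = self.windows.split_off(&(watermark + 1));
        let closed: Vec<_> = self.windows.iter().map(|(&end, &n)| (end, n)).collect();
        self.windows = still_open;
        closed
    }
}

fn main() {
    let mut w = TumblingWindows::new(1_000);
    w.on_event(150);
    w.on_event(980);
    w.on_event(1_200);
    // Watermark at t=1000: the [0, 1000) window is complete and can be emitted.
    for (end, count) in w.on_watermark(1_000) {
        println!("window ending at {end}: {count} events");
    }
}
```

The key invariant is the one Micah describes: a window only closes, and only emits its result, once the watermark passes its end time.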
There's other approaches to this, which Arroyo and other systems like Materialize
also support, which is based around a more incremental style of computation,
where we actually decide we're never going to be done.
We never know that we have all the data for a particular time period.
So every time an event comes in, we're going to update the state of that window
and emit the new result there.
So depending on the kind of problem, you may kind of want one style of SQL or the other style of SQL.
But yeah, in any case, it's all SQL.
You wrote an article called "Rust Is The Best Language For Data Infra", which is kind of a catchy title. I read the article, and one thing I wondered about was:
Was Rust your first choice when you started? Have you looked into,
for example, the solutions that came before you?
And also, was it around the time where Zig also became popular?
And where do you see yourself in this space?
Would you say, okay, Rust was just there at the right point in time?
Or would you also say, well, there would be alternative realities,
so to say, where Arroyo was written in C++ or maybe Zig in a different world.
Yeah, so I mean, kind of setting this historically: the very earliest systems of this kind in this space, like the original Google systems that established a lot of how we think about big data today, like MapReduce and BigTable and GFS, those were all written in C++.
And then we had a long history of writing systems in Java, like Hadoop and HBase.
And Flink itself was originally written in Scala, and then rewritten in Java.
And then we had a whole period of doing Go, like CockroachDB,
and a handful of other big data systems.
And yeah, I think now, definitely, we would not have chosen Java or Go for Arroyo.
I think in many ways, the current era of systems is a reaction to the previous
era of writing these systems in Java.
A lot of people are finding that you can get much better performance,
much easier operations, literally just by rewriting these systems in a non-managed
language like C++ or Rust.
So we're kind of following in the footsteps of projects like Red Panda,
which did this with Kafka, and ScyllaDB that did this with Cassandra.
So I think we could have done some of the things we're trying to do in Java
or Go, but it would have been much harder to accomplish our goals.
So in a world without Rust, I think we probably would have ended up choosing C++.
But I'm very grateful that we are in a world with Rust.
It has definitely made our lives
much easier than it would have been if we had to choose C++ for this.
Especially, I assume, to optimize the platform you would have to avoid a lot of copies, and in C++ passing references around can be a bit of a nightmare sometimes if you don't know exactly what you're doing, and even if you do, there can be issues. I just wondered: do you have a lot of lifetimes in your code as well, or is that something that the Rust compiler elides completely, so you don't even have to think about lifetimes at all?
So the most memory-oriented or lifetime-oriented part of our system is the storage layer.
So maybe to give a little bit of architectural insight here,
the way these systems look, they are these directed acyclic graphs of data flow.
You take a SQL statement, compile it into a SQL plan, and then eventually optimize that into this data flow graph.
Each node of this graph is some kind of potentially stateful operator.
So for example, doing a filter or a map or a stateful function like a window or a join.
And between these operators, the events and process data flow over queues or over network sockets.
So within these stateful operators, we potentially have to store data for long periods of time.
So if you imagine you have like a 30-day sliding window, we need to store some
representation of that data for 30 days.
And we do that in a mix of offline S3 storage and local disk cache and then in-memory cache.
And managing that in-memory cache brings into issue these lifetime concerns,
managing the data as it flows from that cache into the processor in order to be used.
Fortunately, in these systems, the architecture constrains that problem somewhat.
So at the semantic layer, you're kind of processing one event at a time in each of these operators.
So you don't really have to deal with concurrency issues at the direct processing layer.
And that ends up simplifying the kind of lifetime management that you might
have in a more traditional database where you're kind of dealing with a bunch
of different requests to the same data.
So in Rust lingo, that would be: your types are not Sync, or they don't have to be?
That's correct. Yeah, we're always accessing a particular...
You can think logically, each of these operators is single-threaded.
This is all implemented in tokio, so what's happening under the hood is much
more complicated than that.
But as a programmer, you can really think of it as synchronous processing on a single thread.
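Here is a rough sketch of that model: each operator is a tokio task that exclusively owns its state and processes one event at a time, connected to its neighbors by channels. The names and the `Event` type are invented for illustration; this is not Arroyo's API:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Event {
    Record(i64),
    Watermark(u64),
}

// One operator = one task. The state (`dropped`) is owned by this task
// alone, so it never needs to be Sync; events arrive strictly one at a time.
async fn filter_operator(mut rx: mpsc::Receiver<Event>, tx: mpsc::Sender<Event>) {
    let mut dropped: u64 = 0;
    while let Some(event) = rx.recv().await {
        match event {
            Event::Record(v) if v < 0 => dropped += 1, // filter out negatives
            other => {
                let _ = tx.send(other).await; // forward everything else downstream
            }
        }
    }
    println!("filter dropped {dropped} records");
}

#[tokio::main]
async fn main() {
    let (tx_in, rx_in) = mpsc::channel(128);
    let (tx_out, mut rx_out) = mpsc::channel(128);
    let op = tokio::spawn(filter_operator(rx_in, tx_out));

    for v in [3, -1, 7] {
        tx_in.send(Event::Record(v)).await.unwrap();
    }
    tx_in.send(Event::Watermark(1_000)).await.unwrap();
    drop(tx_in); // close the input channel so the operator finishes

    while let Some(e) = rx_out.recv().await {
        println!("downstream got {e:?}");
    }
    op.await.unwrap();
}
```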
Speaking of tokio, it feels like this is an ideal use case for it, because you're leaning into things that are inherently concurrent. They don't really have to be sequential; at least parts of them can be executed concurrently, sometimes maybe even in parallel. But I wonder what you think about tokio: your experiences with the framework, the ergonomics of it, and also the recent discussion about async Rust, Send and Sync bounds, work-stealing schedulers, all of that stuff.
Yeah, so at a high level, a system like Arroyo doesn't really need a complex scheduler like tokio.
As I mentioned, each of these operators essentially acts as a single thread.
It receives one event, it does all the processing for that event,
and then it sends it on to to its next destination.
And all this has to happen in order to uphold the correctness guarantees of the system.
And because of that, the first version of Arroyo actually was built around threads and thread processing.
At some point, it migrated to tokio and AsyncRust, actually pretty early on.
And the core reason for that was that so much of the ecosystem is in async Rust at this point that if you want to use common network libraries or database drivers or almost anything from the network programming ecosystem, you do have to deal with async at some point.
And at some point, it's easier just to move your whole system over to async.
And that was definitely a challenging migration. Actually, for me,
I had never worked with async Rust before.
So it involves a lot of learning, a lot of time on the tokio Discord channel,
which is extremely helpful.
But in the end, actually, the surprise was that it ended up being a lot faster.
Just purely doing that migration made the system like 30% faster,
which was not my expectation at all.
But it turns out that the tokio scheduler is really, really effective at this
class of problems, where even though it looks at a high level,
like all this processing is single-threaded,
there's a lot more going on under the hood, a lot more work that has to be coordinated.
You actually have threads, in our case, talking to S3 or talking to other systems. We have a lot of queues involved. So even though we have only a smallish number of actual processing threads, there's a lot of network exchange happening on other threads, talking to the coordination system over gRPC. And tokio is really good at organizing all of this work efficiently and really maxing out the use of your cores.
I think that the most surprising thing for us is that we're able to run the
system at extremely high utilization,
like above 95% CPU utilization,
and everything remains responsive and reactive and is able to,
to really work effectively at that extremely high level of CPU thrash,
which has never been my experience with the systems written in other paradigms.
And then in terms of, I guess, kind of how I think about the async Rust,
I guess, drama, if we want to say that word, I think the Rust community has
a higher level of drama in general, and I don't fully understand why that is.
But I think maybe the technology just works so well that we sort of have to invent other stuff to be upset about.
But I will say, async Rust definitely has a learning curve. Coming from being, I would say, a pretty strong Rust programmer already, it took me maybe a month to really be an effective async Rust programmer. And it's definitely been the edge of the system that other people who contributed to it have the most trouble with. The requirement that values held across await points be Send can definitely be frustrating if you aren't experienced in the strategies for dealing with that.
And the sometimes bad error messages in the compiler don't help with that either.
It can make it really hard to figure out where exactly that problem is introduced
in a large amount of code.
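For readers who haven't hit this yet, here's a tiny illustration of the Send-across-await issue Micah mentions. `tokio::spawn` requires a `Send` future, so a non-`Send` value like `Rc` that is alive across an `.await` makes the whole task fail to compile; one common fix is to drop it before the await point:

```rust
use std::rc::Rc;

async fn do_io() {} // stand-in for any real await point

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();

    // This version does NOT compile: `data` (an Rc) is alive across the
    // .await, so the whole future is !Send and tokio::spawn rejects it.
    //
    // rt.block_on(async {
    //     tokio::spawn(async {
    //         let data = Rc::new(42);
    //         do_io().await;
    //         println!("{data}");
    //     });
    // });

    // Fix: make sure the non-Send value is dropped before the await point
    // (or use a Send type like Arc instead).
    rt.block_on(async {
        let handle = tokio::spawn(async {
            let value = {
                let data = Rc::new(42);
                *data
            }; // Rc dropped here, before the await
            do_io().await;
            println!("{value}");
        });
        handle.await.unwrap();
    });
}
```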
But I'd say overall, tokio has been a huge boon to us.
And it's really remarkable what, you know, what allows us to do in terms of
just not having to think very much about how we schedule work.
It just does a really remarkably good job on its own.
Well, Rust protects you from memory safety problems. It does not protect
you from race conditions.
So I wonder if you, as someone that uses Rust and tokio at scale,
have run into any sort of data races or things that you encountered at runtime,
which maybe were a bit of an issue for your platform, or did you never ever
have any outages in production?
So, not specifically from race conditions, because, again, the architecture of our system makes the high-level concurrency pretty straightforward. Although a lot more complexity creeps in in the details, especially when you try to get to that next level of performance. So for example, the storage system is extremely complex and has a lot of concurrency.
But Rust does really help a lot with managing that complexity.
In terms of issues in production, the issues we've seen are much more around the high level: the ways that all the different pieces of this distributed system interact with each other, and wrong assumptions in different pieces about what other things are doing.
Unfortunately, Rust definitely does not fix distributed systems issues. But in terms of the micro level, it's remarkable how well things work once you get them to compile. An example I brought up in that blog post, which still kind of blows my mind, is that I wrote the entire network stack, the piece of software that allows this system to be distributed.
I wrote that in like a two-day push, basically like two 12-hour days.
And basically just coded that straight. And then at the end,
spent maybe an hour trying to get it to compile.
And from there, it just worked perfectly the very first time.
I took a single-node system and made it a distributed system without any testing, any iteration on that.
And it basically hasn't changed since that initial implementation.
I've definitely never experienced that writing network software in C++ or even Java for that matter.
That's pretty impressive, yes. Pretty awesome that you could pull that off.
And it's a testament to the Rust type system, and also the borrow checker and all of the things that make Rust development and the developer ergonomics pretty awesome.
I wondered, though, even if you didn't have that many runtime problems, whether you had any compile-time problems, in the sense that maybe parts of the ecosystem were not aligned: compatibility issues with, say, different versions of tokio, or maybe different libraries that were sometimes more mature, sometimes less.
Yeah, it's never been a huge issue. And just the Rust crate ecosystem in general has been such a boon to us from a productivity perspective, compared to the C++ world, where using dependencies is so challenging and you don't have this incredible, rich ecosystem that we already have in Rust after such a relatively small amount of time.
So there are occasional compatibility issues; we've had to fork a few open-source projects we rely on. But I would say dealing with any of that stuff is a very small part of my day.
All of the things that you mentioned are, at least to an average developer, pretty low-level; or at least you need a lot of expertise on how to structure or architect such systems in order to perform well.
And I wondered, what do you think, how much does Rust guide you towards an idiomatic
solution? And what is your own expertise?
Yeah, I think Rust definitely guides you towards a correct solution. I don't know that it always helps you that much with being idiomatic, although the tooling around it is very helpful. Cargo Clippy is really helpful.
So my co-founder had never used Rust before working on this project. He's a really experienced distributed systems engineer and has worked on a bunch of query systems, but was new to Rust.
And tools like Clippy really helped him pick up the idiomatic style of Rust pretty quickly.
Beyond that, I think the Rust community is also really helpful.
I mentioned already the tokio Discord, which was super useful when I was trying
to get up to speed with async Rust.
But in general, the Rust community is extremely useful in helping you solve
problems or figure out why some weird compile issue is happening.
Did you use any resources outside of the official Rust book and maybe the community
to help you get started with Rust?
Or did you start on a project and learn on the job?
So, I've actually been using Rust since like 2014, but I'd never convinced a company we should do a major project in Rust until now.
It was always a big uphill battle trying to introduce Rust into a large organization.
But I've been using it for all of my kind of like personal projects for a really long time.
I've been a fan of the language since basically I first learned about it.
But so in terms of my own development, there's been a lot of resources like over that time.
The first version of the Rust book, which I have on my bookshelf back there, that was very helpful.
But also it just changed so much in the early days that it was a full-time job
just kind of keeping track of the updates to the language.
Today, it's much easier. It's been pretty stable for a number of years.
And I think the quality and quantity of resources has also increased a lot.
But I know there's a really good book, actually, on running Rust in production that I've looked at a fair bit for the more practical details: how do you actually run Rust, what does logging look like in Rust, how do we do metrics, these kinds of things that aren't necessarily part of an intro book.
What book is that?
It's called Zero to Production in Rust, by Luca Palmieri.
Awesome. And you mentioned that it's a bit tricky sometimes to convince bigger companies and organizations to move towards Rust and introduce Rust at these companies. Why is that, in your experience?
Yeah, I think large companies tend not to be that ambitious in their technical choices.
A lot of it is built around minimizing risk rather than maximizing reward.
And Rust definitely seems risky to a CTO today.
They worry, will it be too hard for engineers to learn how to do Rust?
Will we be able to, if we restructure teams, will we be able to pass off this
project to another team? Will they have to figure out how to use it?
Will we be able to hire enough Rust engineers? And if you're Google and you need to hire 10,000 engineers, I think you should be rightly concerned about hiring 10,000 Rust engineers. I doubt there are that many Rust engineers in the world.
But for a smaller company, that's not an issue at all, right?
Hiring three Rust engineers is pretty easy.
And I think especially for a small company, it's an advantage in a way that
it maybe isn't for a big company to be using Rust.
Because as a small company, you can attract people because they want to work in Rust.
And that's a big incentive to work for you.
And I think that's the upside of working in maybe a slightly obscure language: you get those people who are really excited about it. And that can be a big boon to you.
But for big companies, they kind of just see the risk side of that equation.
How do you hire Rust engineers? Do you reach out in your network or do you post
job announcements somewhere?
Yeah, well, I guess actually for us initially, we've been hiring more on the
streaming expertise side.
There's actually maybe more overlap there now than there was maybe two years ago.
A lot of the newer streaming systems are also in Rust. But historically,
as I mentioned, streaming systems have been largely in Java.
So that's where most people have expertise.
But I definitely anticipate, as we try to hire more broadly, that hiring from that pool of Rust engineers will be pretty productive, especially as a non-cryptocurrency Rust company.
There's, I think, a lot of demand for those jobs right now. So we'll be able to tap into that.
Very true. What sort of other crates do you use to get your job done?
I guess in the blog post, you mentioned Data Fusion. Maybe that's one that you
can talk about, but feel free also to talk about any other crate that you like.
Yeah, so Data Fusion is probably the most critical one to us.
Data Fusion is a number of things. This comes from the arrow-rs ecosystem.
We use it primarily as a SQL parser, so it takes SQL text and turns it into an AST, and then as a planner, taking that AST and turning it into a graph-oriented plan that describes what that SQL is supposed to do.
SQL is an extremely complex language with like 30 years of history and a bunch
of different equivalent ways to express stuff.
So having a library that deals with a lot of that complexity for you is extremely helpful when you're building a SQL engine. We get a nice clean plan out of that, which we're able to then optimize in our own way and compile into our own set of operators.
So DataFusion has been extremely critical to us being able to build this thing as quickly as we have.
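As a rough illustration of that flow, here is a sketch using DataFusion's high-level API (exact names shift between releases, and Arroyo drives the parser and planner layers more directly than this):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Hypothetical table and file, just for illustration.
    ctx.register_csv("events", "events.csv", CsvReadOptions::new()).await?;

    // SQL text -> AST -> logical plan, all handled by DataFusion.
    let df = ctx
        .sql("SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id")
        .await?;

    // Inspect the plan (what a system like Arroyo would then compile into
    // its own streaming operators)...
    println!("{}", df.logical_plan().display_indent());

    // ...or just execute it in batch mode and print the results.
    df.show().await?;
    Ok(())
}
```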
Beyond that, I guess I'll also call out something maybe a little bit lower-level, or higher-level: I really appreciate the Rust web ecosystem.
So we rely on Axum and SQLX, which is a really great SQL library.
This is not like the core of our product at all. This is like to power our API and our web interface.
But it's remarkable that even in a domain that maybe Rust isn't natively as well suited to, we still have these incredibly high-quality libraries that make it actually really easy to build good products. So that's been an impressive discovery for us.
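A minimal sketch of that API-layer pattern, assuming axum 0.7-style serving and a Postgres pool; the route, table, and connection string are made up for illustration:

```rust
use axum::{extract::State, routing::get, Router};
use sqlx::postgres::PgPoolOptions;

// Handler: pull the shared connection pool out of application state and
// run a query. (sqlx also offers compile-time-checked `query!` macros.)
async fn count_pipelines(State(pool): State<sqlx::PgPool>) -> String {
    let n: i64 = sqlx::query_scalar("SELECT count(*) FROM pipelines")
        .fetch_one(&pool)
        .await
        .unwrap_or(0);
    format!("{n} pipelines\n")
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool = PgPoolOptions::new()
        .connect("postgres://localhost/arroyo_example")
        .await?;
    let app = Router::new()
        .route("/pipelines/count", get(count_pipelines))
        .with_state(pool);
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```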
The crates that you mentioned: I cannot speak about DataFusion, but the other ones are definitely top of their class in any language, literally, I would say, at least from my experience. I used Axum and SQLx before, and I think they are really awesome.
But I wonder about the future of this ecosystem.
Do you see that we kind of reached a point where crates are starting to more
or less stabilize and there's one go-to crate that you pick for your job?
Or would you say the ecosystem is still so young that I can see myself switching,
let's say, to a different web framework in a year or maybe a different parser
or whatever, if it comes up?
Yeah, I mean, I think it's probably too early to say that things have stabilized.
A year ago, your choices in a lot of these areas would have been different.
Definitely three years ago, none of these crates existed.
Axum itself is still changing quite a lot from release to release.
So I think even these crates are not fully stabilized.
But I think we will be hitting more of a period of stability,
especially with async rust becoming more feature complete.
A lot of these libraries have had to work around limitations in the async ecosystem
and implementation, like missing the ability to use async functions and traits.
Which has just landed or is about to land.
In 1.74, yeah.
Yeah. So I think that will allow things to stabilize their APIs in a way that
has been challenging so far.
And I do expect more kind of stability going forward and more obvious choices
around which crate we use to solve different problems.
And something that's been impressive to me about the Rust ecosystem is that
there maybe were opportunities to stabilize earlier.
Just to give you a random example, for logging, we had the log crate that was
like the obvious crate for a long time to use for logging.
And we could have just decided that was good enough. But actually,
it turns out there was a better option and a better design.
And we ended up with the tracing crate instead.
And the ecosystem was able to move to kind of like this better option,
rather than getting bogged down in kind of like a local optimum.
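A small sketch of what the move bought the ecosystem: where `log` emits flat text lines, `tracing` records structured fields and spans that follow async tasks across await points (assumes the `tracing` and `tracing-subscriber` crates):

```rust
use tracing::{info, info_span, Instrument};

async fn checkpoint(epoch: u64) {
    // Structured fields instead of formatting everything into the message.
    info!(epoch, "checkpoint complete");
}

#[tokio::main]
async fn main() {
    // Install a subscriber that prints events to stdout.
    tracing_subscriber::fmt::init();

    // The span's fields are attached to every event inside it, even across
    // await points -- something the log crate has no notion of.
    checkpoint(42)
        .instrument(info_span!("pipeline", job_id = "demo-1"))
        .await;
}
```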
And you've seen this in a lot of different areas where like there was like an
early consensus around a crate as the solution to this class of problems.
But the ecosystem was able to move on to something that solved it better.
And I think that's not a property you have in all ecosystems. It's something I really appreciate about the Rust community: we're able to move fairly quickly, and in a pretty consensus-driven way, to better options in the ecosystem.
So I think we'll continue to see that happening. I don't know if Axum,
for example, is the end state of Rust web programming.
I think we'll continue to see iteration happening.
What about Rust itself, the standard library? What about stabilization of the Rust core?
Would you say this is already in a very satisfactory state?
Or would you say that for your use case, there would be things that you would wish for?
I think everyone has their own wish list of RFCs that they hope will finally get merged.
I think for me personally, the lack of completeness around async has been the biggest frustration.
Missing async functions in traits, for example, has required a lot of somewhat
ugly workarounds for us.
And even the version of this that's going to be stabilized isn't quite complete
enough for all of our use cases.
But I appreciate that Rust takes time to get these solutions right.
And I think we've seen that process play out with async.
Overall, I think the Rust programming language is in a really good place.
And I think it has stabilized over the past couple of years compared to the previous five years.
And we'll continue to see that stabilization with hopefully a few nice improvements,
like the work we're getting out of GATs or the improvements to async we're
seeing right now.
I fully agree. Where I see some issues is on the edges of the Rust standard library.
So where you talk to other languages with FFI or where you load code dynamically.
And I guess for a streaming platform, that is also an interesting use case,
maybe where you can hook stuff into your engine at runtime.
And of course, there are technologies like WebAssembly and that sort of stuff
getting pushed forward.
I wonder if you already experimented with that and what's your impression on
the current state of the ecosystem around that?
Yeah, actually, maybe I should have called that out. I called that out in my
blog post as a frustration.
Rust does not have a stable ABI, Application Binary Interface,
which is challenging if you're trying to build anything that looks like a plugin system.
In our case, for example, we support user-defined functions, so users can write Rust code that then gets loaded into the engine at runtime.
If you're writing this in C or C++, there's a stable C API that you're able
to use to basically dynamically link software at runtime.
Rust doesn't have this. So if you want to compile a library and a host application,
and link them, you have to do that with the exact same version of the Rust compiler.
And in many cases, like the same settings for those compilers.
So it makes it really hard to distribute basically binary software separately
from the thing that is consuming that library.
So today, the solution you basically have to use is to use the C API,
which means giving up a lot of like the features and power of Rust,
at least at your interfaces.
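Here's a condensed sketch of that workaround. Conceptually these are two crates, a plugin compiled as a `cdylib` and a host that loads it with the `libloading` crate; the function name and library path are hypothetical:

```rust
// --- plugin side (compiled as a cdylib) ---
// Because Rust has no stable ABI, the export falls back to the C ABI:
#[no_mangle]
pub extern "C" fn udf_double(x: i64) -> i64 {
    x * 2
}

// --- host side ---
// Load the shared library at runtime; only C-level types cross the boundary.
fn call_plugin() -> Result<i64, Box<dyn std::error::Error>> {
    unsafe {
        let lib = libloading::Library::new("./libudf.so")?;
        let f: libloading::Symbol<unsafe extern "C" fn(i64) -> i64> =
            lib.get(b"udf_double")?;
        Ok(f(21))
    }
}

fn main() {
    // Rich Rust types (String, Vec, trait objects) can't cross this boundary.
    println!("{:?}", call_plugin());
}
```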
You also mentioned Wasm, which is another class of solutions to this problem.
In some ways, this is even worse from an interface perspective, because there's no real standard way for hosts and plugins to interact in the Wasm ecosystem.
So every application sort of has to figure this out for themselves.
We have explored Wasm as a solution to kind of this class of problems.
We actually have an integration with Wasmtime, which is a great Rust Wasm runtime.
And I think for systems like ours, that probably will be the direction that we take going forward.
It's particularly great for integrating with other language ecosystems. And there's a lot of energy in the Wasm world to figure out these integration problems: how does a Rust program talk to a Python program over shared Wasm memory, how can we build these kinds of unified interfaces that mean individual projects like ours don't have to keep solving this class of problems over and over again. But it would be really nice if Rust were better at this kind of interacting dynamically with other compiled code.
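For a flavor of the Wasmtime route, here's a minimal host that loads and calls a guest function. The API follows recent wasmtime releases (details vary by version), and the inline module stands in for a compiled UDF:

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> wasmtime::Result<()> {
    let engine = Engine::default();
    // A tiny module defined inline via WAT; a real UDF would be Rust
    // compiled to Wasm and loaded from disk.
    let module = Module::new(
        &engine,
        r#"(module
             (func (export "double") (param i64) (result i64)
               local.get 0
               i64.const 2
               i64.mul))"#,
    )?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let double = instance.get_typed_func::<i64, i64>(&mut store, "double")?;

    // The host calls guest code without sharing a Rust ABI with it.
    println!("{}", double.call(&mut store, 21)?); // prints 42
    Ok(())
}
```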
Are you aware of any RFCs that propose the stabilization of the Rust ABI?
Yeah, there are a couple RFCs in this area with different approaches,
but I haven't seen a lot of progress in the past couple of years or real interest
in solving this problem.
I think it is somewhat niche. Most Rust projects are distributed as source code.
Most libraries are compiled at build time.
But for projects like ours, or anything that's dealing with plugin ecosystems,
yeah, that's not really good enough.
I guess we covered a lot of the technicalities of the project.
I also wanted to touch on a few things that were a bit more business-related.
I guess the first question I had
along these lines would be Arroyo is backed by Y Combinator as far as I'm aware.
Did investors ever care about your choice of programming language, or was it never up for debate?
Or maybe it was even a good thing and maybe they encouraged you to use Rust.
Yeah, I would say Rust has only helped us when talking to investors.
There are a number of systems that have come before us that have proved Rust can work commercially.
Investors know it's like the hot language in the data space.
And so you definitely seem more attractive in that sense if you're using Rust.
But honestly, most investors do not care about your language choice.
That's just not the level that they operate on.
And if you come in as an expert in this area and you say, like,
"I think this is the right technical choice," investors are not going to second-guess you on that. They're much more interested in the commercial side of the question: how are you going to sell this, who are your users, why are they going to choose you over a more established company in this space. They're definitely not grilling you about why you're using tokio or async-io or whatever.
Yeah. What would you recommend to people that are in the same space and are considering using Rust? Maybe they dabbled in it, but they are not sure if they should fully commit to it for their next project.
Yeah, well, I think in this space Rust is just the obvious choice today. You know, we went through this whole era of building these systems in Java or Go or whatever.
But today, especially in the current macroeconomic environment,
companies are much more cost-conscious.
And when you can write something in Rust that takes half the resources or a
quarter of the resources of the Java version of it, that's a huge, huge selling point.
And it's really hard to compete with these much slower Java systems.
And I mean, the Java systems are responding by rewriting core pieces in C++; we saw Spark rewrite their core engine in C++, and Confluent, the Kafka people, have been rewriting stuff as well. So I think it's just really hard to compete if you're not in either C++ or Rust. And even though there's maybe a larger pool of C++ developers today, I think it's much easier to teach someone to become a good Rust programmer than to teach them to be a good C++ programmer.
And the Rust compiler helps you so much with people who aren't really experienced
dealing with memory management.
It makes it much harder to make these classes of mistakes.
So I think it is very much the obvious choice.
Maybe there's some newer languages that you could explore. You mentioned Zig earlier.
But all these kind of new, I guess, Rust replacements are so much less mature today that you really would have to be very ambitious to experiment with them.
So yeah, so I think either C++ or Rust and really, unless you have a strong
reason to use C++, I think Rust is just the default choice today.
Taking the example of Confluent, they took parts of their code base and rewrote
it in C++, if I understood correctly.
And I wonder why they chose C++ instead of Rust, because maybe that's already
a very mature alternative.
Why didn't they pick Rust then? Or was it before Rust even became that mature?
Yeah, maybe I'll speak of Spark. I maybe have a little more background on...
But yeah, so Spark historically was written in Scala and then mostly written in Java.
And then Databricks rewrote their core engine in C++.
And that's something that they've kept closed-source. I think it's just that the project started like six years ago, when Rust was much less mature than it is today.
Do you know of any other companies that are currently planning to rewrite parts
of their codebase in Rust in that space?
Well, yeah, a great example is InfluxDB, which was originally written in Java,
then they rewrote in Go, and have just completed a major rewrite of their core storage engine in Rust.
And actually, we've benefited a lot from that because they're big supporters
of Data Fusion and the Arrow project.
TiKV, I'm not exactly sure how you say that, another example where they started
in Go and rewrote their core engine in Rust.
So that's, I think, been a pretty common trend in recent years.
If you're curious about InfluxDB's usage of Rust, then you should check out
episode number one, where we had Paul Dix on the show.
And yeah, he talks a lot about the reasoning for InfluxDB moving to Rust.
And I think this wasn't planned, but it's a nice segue into promoting this other episode, if people are interested.
Very interesting. I think looking forward, maybe in the next three, four, or five years, and looking at the projects that might get started along the way and the things that exist and are evolving over time: what is your perspective, what is your vision for the future? Where do you see the industry moving?
Yeah, I feel like I'm a broken record, but I think for people starting new data systems or new large-scale systems, most are going to choose Rust going forward. There are still people starting new C++ systems, but just looking at my own space, three quarters of the new systems are in Rust and one quarter are in C++. And I think that trend is only going to increase as Rust becomes less risky from a technical perspective and from a hiring perspective.
I mean, you know, maybe we'll see disruption from these other newer languages
that are able to become more mature and start attracting projects.
At some point, I'm sure Rust will become boring and people will want to use
more exciting languages.
But that would be, I think, an extremely successful outcome for the Rust project.
For now, we have not regretted our technical choice at all.
We're about a little over a year into this, and Rust has proved an extremely
successful technology choice.
I think maybe an interesting question is how much Rust adoption happens in the
more application space.
So there's kind of a divide here between infrastructural software and application software.
So infrastructure software, like the stuff we're working on,
or a database, for example, is something that's written by a small team and
then run by a much larger group of people.
So definitely, it makes sense to put a lot of effort into making it really efficient and fast, because it's going to run on so many CPU cores over its lifetime.
For application software, where the development costs are much closer to, or much greater than, the runtime costs, you don't necessarily have that same financial pressure to make it really efficient.
And today, I think Rust is a much harder sell in that space, because of the additional complexity of writing stuff in Rust and the additional difficulty of hiring Rust engineers or training people in Rust. So I wonder how much Rust will grow in that space through just more maturity in the language and ecosystem, and maybe a growing set of people who use Rust or want to use Rust. But to me that's a big open area for a language to expand into, or for a new language to come and move into, because I think we can do better than Java and Go for application-level programming. So much of the ergonomics of Rust are great for that, but there are the sharper corners of Rust around memory management and lifetime issues, where it just feels like, if you don't care that much about performance, you shouldn't need to deal with this for that class of problems.
If you look at a related field like data science, it feels like they are also
starting to experiment with some ideas from the Rust world,
if not even rewrite parts of their libraries in Rust to use it in less performant,
higher-level languages like Python.
You have Parquet files, and then you have parsers around that, and you have pandas, and all of this is inherently an interesting space for Rust, because it's analysis where performance is also relevant, right? Do you agree, in general?
Yeah, so I think that's definitely a trend that will continue. And this is really taking a Rust core and wrapping it in a higher-level language like Python. And that's been really, really successful. We see that in the JavaScript ecosystem as well, where a lot of these JavaScript tools have rewritten their cores in Rust and gotten 10x or more performance out of that.
I'm not personally a Python person, but people obviously really love it.
And it's very hard to convince data scientists to use anything besides Python.
So if you do want to give them better performance, I think this approach of
writing the core in Rust and wrapping it in your higher-level language is something
that has been really successful.
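A tiny sketch of that "Rust core, Python wrapper" pattern using PyO3 (0.21+ style; the module and function names are invented). Built with maturin, this becomes a module Python can import like any other:

```rust
use pyo3::prelude::*;

// The hot loop lives in Rust; Python just sees a normal function.
#[pyfunction]
fn rolling_sum(values: Vec<f64>, window: usize) -> PyResult<Vec<f64>> {
    let out = values
        .windows(window.max(1)) // guard against a zero-width window
        .map(|w| w.iter().sum())
        .collect();
    Ok(out)
}

// Module definition: exposes `rolling_sum` to Python as `fastcore.rolling_sum`.
#[pymodule]
fn fastcore(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(rolling_sum, m)?)
}
```

From Python, this is just `import fastcore; fastcore.rolling_sum([1.0, 2.0, 3.0, 4.0], 2)`, with the inner loop running at native speed.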
I guess the other fascinating approach, I don't know if you're familiar with Mojo: this is Chris Lattner's new language, from the creator of Swift.
They're creating a Python-like language that actually compiles into LLVM and MLIR and aims to provide C++ performance with somewhat Python compatibility. That, to me, is extremely ambitious given the semantics of Python and how hard that is to optimize. But if you don't want to take this Rust approach, that's the only other way you can really get acceptable performance with these Python APIs. For us, since we're starting with SQL, it's very easy to optimize SQL into whatever implementation you want, and that gives us a lot of advantages for providing really high performance. Because SQL is declarative, you're able to rewrite the expressions in ways that make them much faster to actually execute. But when you have something like Python, you're much more limited in how much you can really optimize, even with a Rust core. So I think it'll be interesting to see, as our data volumes increase and the complexity of the processing we're doing increases, how financial pressures will push data science into high-performance paradigms. But yeah, for now I think the Polars approach of the Rust core is something we'll see in a lot of these data science ecosystems.
I agree. I think we're getting towards the end, and it has been somewhat of a tradition around here to ask people this final question: if there was one thing that you could say to the Rust community as a whole, a statement, a message that you have for the community, what would it be?
I think my message to the Rust community would be: chill out a little bit.
Rust is an incredible language, an incredible ecosystem and community.
And yet we seem to have 10 times the drama of any other language community I've been part of.
And I don't really understand why or where it all comes from.
But I think that that level of drama can only hurt adoption when people look
at the Rust Reddit and are like, this is a shit show. Why would I want to be part of this?
So yeah, I think hopefully we can look back at the past year and just say,
we all just need to calm down a little bit and figure out how to work with everyone
else and stop driving people out of the community.
That's a great final statement. Really love it. Micah, it has been a pleasure to have you on the show.
Where can people learn more about you, about Arroyo? How can they get started
with the platform?
Yeah. So I think we have some pretty good docs.
If you head to our website, arroyo.dev, we link to those there.
We have a Docker image, super easy to run it and play around,
get a nice web UI where you can write SQL, you can talk to like WebSocket APIs
and HTTP APIs, and it's easy to play around with publicly available streaming data.
And then we have a really friendly Discord community, so if you head to our website, we have a link to that and you can join.
You heard it, there's nothing more to say. Again, thanks a lot, Micah, for coming on the show. And yeah, thank you.
Thanks so much for having me! This was great.
Rust in Production is a podcast by corrode, hosted by me, Matthias Endler. For show notes, transcripts, and to learn more about how I can help your company make the most of Rust, visit corrode.dev.
Thanks for listening to Rust in Production.