InfluxData with Paul Dix
Paul Dix, CTO of InfluxData, talks about the development of the open-source time series database InfluxDB, the decision to use Go and later Rust, the challenges of managing high data volumes, performance improvements, future plans, and the value of hands-on learning.
2023-12-14 69 min
Description & Show Notes
For our very first episode, we welcome a special guest, Paul Dix, the CTO of InfluxData.
He starts by giving us an overview of InfluxDB, an open source time series database that developers use to track server and application data. He takes us back to the early days of InfluxDB and explains how it came into existence: the challenges they faced with their initial SaaS application, and how they decided to repurpose its infrastructure into this open source database. Paul also explains how the growing popularity of Go strongly influenced their decision to build the project in it.
He takes us through the journey of InfluxDB's development and the improvements that have been made over the years. He emphasizes the enhancements made in versions 0.11 and 1.0 to improve performance and query capabilities. Moreover, he shares their decision to explore using Rust for certain parts of the project and the positive impact it has had. Moving forward, the conversation delves into the challenges of managing high volumes of data in time series databases.
Paul talks about the solutions they implemented, such as using BoltDB and developing the time-structured merge tree storage engine. We then dive into the decision to rewrite InfluxDB in Rust and the benefits it offers. He explains the improved performance, concurrency, and error handling that Rust brings to the table. Paul goes on to discuss the development process and how the engineering team has embraced Rust across their projects.
As the conversation progresses, we touch on the performance improvements in InfluxDB 3 and the future plans for the database. Paul shares their vision of adding more features and integrating with other tools and languages. He also mentions InfluxData's involvement in open-source projects like Apache Arrow Rust and DataFusion, highlighting their ambition to extend beyond metric data. Paul concludes the conversation by discussing standards and libraries in analytics, the role of Apache Iceberg, and the collaboration among data and analytics companies. He offers advice for getting started with Rust and InfluxDB, urging listeners to work on hands-on projects and to learn from books and online documentation.
Thank you, Paul, for sharing your insights and expertise.
About InfluxData
InfluxData is the creator of InfluxDB, the leading open source time series database. They offer a cloud service, InfluxDB Cloud, and a commercial on-premises product, InfluxDB Enterprise (https://www.influxdata.com/products/influxdb-enterprise/).
About Paul Dix
Paul Dix is the founder and CTO of InfluxData (https://www.influxdata.com/). He has helped build software for startups, large companies, and organizations like Microsoft, Google, McAfee, Thomson Reuters, and Air Force Space Command. He is the series editor for Addison-Wesley's Data & Analytics book and video series (https://www.informit.com/imprint/series_detail.aspx?ser=4255387). In 2010 Paul wrote the book "Service Oriented Design with Ruby and Rails" (https://www.oreilly.com/library/view/service-oriented-design-with/9780321700124/) for Addison-Wesley's Professional Ruby Series. In 2009 he started the NYC Machine Learning Meetup (https://www.meetup.com/nyc-machine-learning/), which now has over 13,000 members. Paul holds a degree in computer science from Columbia University. You can find Paul on Twitter (https://twitter.com/pauldix) and GitHub (https://github.com/pauldix).
Links
- InfluxData: https://www.influxdata.com/
- Careers at InfluxData: https://www.influxdata.com/careers/
- Blog post: Meet the Founders Who Rewrote in Rust: https://www.influxdata.com/blog/meet-founders-who-rewrote-in-rust/
- Reddit: Details and discussion on the Rust rewrite: https://www.reddit.com/r/rust/comments/16v13l5/influxdb_officially_made_the_switch_from_go_rust/
- Blog post: The Plan for InfluxDB 3.0 Open Source: https://www.influxdata.com/blog/the-plan-for-influxdb-3-0-open-source/
Transcript
Matthias (00:00:23)
Paul (00:00:29)
Matthias (00:01:19)
Paul (00:01:35)
Matthias (00:06:29)
Paul (00:06:43)
Matthias (00:07:30)
Paul (00:07:57)
Matthias (00:09:44)
Paul (00:11:10)
Matthias (00:19:33)
Paul (00:20:06)
Matthias (00:21:56)
Paul (00:22:07)
Matthias (00:22:45)
Paul (00:22:54)
Matthias (00:28:22)
Paul (00:28:55)
Matthias (00:31:27)
Paul (00:31:55)
Matthias (00:38:59)
Paul (00:39:39)
Matthias (00:41:51)
Paul (00:42:25)
Matthias (00:45:03)
Paul (00:45:44)
Matthias (00:48:46)
Paul (00:49:17)
Matthias (00:50:40)
Paul (00:51:14)
Matthias (00:52:00)
Paul (00:52:12)
Matthias (00:55:15)
Paul (00:55:40)
Matthias (00:57:01)
Paul (00:57:18)
Matthias (00:57:31)
Paul (00:58:02)
Matthias (01:01:59)
Paul (01:02:13)
Matthias (01:02:45)
Paul (01:03:31)
Matthias (01:05:46)
Paul (01:06:12)
Matthias (01:07:20)
Paul (01:07:34)
Matthias (01:08:03)
Paul (01:08:14)