TL;DR

Streambed is a new tool that streams PostgreSQL WAL changes directly to Iceberg tables stored on S3, enabling analytical queries without ETL or Spark. It also supports the Postgres wire protocol for easy querying.

Streambed, a newly introduced tool, streams PostgreSQL write-ahead log (WAL) changes directly to Iceberg tables stored on S3 and serves those tables over the Postgres wire protocol, with no ETL pipeline or Spark cluster required. The goal is to simplify real-time analytics and move analytical workload off production databases.

Streambed connects to PostgreSQL as a logical replication subscriber, decoding WAL messages for inserts, updates, and deletes. It buffers these changes and periodically writes them as Parquet files to S3, simultaneously updating Iceberg metadata. The system supports copy-on-write merging for updates and deletes, ensuring data consistency. Additionally, it includes a built-in query server that exposes Iceberg tables over the Postgres wire protocol, allowing users to connect with standard Postgres clients like psql.
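The buffer-then-merge flow described above can be illustrated with a minimal in-memory sketch. It is not Streambed's code: the `Change`, `ChangeBuffer`, and `merge_copy_on_write` names are hypothetical, plain Python lists stand in for Parquet files and Iceberg snapshots, rows are assumed to carry an `id` primary key, and the flush is triggered by a row count rather than the periodic timer the article mentions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    """One decoded WAL change; field names are illustrative assumptions."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary-key value of the affected row
    row: Optional[dict] = None   # new row image (None for deletes)


def merge_copy_on_write(existing: List[dict], changes: List[Change]) -> List[dict]:
    """Return a new row list with the changes applied; the input is not mutated."""
    by_key: Dict[int, dict] = {r["id"]: r for r in existing}
    for c in changes:  # apply in WAL order so later changes win
        if c.op == "delete":
            by_key.pop(c.key, None)
        else:  # inserts and updates both store the latest row image
            by_key[c.key] = dict(c.row)
    return [by_key[k] for k in sorted(by_key)]


@dataclass
class ChangeBuffer:
    """Collects changes and returns a batch once max_rows is reached."""
    max_rows: int
    pending: List[Change] = field(default_factory=list)

    def add(self, change: Change) -> Optional[List[Change]]:
        self.pending.append(change)
        if len(self.pending) >= self.max_rows:
            batch, self.pending = self.pending, []
            return batch
        return None


# Usage: three changes fill the buffer, then the batch is merged into a new snapshot.
snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
buf = ChangeBuffer(max_rows=3)
batch = None
for ch in [Change("update", 1, {"id": 1, "name": "A"}),
           Change("delete", 2),
           Change("insert", 3, {"id": 3, "name": "c"})]:
    batch = buf.add(ch) or batch
new_snapshot = merge_copy_on_write(snapshot, batch)
```

The key property of copy-on-write is that the old snapshot stays untouched while a complete replacement is produced, which is what lets readers keep querying a consistent version until the new metadata is committed.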

The tool can run alongside existing Postgres instances, enabling seamless offloading of analytical workloads without altering application code. It also offers commands for resynchronization and cleanup, facilitating maintenance and data management. Streambed requires Go 1.22+ and CGO, and can be deployed locally using Docker for testing.

Why It Matters

Streambed offers a straightforward way to run real-time analytics on production Postgres data without building ETL pipelines or operating Spark clusters. Because it streams WAL changes directly into Iceberg tables on S3 and answers queries over the standard Postgres wire protocol, it can reduce infrastructure complexity and latency, which may make analytics more accessible and cost-effective for organizations.

Background

Traditional approaches to analytics often involve ETL processes, data warehouses, or Spark-based data lakes, which can introduce delays and complexity. Recent innovations have focused on simplifying data pipelines, with tools like Delta Lake and Iceberg gaining popularity. Streambed builds on this trend by providing a streaming CDC engine that integrates directly with Postgres, one of the most widely used relational databases, and leverages Iceberg’s capabilities for scalable, open table formats on cloud storage.

Its announcement follows ongoing efforts in the data engineering community to enable real-time analytics while minimizing infrastructure overhead. The approach aligns with recent industry movements toward streaming data pipelines and serverless querying solutions.

“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, enabling real-time analytics without ETL or Spark.”

— Viggy28, creator of Streambed

“Supporting the Postgres wire protocol means you can query Iceberg tables directly with psql, simplifying access for existing workflows.”

— Hacker News user

What Remains Unclear

It is not yet clear how well Streambed performs at scale or how it handles complex schema changes. Details about its production readiness, stability, and user adoption are still emerging, as the project is relatively new and primarily shared on Hacker News.

What’s Next

Next steps include broader testing in production environments, performance benchmarking, and potential integration with existing data workflows. The developer community may also contribute features like support for additional query engines or enhanced schema evolution handling.

Key Questions

How does Streambed compare to traditional ETL pipelines?

Streambed offers real-time streaming of WAL changes directly into Iceberg on S3, eliminating the need for batch ETL processes and reducing latency.

Can I query the Iceberg tables with my existing Postgres tools?

Yes, because Streambed supports the Postgres wire protocol, you can connect with psql or other Postgres clients to query the data directly.
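In practice, "connecting with a standard client" mostly means pointing the client at a different host and port. The sketch below builds libpq-style connection URIs to make that concrete. The host names, ports, database name, and user are hypothetical placeholders, not values documented for Streambed, and `postgres_uri` is a helper introduced here for illustration only.

```python
from urllib.parse import quote, urlunsplit


def postgres_uri(host: str, port: int, dbname: str, user: str) -> str:
    """Build a libpq-style connection URI for a Postgres-compatible server."""
    netloc = f"{quote(user, safe='')}@{host}:{port}"
    return urlunsplit(("postgresql", netloc, "/" + quote(dbname, safe=""), "", ""))


# Hypothetical endpoints: only the host and port differ between the two.
primary = postgres_uri("db.internal", 5432, "app", "analyst")
analytics = postgres_uri("streambed.internal", 5433, "app", "analyst")
```

Either URI can be passed to psql as its connection argument, so an analyst's existing scripts would only need the target changed, assuming the query server accepts the same credentials and database name.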

What are the prerequisites for deploying Streambed?

It requires Go 1.22+ and CGO, and can be run locally using Docker for testing purposes.

Is Streambed suitable for high-volume production environments?

Streambed looks promising, but its performance and stability at scale are still being evaluated; further testing is needed before it can be considered suitable for large-scale production use.

Does Streambed support schema changes in Postgres?

Schema-change handling is not fully documented; the behavior may depend on the current implementation and on ongoing development.

Source: Hacker News
