Stanislav Kozlovski’s Post

Databricks announces Zerobus - a Kafka-like message bus for writing data directly to your Lakehouse in the Unity Table Format. Zero copy. High throughput. Near real-time latency. Unlike Kafka, this is useful when your only destination is the lakehouse.

I posted a while ago about the idea of open table formats reducing the need for Kafka: if S3 and table formats become the default place to store data, you omit a lot of read fanout jobs that simply copied the data to another system for use. Just store it in one place.

Interesting development. It seems like every company is releasing its own niched-down Kafka-like alternative (see Cloudflare Pipelines).

Jan Siekierski

Data Streaming Consultant

2mo

Is it time for Open Stream Formats now? S2 is an attempt, but it's an API, and a proprietary one. Northguard's Range Splitting is a really cool idea for how we can provide real elasticity. Right now we have the Kafka protocol, which is open and extremely widely adopted, but that's a behavioral specification, not a structural one. It also comes from a different time and was designed with self-hosting in LinkedIn's data centers in mind. And while we see cloud-native ideas on how to reimplement it, the Kafka protocol seems a bit outdated to me. I'm curious whether we'll see innovation in this direction. Do we even need it? Maybe the available solutions are good enough and the effort to improve them would be better spent elsewhere?

Dan Forsberg

CEO & Founder @BoilingData | Ph.D. | Author

2mo

You can do this with BoilStream into DuckLake, using FlightRPC directly from DuckDB with plain SQL inserts. We also have derived topics (materialised views) as well as a Postgres interface for realtime analytics.

Jose Manuel Cristóbal

Director Platform Engineering - Fast Data Platform at adidas

2mo

I don’t know. Maybe collaborating to simplify data ingestion from existing technologies and protocols (specifically OSS) is a better approach than yet-another-ingestion-technology. Let’s see how it evolves.

Sarwar Bhuiyan

Technical Architect | Software Engineering, Data and Stream processing, Cloud architectures | Product Management | Technology Consulting | Field Advisory

2mo

Ken Chen Time for Nativelog to be unleashed with an API.

David Reger

Cloud Data Engineer @ msg

2mo

I think they should be more specific about near real-time (everyone claims to be near real-time). Other than that, I do not know what zero copy actually means here. Somehow they need to copy the data into the system, as they are not transforming it from the source (it's ELT, not ETL). I feel they might want to go back to ETL by shifting everything to the left.

Julien Laurenceau

Senior Solutions Architect | Big Data | AI | Distributed Systems | Cloud | Support | Audit

2mo

Link to the presentation: https://www.youtube.com/watch?v=wrH5wWmFT94 I'd be really curious to know which delivery semantics they support (if any).

🐿️ Ben Gamble🧑🏾🦯

Technology Sommelier, AI Whisperer

2mo

(Struck through by the author: ZeroMQ has been the sleeper protocol behind so much for ages, including Jupyter notebooks. Very smart choice as it already has drivers in lots of languages.) I mixed up an OSS lib I was looking at with this; I have no insider info.

Emeric Tabakhoff

Remote Database performance and HA expert for Postgres & MySQL | I help your company scale to thousands of users 💪 keep existing users ⚔️ & protect their data 🛡️ #Postgres #PostgreSQL #MariaDB #MySQL #DBA #Freelance

2mo

This sounds like a game changer for data management.

Kien Truong

Smart and get things done

2mo

This is similar to a BigQuery Pub/Sub subscription: push data into a pipe and it lands in a table.
