
Lakehouse Storage Wars


“You won’t find an unbiased comparison of lakehouse storage formats anywhere!”

I said this almost two years ago after reading yet another spicy LinkedIn post. I don’t think that’s really changed, and I’m not about to fix it here, but I think it’s a moot point these days. The storage format you use might not matter at all anymore.

Focusing on the two main contenders, Apache Iceberg and Delta Lake have both matured massively. Features that used to be exclusive to one are now present in, or on the roadmap for, the other. Things aren’t as clear-cut in this storage format war anymore. This isn’t another HD DVD vs Blu-ray.

Schema evolution, time travel, ACID compliance, data compaction, partition pruning, table versioning… the differences that once sparked endless mudslinging on LinkedIn are mostly smoothed over now.

The technical detail

The technical differences that remain are more about how each format does things than about what it can do.

  • Iceberg takes a clean, spec-first approach. Everything’s defined and it separates metadata from the data itself, building manifests for efficient reads. This is especially useful across engines like Spark, Trino, Presto, and Flink.

  • Delta Lake leans heavily into its transaction log, which makes it great for fine-grained operations like upserts and streaming use cases. It also powers cool features like Change Data Feed.
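
To make those two bullets a bit more concrete, here is a toy model in plain Python. It is a sketch of the ideas only, not either format's real metadata layout: the `DataFile` class, the `day` column, the file names, and both helper functions are made up for illustration. In practice both formats keep file-level statistics and both keep an ordered history, so this only shows where each one puts its emphasis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_day: int  # smallest value of a hypothetical "day" column in this file
    max_day: int  # largest value of that column in this file


def replay_log(commits):
    """Delta-style idea: fold an ordered list of commits into the live file set."""
    live = {}
    for actions in commits:  # each commit is a list of (kind, file) actions
        for kind, data_file in actions:
            if kind == "add":
                live[data_file.path] = data_file
            elif kind == "remove":
                live.pop(data_file.path, None)
    return live


def plan_scan(manifests, day):
    """Iceberg-style idea: use per-file stats in manifests to skip files."""
    return [
        f.path
        for manifest in manifests
        for f in manifest
        if f.min_day <= day <= f.max_day
    ]


a = DataFile("part-a.parquet", 1, 10)
b = DataFile("part-b.parquet", 11, 20)
c = DataFile("part-ab.parquet", 1, 20)  # compacted rewrite of a and b

commits = [
    [("add", a)],
    [("add", b)],
    [("remove", a), ("remove", b), ("add", c)],  # compaction commit
]

live_files = replay_log(commits)       # current state of the table
files_at_v1 = replay_log(commits[:2])  # "time travel": replay a log prefix
scan = plan_scan([[a], [b]], day=15)   # only part-b can contain day 15
```

Replaying a prefix of the log is what makes time travel cheap in this model, and the min/max check is why a reader can plan a scan without opening every data file.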

So yeah, the internals are certainly wired up differently. But if you’re just building a data platform? You probably won’t notice. Or care.

Interoperability

Both ecosystems are working to make the choice of format irrelevant.

  • Apache XTable, although still in its early stages, aims to provide a unified interoperability layer between Delta Lake, Iceberg, and Hudi, allowing engines to read and write across formats without conversion.

  • With Delta Lake’s Universal Format (UniForm), you write data as Delta Lake and it also generates the metadata for Iceberg and Hudi alongside it. It’s a genuine abstraction layer that starts to remove any practical reason to care about the format under the hood.
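
For a rough idea of what switching UniForm on looks like, here is a small Python helper that builds a `CREATE TABLE` statement with the relevant table properties. The helper name, table name, and columns are hypothetical. The two property names are the ones I understand the Delta Lake documentation to use for UniForm, but requirements vary by Delta and runtime version, so treat this as a sketch and check the docs for your platform before relying on it.

```python
def uniform_create_table_sql(table, columns, formats=("iceberg",)):
    """Build a CREATE TABLE statement that asks Delta to also emit
    metadata for the given formats (hypothetical helper, illustration only)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    props = {"delta.universalFormat.enabledFormats": ",".join(formats)}
    if "iceberg" in formats:
        # Assumption: Iceberg output also needs the Iceberg compatibility flag.
        props["delta.enableIcebergCompatV2"] = "true"
    prop_sql = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"CREATE TABLE {table} ({cols}) USING DELTA TBLPROPERTIES ({prop_sql})"


sql = uniform_create_table_sql(
    "sales.orders",                              # hypothetical table name
    {"order_id": "BIGINT", "order_day": "DATE"}, # hypothetical columns
)
```

The resulting string could be passed to something like `spark.sql(sql)` in an environment that supports UniForm. The table is still written as Delta; the properties only ask for extra metadata to be generated next to it.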

With several options to convert between formats, or to expose one copy of the data in every format as UniForm does, there is less and less reason to care which one you use.

The platforms

In Databricks, Delta Lake is still the default, but Unity Catalog adds support for Iceberg tables as well as many other integrations. It’s another sign that Databricks aren’t trying to force a winner here; they’re building a platform where format choice is just a checkbox, not a commitment.

In Microsoft Fabric, the default is also Delta — no surprises there. But with the new Iceberg support, Fabric now makes it much easier to work with datasets outside of its native OneLake environment. That opens the door to more open architectures and multi-cloud flexibility, especially when you’re collaborating across teams or platforms that aren’t tied to Delta.

Even Snowflake, historically a closed system, now has strong support for Iceberg tables, and is pushing its Iceberg Tables as a Service offering, allowing external Iceberg tables to be queried just like native ones. Delta Lake support is also making its way in, with preview features and third-party connectors blurring the lines even further.

Who Wins?

More and more, we’re seeing tooling and platforms shift their focus away from format and toward interoperability. Catalogs, query engines, and governance layers are being designed to abstract this debate entirely.

So where does that leave us? Well… arguing about Delta vs Iceberg in 2025 feels a bit like arguing about whether your PDF was made in Microsoft Word or Google Docs. Sure, it matters, but not to the person reading it.

Feel free to pick apart tiny increments in performance or minor feature enhancements, but the decision about which format to store your data in should be driven by your platform and the tools your team uses.
