Gunjan Sharma

Engineering

We Stopped Updating Data, and Just Started Hoarding It: The LSM Tree

For decades, database engineers treated B-Trees as the undisputed kings of storage.

The logic seemed airtight: Find a record. Overwrite it in place. Keep a balanced tree so lookups are O(log N). Hide the slow random disk I/O behind a massive RAM cache.

Then data grew too fast for RAM, and SSDs changed the economics of writing to storage.

A new generation of engineers showed that the "overwrite" assumption does not hold for write-heavy workloads.

The trick: Stop overwriting entirely. Buffer each change in memory and append it to a sequential log. When the buffer fills, write it out as a sorted, immutable file. Merge and compact those files in the background.
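
To make the idea concrete, here is a minimal sketch in Python. It is a toy, not how any real engine is implemented: the class name `TinyLSM`, the `memtable_limit` parameter, and the use of in-memory lists in place of on-disk files are all assumptions made for illustration. Real engines also keep a write-ahead log for crash recovery and handle deletes, which this sketch leaves out.

```python
import bisect


class TinyLSM:
    """Toy LSM sketch. Sorted lists stand in for immutable on-disk files."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}   # newest writes, held in memory
        self.segments = []   # immutable sorted runs, oldest first

    def put(self, key, value):
        # A write never touches existing segments; it only lands in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for segment in self.segments:  # oldest first, so newer values win
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # buffer is full, so it is flushed as the first segment
db.put("a", 3)   # the old value of "a" is not overwritten, just shadowed
db.put("c", 4)   # second flush
print(db.get("a"), len(db.segments))
db.compact()
print(db.segments)
```

Notice that the stale value of "a" stays in the older segment until `compact()` runs. That is the hoarding: old versions pile up, and reads pay for it by checking several segments until compaction cleans them out.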

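Result: write throughput that can be 10x to 100x higher on write-heavy workloads, depending on hardware and configuration. Writes become sequential. Reads pay the price: a lookup may have to check several files.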

It's called the Log-Structured Merge (LSM) Tree. It's the storage design behind Cassandra, RocksDB, and LevelDB.

We stopped updating data, and just started hoarding it.