Reading

Papershelf

Research papers I've read and found worth sharing. Each entry includes my one-line takeaway and why I think it matters. I'm drawn to distributed systems, consensus, security, and applied cryptography.

Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform ↗

Vitalik Buterin · Ethereum Whitepaper 2014 · Jun 2026

The Ethereum whitepaper proposed a blockchain with a built-in Turing-complete programming language, so that smart contracts run on a shared, programmable state machine (the EVM). It spawned much of the DeFi, NFT, and Web3 ecosystem. Critical reading for blockchain engineers building on XRPL, EVM-compatible chains, or any smart contract platform.

Large-scale cluster management at Google with Borg ↗

Abhishek Verma, Luis Pedrosa, Madhukar Korupolu et al., Google · EuroSys 2015, Google · Jun 2026

Borg is the cluster manager that ran Google's workloads for over a decade and directly inspired Kubernetes. It introduced key concepts like resource quotas, priority scheduling, task grouping, and health checking across clusters of up to tens of thousands of machines each. Understanding Borg is the foundation for understanding modern cloud-native infrastructure.

Cassandra: A Decentralized Structured Storage System ↗

Avinash Lakshman, Prashant Malik, Facebook · LADIS Workshop 2009, Facebook · Jun 2026

Cassandra combines the best of Dynamo (partitioning and replication) with Bigtable's data model into a highly available, leaderless NoSQL database. Now powering Apple, Netflix, and Discord at massive scale, Cassandra's design choices around tunable consistency are central to any NoSQL architecture discussion.

The Log: What every software engineer should know about real-time data's unifying abstraction ↗

Jay Kreps, LinkedIn · LinkedIn Engineering Blog 2013 · Jun 2026

Jay Kreps' seminal essay on the log as the unifying abstraction for distributed systems — covering databases, stream processing, and data integration. It explains why the append-only log is the fundamental primitive behind databases, Kafka, CDC, and event sourcing. Required reading for any data/backend engineer.

The Anatomy of a Large-Scale Hypertextual Web Search Engine ↗

Sergey Brin, Lawrence Page · WWW Conference 1998, Stanford · Jun 2026

The original Google paper. Introduced PageRank and the architecture of a large-scale web search engine including crawling, indexing, and ranking. It demonstrates how graph algorithms and large-scale distributed systems combine to extract knowledge from the web. The paper that started one of the most impactful technology companies in history.
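To make the ranking idea concrete, here is a minimal sketch of PageRank by power iteration. It is my own illustration, not code from the paper: it uses the commonly seen normalized form (ranks sum to 1), and the `pagerank` function, the `toy_web` graph, and the handling of pages without outgoing links are all assumptions. Every link target must also appear as a key.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Assumption: a page with no links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "c" is linked from both "a" and "b".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In this toy graph "c" ends up ranked highest because it collects links from both other pages, which is the intuition the paper builds on.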

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications ↗

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan · SIGCOMM 2001, MIT · Jun 2026

Chord builds a distributed hash table on a consistent hashing ring: every node and key gets an identifier on the ring, and a key is stored on its successor node. Consistent hashing itself predates Chord, but the same idea now underlies load balancers, client-side sharding for caches such as Memcached, and partitioning in Dynamo-style stores. Understanding Chord's O(log N) lookup via finger tables is key to designing scalable distributed systems.
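A minimal sketch of the ring placement rule, assuming a single process that knows every node. That assumption skips the finger tables entirely, so this shows only where a key lives, not Chord's O(log N) routing. The `Ring` class, `ring_id` helper, and node names are hypothetical.

```python
import bisect
import hashlib


def ring_id(key, bits=32):
    """Hash a string onto a ring of 2**bits positions (the Chord paper uses SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class Ring:
    def __init__(self, node_names, bits=32):
        self.bits = bits
        self.nodes = sorted((ring_id(name, bits), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect.bisect_left(self.ids, ring_id(key, self.bits))
        return self.nodes[idx % len(self.nodes)][1]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.successor("user:42")
```

The property worth noticing is that adding a node moves only the keys that fall between the new node and its predecessor; every other key keeps its owner.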

A Relational Model of Data for Large Shared Data Banks ↗

E.F. Codd, IBM · Communications of the ACM 1970, IBM · Jun 2026

The paper that introduced the relational model of data. Codd's 1970 model laid the mathematical foundation for SQL, RDBMSs, and 50+ years of database systems. Every engineer who has written a SQL query is building on this work, which earned Codd the 1981 Turing Award and fundamentally changed how we organize and query data.

Bitcoin: A Peer-to-Peer Electronic Cash System ↗

Satoshi Nakamoto · Self-published 2008 · Jun 2026

The whitepaper that launched the blockchain era. It applied proof-of-work, an idea that predates Bitcoin (Hashcash), to reach consensus on a shared transaction ledger, and described a peer-to-peer payment system that needs no trusted third party; its transaction design became known as the UTXO model. Whether building blockchain systems or not, every engineer working in fintech or distributed systems should understand the elegant engineering trade-offs in this 9-page paper.

Zanzibar: Google's Consistent, Global Authorization System ↗

Ruoming Pang et al., Google · USENIX ATC 2019, Google · Jun 2026

Zanzibar is Google's authorization system, handling millions of authorization requests per second over trillions of access control lists for Drive, YouTube, Photos, and more. It introduced the relation-tuple model that inspired OpenFGA, SpiceDB, and Ory Keto. Foundational reading for anyone designing RBAC or ReBAC systems at scale.
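A rough sketch of how a check over relation tuples can work, assuming tuples of the form (object, relation, subject) where a subject is either a user or a userset written as `object#relation`. The `check` function and the sample data are my own illustration; the real system adds userset rewrite rules, consistency tokens, and caching, none of which appear here.

```python
def check(tuples, obj, relation, user, _seen=None):
    """Return True if user has relation on obj, following userset indirection."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:
        return False  # already explored; avoids looping on cyclic groups
    seen.add((obj, relation))
    for t_obj, t_rel, subject in tuples:
        if (t_obj, t_rel) != (obj, relation):
            continue
        if subject == user:
            return True  # direct grant
        if "#" in subject:
            # Subject is a userset such as "group:eng#member": recurse into it.
            sub_obj, sub_rel = subject.split("#", 1)
            if check(tuples, sub_obj, sub_rel, user, seen):
                return True
    return False


# Hypothetical tuples: members of group:eng can view doc:readme.
acl = [
    ("doc:readme", "viewer", "group:eng#member"),
    ("group:eng", "member", "user:alice"),
    ("doc:readme", "owner", "user:bob"),
]
alice_can_view = check(acl, "doc:readme", "viewer", "user:alice")
bob_can_view = check(acl, "doc:readme", "viewer", "user:bob")
```

Note that bob is an owner but not a viewer here: without a rewrite rule saying "owners are also viewers", the tuples alone do not imply it.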

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure ↗

Benjamin H. Sigelman et al., Google · Google Technical Report 2010 · Jun 2026

Dapper defined distributed tracing as we know it. It inspired OpenTracing, Zipkin, Jaeger, and the entire observability ecosystem. For anyone building microservices, understanding how Dapper achieves low-overhead, always-on tracing across thousands of services is fundamental to production operations.

Time, Clocks, and the Ordering of Events in a Distributed System ↗

Leslie Lamport · Communications of the ACM 1978 · Jun 2026

The paper that introduced Lamport clocks and the happens-before relation — the theoretical foundation of all distributed systems reasoning. It defines how we think about causality, event ordering, and consistency without a global clock. One of the most cited CS papers of all time.
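The clock rules are small enough to sketch in a few lines. This is my own minimal version, assuming integer timestamps and a hypothetical `LamportClock` class: increment before every local event or send, and on receive jump past the timestamp carried by the message.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                      # local event on a
stamp = a.send()              # a sends a message
received_at = b.receive(stamp)
```

If one event happens before another, its timestamp is smaller. The converse does not hold: a smaller timestamp alone does not prove causality.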

Attention Is All You Need ↗

Ashish Vaswani, Noam Shazeer, Niki Parmar et al., Google Brain · NeurIPS 2017, Google Brain · Jun 2026

The Transformer paper. Arguably the most influential ML paper of the 2010s: it introduced the architecture behind GPT, BERT, T5, and nearly every modern LLM. It replaced recurrence and convolution with self-attention, enabling the parallelization and scaling that powered the generative AI revolution.
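The core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a deliberately plain sketch using Python lists instead of a tensor library; the function names are mine, and it covers a single head with no masking or learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head, without masking."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


# Two identical keys get equal weight, so the output is the mean of the values.
mixed = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]])
```

Each query is processed independently of the others, which is what makes the computation easy to parallelize.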

Paxos Made Simple ↗

Leslie Lamport · ACM SIGACT News 2001, Microsoft Research · Jun 2026

Lamport's accessible restatement of the Paxos consensus algorithm. Paxos underlies Google Chubby and Spanner, and closely related protocols such as ZooKeeper's Zab power many other coordination services. Though this presentation is far simpler than the original "The Part-Time Parliament", Paxos remains a conceptual cornerstone for understanding fault-tolerant consensus in distributed systems.

In Search of an Understandable Consensus Algorithm (Raft) ↗

Diego Ongaro, John Ousterhout · USENIX ATC 2014, Stanford · Jun 2026

Raft was designed explicitly to be more understandable than Paxos while providing equivalent guarantees for consensus in distributed systems. It is now the consensus algorithm of choice in etcd, CockroachDB, TiKV, and many other production systems. Essential for understanding leader election and log replication.

Kafka: a Distributed Messaging System for Log Processing ↗

Jay Kreps, Neha Narkhede, Jun Rao · NetDB 2011, LinkedIn · Jun 2026

Kafka's log-centric, append-only design for distributed messaging is one of the most impactful ideas in modern data engineering. It powers real-time data pipelines at LinkedIn, Uber, Netflix, and thousands of companies. Critical reading for anyone building event-driven or streaming architectures.

Spanner: Google's Globally Distributed Database ↗

James C. Corbett et al., Google · OSDI 2012, Google · Jun 2026

Spanner is the first system to provide globally-distributed ACID transactions with external consistency using TrueTime. It shattered the assumption that distributed systems cannot have strong consistency at global scale. The foundation for Cloud Spanner and modern globally-distributed NewSQL databases.

Bigtable: A Distributed Storage System for Structured Data ↗

Fay Chang, Jeffrey Dean, Sanjay Ghemawat et al., Google · OSDI 2006, Google · Jun 2026

Bigtable introduced the sparse, distributed, persistent multi-dimensional sorted map model. It inspired HBase, Cassandra's data model, and wide-column NoSQL stores. Understanding Bigtable is key to grasping how Google Search, Maps, and Earth store and query petabytes of structured data efficiently.

Dynamo: Amazon's Highly Available Key-Value Store ↗

Giuseppe DeCandia et al., Amazon · SOSP 2007, Amazon · Jun 2026

Dynamo defined the practical approach to eventual consistency in distributed systems using consistent hashing, vector clocks, and sloppy quorums. It directly inspired Cassandra, Riak, and DynamoDB. A seminal paper on trading consistency for availability — central to the CAP theorem discussion.
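Vector clocks are how Dynamo tells a stale version from a genuinely conflicting one. A minimal sketch of the comparison, assuming each clock is a dict from node name to counter; the `compare` function and node names are hypothetical.

```python
def compare(a, b):
    """Classify clock a relative to b: equal, before, after, or concurrent."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a is an ancestor of b and can be discarded
    if b_le_a:
        return "after"
    return "concurrent"      # neither descends from the other: a conflict


older = {"sx": 1}
newer = {"sx": 2}
branch_y = {"sx": 2, "sy": 1}
branch_z = {"sx": 2, "sz": 1}
```

Here `compare(older, newer)` is "before", while `compare(branch_y, branch_z)` is "concurrent": both versions have to be kept and reconciled later, which in Dynamo's design falls to the client on a subsequent read.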

The Google File System ↗

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung · SOSP 2003, Google · Jun 2026

GFS introduced the design principles behind distributed file systems at massive scale: chunk-based storage, a single master, and fault tolerance via replication. It directly inspired HDFS and underpinned Google's infrastructure until its successor, Colossus, took over. A must-read for anyone building distributed storage systems.

MapReduce: Simplified Data Processing on Large Clusters ↗

Jeffrey Dean, Sanjay Ghemawat · OSDI 2004, Google · Jun 2026

The paper that defined the MapReduce programming model — a foundational abstraction for distributed data processing. It spawned Hadoop, Spark, and the entire big data ecosystem. Essential reading for understanding how to process petabyte-scale data across commodity hardware clusters.
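The programming model itself fits in a few lines. This is a single-process sketch under my own naming (`map_reduce`, `word_mapper`, `count_reducer`); a real implementation distributes the map, shuffle, and reduce phases across machines and handles failures, which is where the paper's actual engineering lies.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)       # the "shuffle" step
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


counts = map_reduce(["the log", "the map the reduce"], word_mapper, count_reducer)
```

Word count is the paper's own introductory example. The user writes only the two small functions, and the framework owns everything in between.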