This list is highly subjective and by no means complete. If you need a more comprehensive list of papers, Papers We Love is probably a much better resource.
- A simple totally ordered broadcast protocol
- Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
- In Search of an Understandable Consensus Algorithm
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases
- Paxos Made Live - An Engineering Perspective
- Paxos Made Simple
- Time, Clocks, and the Ordering of Events in a Distributed System
- Unreliable Failure Detectors for Reliable Distributed Systems
- Viewstamped Replication Revisited
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- Cassandra - A Decentralized Structured Storage System
- Dynamo: Amazon’s Highly Available Key-value Store
- F1: A Distributed SQL Database That Scales
- Large-scale Incremental Processing Using Distributed Transactions and Notifications
- Procella: Unifying serving and analytical data at YouTube
- Spanner: Becoming a SQL System
- Spanner: Google’s Globally-Distributed Database
- Bigtable: A Distributed Storage System for Structured Data
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
- Ceph: A Scalable, High-Performance Distributed File System
- Dynamic Metadata Management for Petabyte-scale File Systems
- Facebook’s Tectonic Filesystem: Efficiency from Exascale
- Finding a needle in Haystack: Facebook’s photo storage
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters
- Replex: A Scalable, Highly Available Multi-Index Data Store
- SLM-DB: Single-Level Key-Value Store with Persistent Memory
- The Google File System
- WiscKey: Separating Keys from Values in SSD-conscious Storage
- f4: Facebook’s Warm BLOB Storage System
- Borg, Omega, and Kubernetes
- Large-scale cluster management at Google with Borg
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Omega: flexible, scalable schedulers for large compute clusters
This work is licensed under a Creative Commons Attribution 4.0 International License.