Apparently this is getting a second edition soon(?), so I will hold off on writing
notes for the first edition.
Designing Data-Intensive Applications
Part I. Foundations of Data Systems
Chapter 1. Reliable, Scalable, and Maintainable Applications
Thinking About Data Systems
Reliability
How Important Is Reliability?
Scalability
Approaches for Coping with Load
Maintainability
Operability: Making Life Easy for Operations
Simplicity: Managing Complexity
Evolvability: Making Change Easy
Chapter 2. Data Models and Query Languages
Relational Model Versus Document Model
The Object-Relational Mismatch
Many-to-One and Many-to-Many Relationships
Are Document Databases Repeating History?
Relational Versus Document Databases Today
Query Languages for Data
Declarative Queries on the Web
Graph-Like Data Models
The Cypher Query Language
Chapter 3. Storage and Retrieval
Data Structures That Power Your Database
Comparing B-Trees and LSM-Trees
Other Indexing Structures
Transaction Processing or Analytics?
Stars and Snowflakes: Schemas for Analytics
Column-Oriented Storage
Sort Order in Column Storage
Writing to Column-Oriented Storage
Aggregation: Data Cubes and Materialized Views
Chapter 4. Encoding and Evolution
JSON, XML, and Binary Variants
Thrift and Protocol Buffers
Modes of Dataflow
Dataflow Through Databases
Dataflow Through Services: REST and RPC
Part II. Distributed Data
Chapter 5. Replication
Leaders and Followers
Synchronous Versus Asynchronous Replication
Implementation of Replication Logs
Problems with Replication Lag
Solutions for Replication Lag
Multi-Leader Replication
Use Cases for Multi-Leader Replication
Multi-Leader Replication Topologies
Leaderless Replication
Writing to the Database When a Node Is Down
Limitations of Quorum Consistency
Sloppy Quorums and Hinted Handoff
Detecting Concurrent Writes
Chapter 6. Partitioning
Partitioning and Replication
Partitioning of Key-Value Data
Partitioning by Key Range
Partitioning by Hash of Key
Skewed Workloads and Relieving Hot Spots
Partitioning and Secondary Indexes
Partitioning Secondary Indexes by Document
Partitioning Secondary Indexes by Term
Rebalancing Partitions
Strategies for Rebalancing
Operations: Automatic or Manual Rebalancing
Chapter 7. Transactions
The Slippery Concept of a Transaction
Single-Object and Multi-Object Operations
Weak Isolation Levels
Snapshot Isolation and Repeatable Read
Serializability
Serializable Snapshot Isolation (SSI)
Chapter 8. The Trouble with Distributed Systems
Faults and Partial Failures
Cloud Computing and Supercomputing
Unreliable Networks
Network Faults in Practice
Timeouts and Unbounded Delays
Synchronous Versus Asynchronous Networks
Unreliable Clocks
Monotonic Versus Time-of-Day Clocks
Clock Synchronization and Accuracy
Relying on Synchronized Clocks
Knowledge, Truth, and Lies
The Truth Is Defined by the Majority
Chapter 9. Consistency and Consensus
Linearizability
What Makes a System Linearizable?
Relying on Linearizability
Implementing Linearizable Systems
The Cost of Linearizability
Distributed Transactions and Consensus
Atomic Commit and Two-Phase Commit (2PC)
Distributed Transactions in Practice
Membership and Coordination Services
Part III. Derived Data
Chapter 10. Batch Processing
MapReduce and Distributed Filesystems
Reduce-Side Joins and Grouping
The Output of Batch Workflows
Comparing Hadoop to Distributed Databases
Beyond MapReduce
Graphs and Iterative Processing
High-Level APIs and Languages
Chapter 11. Stream Processing
Transmitting Event Streams
Databases and Streams
State, Streams, and Immutability
Processing Streams
Uses of Stream Processing
Chapter 12. The Future of Data Systems
Data Integration
Batch and Stream Processing
Unbundling Databases
Composing Data Storage Technologies
Designing Applications Around Dataflow
Aiming for Correctness
The End-to-End Argument for Databases