Barbara Liskov October 2007
Replication Goal: provide reliability and availability by storing information at several nodes
Replication Issues Semantics What is being replicated Failure assumptions
One-copy consistency One-copy consistency Or weaker
Only reads and writes Only reads and writes General operations acct.deposit($$); acct.withdraw($$$);
Replication protocols Data replication Operations
Issue 3: Failure Assumptions Network is asynchronous Network is malicious - Corruption
- Replay
- Spoofing
- Handled via cryptography
Nodes are failstop or Byzantine
Failstop Failures Nodes fail by crashing - A machine is either working correctly or it is doing nothing!
The assumption made in the 1980s
Failstop failures Requires 2f+1 replicas - Operations must intersect at at least one replica
- In general want availability for both reads and writes: f+1 nodes is sufficient
- Read and write quorums
Data Replication R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979 D.K. Gifford, Weighted voting for replicated data, SOSP 1979 H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM , Jan. 1995
Quorum Consensus Each data item has a version number write(d, val, v#) read(d) returns (val, v#) - Waits for f+1 matching v#’s
- Else does a write-back
Replicas must execute operations in the same order Implies replicas will have the same state, assuming - replicas start in the same state
- operations are deterministic
Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988 Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988 Replication in the Harp file system, S. Ghemawat et. al, SOSP 1991 The part-time parliament, L. Lamport, TOCS 1998 Paxos made simple, L. Lamport, Nov. 2001
Use a primary - It orders the operations
- Other replicas obey this order
System moves through a sequence of views System moves through a sequence of views
Client sends request to primary
Client sends request to primary Client sends request to primary Primary sends prepare message Replicas receive prepare - Send prepare-ok message to the primary
Client sends request to primary Client sends request to primary Primary sends prepare message to all Replicas receive prepare - Send prepare-ok message to the primary
Primary waits for f prepare-oks
Normal Case A 2-phase protocol: Only 3 message delays
Byzantine Failures Causes - Malicious attacks
- Non-deterministic software errors
3f+1 replicas are needed to survive f failures 3f+1 replicas are needed to survive f failures 2f+1 replicas is a quorum The minimum in an asynchronous network
BFT M. Castro and B. Liskov, Practical Byzantine faulty tolerance and proactive recovery, ACM TOCS, 2002
Primary runs the protocol in the normal case Primary runs the protocol in the normal case Replicas watch the primary and do a view change if it fails Key difference: replicas might lie Solution: add a pre-prepare phase
Client sends request to primary Client sends request to primary
Client sends request to primary Client sends request to primary Primary sends pre-prepare message to all
Client sends request to primary Client sends request to primary Primary sends pre-prepare message to all - Why not a prepare message?
- Because primary might be malicious
Client sends request to primary Client sends request to primary Primary sends pre-prepare message to all Replicas check the pre-prepare and if it is ok: - Send prepare messages to all
Replicas wait for 2f+1 matching prepares Replicas wait for 2f+1 matching prepares - Send commit message to all
Replicas wait for 2f+1 matching prepares Replicas wait for 2f+1 matching prepares - Send commit message to all
Replicas wait for 2f+1 matching commits - Execute operation and send result to client
Follow-on Work - BASE: using abstraction to improve fault tolerance, R. Rodrigo et al, SOSP 2001
- R.Kotla and M. Dahlin, High Throughput Byzantine Fault tolerance. DSN 2004
- J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07
- Abd-El-Malek et al, Fault-scalable Byzantine fault-tolerant services, SOSP 05
- HQ replications: a hybrid quorum protocol for Byzantine Fault tolerance, OSDI 06
Papers in SOSP 07 Monday 1:30-3:30 - Zyzzyva: Speculative Byzantine fault tolerance
- Tolerating Byzantine faults in database systems using commit barrier scheduling
- Low-overhead Byzantine fault-tolerant storage
- Attested append-only memory: making adversaries stick to their word
Tuesday: 11:00-12:00 - PeerReview: practical accountability for distributed systems
Dostları ilə paylaş: |