Systems Research Barbara Liskov

Systems Research

Replication

Replication Issues

One-copy consistency

Only reads and writes

Replication protocols

Issue 3: Failure Assumptions

Failstop Failures

Failstop failures

Data Replication

Quorum Consensus

Replicas must execute operations in the same order

Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988

Use a primary

System moves through a sequence of views

Client sends request to primary

Normal Case

Byzantine Failures

3f+1 replicas are needed to survive f failures

BFT

Primary runs the protocol in the normal case

Client sends request to primary

Replicas wait for 2f+1 matching prepares

Follow-on Work

Papers in SOSP 07

Systems Research

Barbara Liskov

October 2007

Replication

Goal: provide reliability and availability by storing information at several nodes

Replication Issues

Semantics

What is being replicated

Failure assumptions

One-copy consistency

Or weaker

Only reads and writes

General operations

acct.deposit($$);

acct.withdraw($$$);

Replication protocols

Data replication

Operations

Issue 3: Failure Assumptions

Network is asynchronous

Network is malicious

Nodes are failstop or Byzantine

Failstop Failures

Nodes fail by crashing

The assumption made in the 1980s

Failstop failures

Requires 2f+1 replicas
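
One way to see why 2f+1 replicas are enough for f crash failures: any two groups of f+1 replicas drawn from 2f+1 must share at least one member, so a later group always includes someone who saw the earlier update. The brute-force check below is only a sketch; the function names are hypothetical and not from the slides.

```python
from itertools import combinations

def replicas_needed(f):
    """Replicas needed to survive f failstop (crash) failures."""
    return 2 * f + 1

def majorities_always_intersect(f):
    """Check by brute force that any two groups of f+1 replicas overlap."""
    n = replicas_needed(f)
    quorum = f + 1  # a majority of 2f+1 replicas
    groups = [set(c) for c in combinations(range(n), quorum)]
    return all(a & b for a in groups for b in groups)

if __name__ == "__main__":
    for f in range(1, 4):
        print(f, replicas_needed(f), majorities_always_intersect(f))
```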

Data Replication

R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979

D.K. Gifford, Weighted voting for replicated data, SOSP 1979

H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM, Jan. 1995

Quorum Consensus

Each data item has a version number

write(d, val, v#)

read(d) returns (val, v#)
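
The versioned read and write operations above might be used roughly as follows. This is a simplified sketch, not the protocol from any of the cited papers: the class names, the `reachable` parameter, and the choice of f+1 as the quorum size are assumptions for illustration, and refinements such as writing back on reads or breaking ties between concurrent writers are left out.

```python
class Replica:
    """Stores, for each data item d, a value and its version number."""
    def __init__(self):
        self.store = {}  # d -> (val, version)

    def write(self, d, val, version):
        # Keep the write only if it is newer than what is stored.
        _, current = self.store.get(d, (None, 0))
        if version > current:
            self.store[d] = (val, version)

    def read(self, d):
        return self.store.get(d, (None, 0))


class QuorumClient:
    """Hypothetical client that talks to f+1 of 2f+1 replicas per operation."""
    def __init__(self, replicas, f):
        assert len(replicas) == 2 * f + 1
        self.replicas = replicas
        self.quorum = f + 1

    def _pick(self, reachable):
        candidates = self.replicas if reachable is None else reachable
        if len(candidates) < self.quorum:
            raise RuntimeError("not enough replicas reachable for a quorum")
        return candidates[:self.quorum]

    def read(self, d, reachable=None):
        # The reply with the highest version number wins.
        replies = [r.read(d) for r in self._pick(reachable)]
        return max(replies, key=lambda reply: reply[1])

    def write(self, d, val, reachable=None):
        # Learn the latest version from a quorum, then write a higher one.
        _, version = self.read(d, reachable)
        for r in self._pick(reachable):
            r.write(d, val, version + 1)
        return version + 1


if __name__ == "__main__":
    replicas = [Replica() for _ in range(3)]  # f = 1
    client = QuorumClient(replicas, f=1)
    client.write("x", "a", reachable=replicas[:2])   # reaches replicas 0 and 1
    print(client.read("x", reachable=replicas[1:]))  # reaches replicas 1 and 2
```

The read still finds the written value even though it contacts a different pair of replicas, because the two quorums overlap in one replica.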

Replicas must execute operations in the same order

Implies replicas will have the same state, assuming they start in the same state and operations are deterministic
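
A small illustration of why order matters for general operations such as deposit and withdraw. The `Account` class and `apply_log` helper are hypothetical names for this sketch; the only assumptions are that every replica starts from the same empty account and that each operation is deterministic.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return True

    def withdraw(self, amount):
        # Deterministic rule: refuse the withdrawal if funds are insufficient.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


def apply_log(log):
    """Run a sequence of (operation name, amount) pairs on a fresh replica."""
    acct = Account()
    for op, amount in log:
        getattr(acct, op)(amount)
    return acct.balance


if __name__ == "__main__":
    log = [("deposit", 50), ("withdraw", 80), ("deposit", 100)]
    reordered = [("deposit", 50), ("deposit", 100), ("withdraw", 80)]
    print(apply_log(log), apply_log(list(log)))  # same order, same state
    print(apply_log(reordered))                  # different order, different state
```

Two replicas that apply the same log end with the same balance, while a replica that applies the same operations in another order can diverge, here because the withdrawal is refused in one order and accepted in the other.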

Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988

Replication in the Harp file system, S. Ghemawat et al., SOSP 1991

The part-time parliament, L. Lamport, TOCS 1998

Paxos made simple, L. Lamport, Nov. 2001

Use a primary

System moves through a sequence of views

Client sends request to primary

Primary sends prepare message to all

Replicas receive prepare

Primary waits for f prepare-oks
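
The steps above can be sketched as a toy simulation. This is not the published protocol: the class and method names are hypothetical, direct method calls stand in for network messages, and view changes, viewstamps, and recovery are not modeled. It only shows the primary committing once f backups have answered the prepare.

```python
class Backup:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def on_prepare(self, view, op_num, request):
        """Record the request and answer prepare-ok; a crashed backup stays silent."""
        if not self.up:
            return False
        self.log.append((view, op_num, request))
        return True


class Primary:
    def __init__(self, backups, f, view=0):
        assert len(backups) == 2 * f  # 2f+1 replicas counting the primary
        self.backups = backups
        self.f = f
        self.view = view
        self.log = []
        self.committed = 0

    def handle_request(self, request):
        # Assign the next position in the order and log the request locally.
        op_num = len(self.log) + 1
        self.log.append((self.view, op_num, request))
        # Send prepare to all backups and count the prepare-oks.
        oks = sum(1 for b in self.backups
                  if b.on_prepare(self.view, op_num, request))
        if oks < self.f:
            return None  # not enough replies; a real system would wait or change views
        self.committed = op_num
        return ("reply", op_num, request)


if __name__ == "__main__":
    primary = Primary([Backup(), Backup(up=False)], f=1)
    print(primary.handle_request("deposit 10"))
```

With f = 1, one live backup is enough: the request is then stored at f+1 = 2 of the 3 replicas before the client gets its reply.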

Normal Case

A 2-phase protocol:

Only 3 message delays

Byzantine Failures

Nodes fail arbitrarily

Causes

3f+1 replicas are needed to survive f failures

2f+1 replicas is a quorum

The minimum in an asynchronous network
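
The quorum size can be checked with a short counting argument. The symbols n, Q_1, and Q_2 are introduced here for illustration: n = 3f+1 is the number of replicas, and Q_1, Q_2 are any two quorums of 2f+1 replicas.

```latex
\[
|Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; 2(2f+1) - (3f+1) \;=\; f+1
\]
```

Since at most f replicas are faulty, any two quorums share at least one non-faulty replica. And since n - f = 2f+1 replicas are non-faulty, a full quorum can still respond when all f faulty replicas stay silent.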

BFT

M. Castro and B. Liskov, Practical Byzantine fault tolerance and proactive recovery, ACM TOCS, 2002

Primary runs the protocol in the normal case

Replicas watch the primary and do a view change if it fails

Key difference: replicas might lie

Solution: add a pre-prepare phase

Client sends request to primary