lec03
Node Tree
- log_better
- weak_consistency
- GFS
Nodes
GFS | |
content | GFS (aka Google File System) |
children | files_autosplit, high_speeds_parallel, internal_use, record_append, single_data_center, single_master, why_hard, GFS_goals, auto_failure_recovery, big_sequence, big_storage |
big_storage | |
content | Big Storage |
parents | GFS |
why_hard | |
content | Why is Big Storage hard? |
children | faults, performance |
parents | GFS |
performance | |
content | Performance |
children | sharding (splitting data across many servers lets clients access it in parallel)
parents | why_hard |
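A sketch of why sharding is a performance technique: a file split across N chunk servers can be read/written in parallel, so aggregate throughput scales with N. Sizes below match GFS's real 64 MB chunk size; the round-robin placement is just an illustration.

```python
# Illustrative sketch: splitting a file across chunk servers lets
# clients read chunks in parallel, so throughput scales with servers.

def assign_chunks(file_size: int, chunk_size: int, num_servers: int) -> dict:
    """Round-robin the chunks of one file onto chunk servers."""
    num_chunks = -(-file_size // chunk_size)  # ceiling division
    placement = {}
    for chunk_index in range(num_chunks):
        placement[chunk_index] = chunk_index % num_servers
    return placement

# A 1 GB file in 64 MB chunks (GFS's actual chunk size) over 4 servers:
placement = assign_chunks(file_size=1 << 30, chunk_size=64 << 20, num_servers=4)
# 16 chunks, 4 per server -> 4 reads can proceed at once.
```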
faults | |
content | Faults |
children | tolerance |
parents | why_hard |
sharding | |
content | Sharding |
children | shard |
parents | performance |
tolerance | |
content | Fault Tolerance |
children | replication |
parents | faults |
replication | |
content | Replication as a means to add fault tolerance |
children | almost_identical |
parents | tolerance |
almost_identical | |
content | "Almost Identical" inconsistency risk |
children | consistency |
parents | replication |
consistency | |
content | Consistency in replications |
children | strong_consistency, bad_replication |
parents | almost_identical |
strong_consistency | |
content | A strongly consistent system keeps all replicas identical when duplicating data
children | low_performance_tradeoff, not_strongly_consistent (NOT strongly consistent), system_behaves |
parents | consistency |
low_performance_tradeoff | |
content | A strongly consistent system pays for its guarantees with lower performance; that is the tradeoff.
parents | strong_consistency |
system_behaves | |
content | A strongly consistent system behaves as if it were a single server
parents | strong_consistency |
bad_replication | |
content | Bad Replication Design |
children | events_order |
parents | consistency |
events_order | |
content | No way to ensure events (writes/reads) are processed in the correct order
parents | bad_replication |
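A toy sketch of the bad-replication failure mode above: if two replicas apply the same writes in different orders, their states diverge. The keys and values are made up for illustration.

```python
# Sketch: replicas that apply writes in different orders diverge.

def apply(state: dict, writes: list) -> dict:
    for key, value in writes:
        state[key] = value
    return state

w1 = ("x", 1)
w2 = ("x", 2)

replica_a = apply({}, [w1, w2])  # sees w1 then w2
replica_b = apply({}, [w2, w1])  # network reorders: w2 then w1

# replica_a ends with x == 2, replica_b with x == 1: the replicas disagree.
```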
GFS_goals | |
content | GFS Goals: Big, Fast, Global |
parents | GFS |
high_speeds_parallel | |
content | High Speeds, Parallel Access |
parents | GFS |
files_autosplit | |
content | Files Automatically Split |
children | shard (One of the splits of a file is called a "shard") |
parents | GFS |
shard | |
content | Shard |
children | chunk_server (Shards and chunks may be analogous) |
parents | files_autosplit, sharding |
auto_failure_recovery | |
content | Automatic Failure Recovery |
parents | GFS |
single_data_center | |
content | Single Data Center |
parents | GFS |
big_sequence | |
content | Designed for big sequential reads/writes |
parents | GFS |
remarks | As opposed to random reads/writes
internal_use | |
content | Used internally by Google |
parents | GFS |
weak_consistency | |
content | Designed with weak consistency |
children | nature_of_gfs (I think this is what is meant by weak consistency here?), not_strongly_consistent |
remarks | Heretical to use weak consistency for academics |
single_master | |
content | Single Master |
children | chunk_server (Master knows which chunks are stored on which chunk servers), master_data
parents | GFS |
chunk_server | |
content | Chunk Server stores the actual chunks
parents | single_master, shard |
remarks | Are "chunks" the same thing as shards? |
master_data | |
content | Master Data |
children | filename, handle, log_checkpoint |
parents | single_master |
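A hypothetical sketch of the master's two main tables. The filename-to-handles map must survive a crash (it is persisted via the log/checkpoint), while the handle-to-servers map can live only in memory, since it can be rebuilt by asking chunk servers what they hold. All names and handles below are made up.

```python
# Sketch of the master's metadata (illustrative names throughout).

files = {
    # filename -> ordered list of chunk handles (persisted to disk)
    "/logs/web-00": ["handle-17", "handle-18"],
}

chunks = {
    # handle -> version number and the chunk servers holding a replica
    # (volatile: rebuilt from chunk-server reports after a restart)
    "handle-17": {"version": 3, "servers": ["cs1", "cs2", "cs3"]},
    "handle-18": {"version": 1, "servers": ["cs2", "cs4", "cs5"]},
}

def locate(filename: str, chunk_index: int):
    """Resolve a (filename, chunk index) to the servers holding it."""
    handle = files[filename][chunk_index]
    return handle, chunks[handle]["servers"]
```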
filename | |
content | Filename |
children | nv |
parents | master_data |
nv | |
content | Non-volatile storage |
parents | filename |
handle | |
content | Handle |
parents | master_data |
log_checkpoint | |
content | log, checkpoint |
children | disk_storage |
parents | master_data |
disk_storage | |
content | Stored to Disk |
parents | log_checkpoint |
log_better | |
content | A log is better than something like a database or b-tree because recording a mutation is a single sequential append, whereas a b-tree pays scattered random writes
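The log/checkpoint idea can be sketched as: every mutation is one append at the tail, and recovery is "load the last checkpoint, replay the log tail". A minimal in-memory sketch, with the disk persistence elided:

```python
# Sketch of log + checkpoint recovery (persistence to disk elided).

log = []            # append-only record of mutations
checkpoint = {}     # snapshot of state at some log position
checkpoint_pos = 0  # how much of the log the checkpoint already covers

def record(op, key, value=None):
    log.append((op, key, value))   # one sequential append per mutation

def recover():
    """Rebuild state from the checkpoint plus the log tail."""
    state = dict(checkpoint)
    for op, key, value in log[checkpoint_pos:]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

record("set", "/a", 1)
record("set", "/b", 2)
record("delete", "/a")
# recover() rebuilds {"/b": 2} without ever rewriting old log entries.
```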
record_append | |
content | How a record is appended in GFS
children | client_data_ps, last_chunk |
parents | GFS |
last_chunk | |
content | Where is the last chunk? |
children | ask_master |
parents | record_append |
ask_master | |
content | Ask the Master server |
children | no_primary, primary_dead |
parents | last_chunk |
no_primary | |
content | No Primary? |
children | find_replicate |
parents | ask_master |
find_replicate | |
content | Find an up-to-date replica
children | pick_primary |
parents | no_primary |
pick_primary | |
content | Picks Primary |
children | version_bumped |
parents | find_replicate |
version_bumped | |
content | Version Bumped |
children | tells_primary_secondary |
parents | pick_primary |
tells_primary_secondary | |
content | Master tells the primary and secondary replicas of their roles and the new version
children | lease |
parents | version_bumped |
lease | |
content | Lease granted to Primary: "you are primary for 60s"
children | primary_dead (this is what the lease helps with), split_brain_solution |
parents | tells_primary_secondary |
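The lease mechanism can be sketched as: the old primary refuses to act once its lease expires, and the master waits out the full lease term before naming a replacement, so two primaries are never active at once. Times below are illustrative.

```python
# Sketch of how a 60s lease prevents split brain (times illustrative).

LEASE_SECONDS = 60

class Primary:
    def __init__(self, granted_at):
        self.granted_at = granted_at

    def can_serve(self, now):
        # A primary only acts while its lease is still live.
        return now < self.granted_at + LEASE_SECONDS

old_primary = Primary(granted_at=0)

assert old_primary.can_serve(now=30)       # lease live: may serve writes
assert not old_primary.can_serve(now=61)   # lease expired: must refuse

# The master, unable to reach old_primary, waits until now >= 60 before
# granting a lease to a replacement -- never sooner.
```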
client_data_ps | |
content | Client sends copy of data to Primary and Secondary |
children | primary_offset |
parents | record_append |
primary_offset | |
content | Primary Picks Offset |
children | replicas_write_to_off |
parents | client_data_ps |
replicas_write_to_off | |
content | All replicas told to write the data to that offset |
children | all_replicas_ok |
parents | primary_offset |
all_replicas_ok | |
content | If all replicas reply back "yes", all okay |
children | what_if_some_append |
parents | replicas_write_to_off |
what_if_some_append | |
content | What if only some append? |
children | nature_of_gfs, records_different_order |
parents | all_replicas_ok |
nature_of_gfs | |
content | Records sometimes failing to append on every replica is just the nature of GFS
parents | what_if_some_append, weak_consistency |
records_different_order | |
content | Records in replicas can be in different orders |
parents | what_if_some_append |
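A sketch of what these failure modes mean for readers: a failed append is retried at a new offset, so a record can be duplicated on one replica and leave a gap on another, and applications are expected to cope on the reader side (e.g. with record IDs and duplicate detection). The replica contents below are made up.

```python
# Sketch: reader-side handling of duplicates and gaps (made-up data).

replica_a = ["rec1", "rec2", "rec2"]      # retry duplicated rec2 here
replica_b = ["rec1", None, "rec2"]        # first attempt left a gap here

def read_records(replica):
    """Reader-side cleanup: skip gaps, drop duplicate record IDs."""
    seen = set()
    out = []
    for rec in replica:
        if rec is None or rec in seen:
            continue
        seen.add(rec)
        out.append(rec)
    return out

# After cleanup, both replicas yield the same logical stream:
# ["rec1", "rec2"] from either replica_a or replica_b.
```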
primary_dead | |
content | What if Master server thinks the Primary is dead? |
children | master_doesnt_pick, two_primaries |
parents | lease, ask_master |
two_primaries | |
content | Two primaries in a system is known as "split brain" |
children | master_doesnt_pick (Otherwise, you end up causing "Split Brain"), network_partition, split_brain_solution |
parents | primary_dead |
network_partition | |
content | split brain can be caused by a network partition where parts of the network can transmit but maybe not receive |
parents | two_primaries |
split_brain_solution | |
content | The solution to Split Brain (two primaries) is to use a lease on a primary. After the lease is up, commands are no longer sent to that primary. |
parents | lease, two_primaries |
master_doesnt_pick | |
content | The Master should NOT immediately designate a new primary while the old one may still be alive
parents | two_primaries, primary_dead |
two_phase_commit | |
content | Two-Phase Commit. A mechanism for strong consistency |
parents | not_strongly_consistent, extra_bits |
not_strongly_consistent | |
content | GFS is not strongly consistent |
children | extra_bits, two_phase_commit |
parents | strong_consistency, weak_consistency |
extra_bits | |
content | GFS would need "extra bits" for strong consistency |
children | two_phase_commit (One of the things you'd add to GFS to make it strongly consistent)
parents | not_strongly_consistent |
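A minimal sketch of two-phase commit, the kind of "extra bit" mentioned above: a coordinator asks every participant to prepare, and only if all vote yes does it tell them to commit; any single "no" aborts everyone. The `Participant` class is a stand-in, not real GFS machinery.

```python
# Minimal two-phase commit sketch (Participant is a made-up stand-in).

def two_phase_commit(participants):
    # Phase 1: every participant votes on whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: the unanimous decision is pushed to everyone.
    for p in participants:
        p.finish(decision)
    return decision

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        return self.can_commit

    def finish(self, decision):
        self.state = decision

group = [Participant(True), Participant(True), Participant(False)]
# One "no" vote forces every participant to abort.
```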