distributed_systems_MIT/lec03

Node Tree

Nodes

GFS
content GFS (aka Google File System)
children files_autosplit, high_speeds_parallel, internal_use, record_append, single_data_center, single_master, why_hard, GFS_goals, auto_failure_recovery, big_sequence, big_storage

big_storage
content Big Storage
parents GFS

why_hard
content Why is Big Storage hard?
children faults, performance
parents GFS

performance
content Performance
children sharding (I forget what sharding has to do with performance)
parents why_hard

faults
content Faults
children tolerance
parents why_hard

sharding
content Sharding
children shard
parents performance

tolerance
content Fault Tolerance
children replication
parents faults

replication
content Replication as a means to add fault tolerance
children almost_identical
parents tolerance

almost_identical
content "Almost Identical" inconsistency risk
children consistency
parents replication

consistency
content Consistency in replications
children strong_consistency, bad_replication
parents almost_identical

strong_consistency
content In a strongly consistent system, every replica of the data is identical
children low_performance_tradeoff, not_strongly_consistent (NOT strongly consistent), system_behaves
parents consistency

low_performance_tradeoff
content Strong consistency comes at a cost: lower performance is the tradeoff.
parents strong_consistency

system_behaves
content A strongly consistent system behaves as if it were a single server
parents strong_consistency

bad_replication
content Bad Replication Design
children events_order
parents consistency

events_order
content No way to ensure events (writes/reads) are processed in the same, correct order on every replica
parents bad_replication
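
A minimal sketch of why ordering matters, assuming two replicas that each apply the same two writes but receive them in opposite orders. The names `apply_writes`, `replica_a` and `replica_b` are made up for illustration and are not from the lecture.

```python
def apply_writes(writes):
    """Apply (key, value) writes in the given order and return the final store."""
    store = {}
    for key, value in writes:
        store[key] = value  # a later write to the same key overwrites the earlier one
    return store


# Two clients write to the same key at about the same time.
write_from_c1 = ("x", 1)
write_from_c2 = ("x", 2)

# Nothing forces the replicas to see the writes in the same order.
replica_a = apply_writes([write_from_c1, write_from_c2])
replica_b = apply_writes([write_from_c2, write_from_c1])

# The replicas now disagree about the value of "x".
diverged = replica_a != replica_b
```

A reader who asks `replica_a` sees a different value than one who asks `replica_b`, which is the "almost identical" risk noted above.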

GFS_goals
content GFS Goals: Big, Fast, Global
parents GFS

high_speeds_parallel
content High Speeds, Parallel Access
parents GFS

files_autosplit
content Files Automatically Split
children shard (One of the splits of a file is called a "shard")
parents GFS

shard
content Shard
children chunk_server (Shards and chunks may be analogous)
parents files_autosplit, sharding
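
A rough sketch of how a byte range in a file could map onto its splits, assuming a "shard" here corresponds to a fixed-size GFS chunk and using the 64 MB chunk size from the GFS paper. The function names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> list[int]:
    """Return the chunk indices touched by reading `length` bytes from `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))
```

Because each index can live on a different chunk server, a client reading a large range can fetch several chunks in parallel, which is one way splitting ties into performance.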

auto_failure_recovery
content Automatic Failure Recovery
parents GFS

single_data_center
content Single Data Center
parents GFS

big_sequence
content Designed for big sequential reads/writes
parents GFS
remarks As opposed to random reads/writes

internal_use
content Used internally by Google
parents GFS

weak_consistency
content Designed with weak consistency
children nature_of_gfs (I think this is what is meant by weak consistency here?), not_strongly_consistent
remarks Using weak consistency was considered heretical by academics

single_master
content Single Master
children chunk_server (Master knows which chunks are stored on which chunk servers), master_data
parents GFS

chunk_server
content Chunk Server stores the actual chunks
parents single_master, shard
remarks Are "chunks" the same thing as shards?

master_data
content Master Data
children filename, handle, log_checkpoint
parents single_master
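
A possible shape for the master's tables, as a sketch only. The notes name the filename and the handle. The per-chunk fields (`version`, `servers`, `primary`, `lease_expires_at`) and all class and method names are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChunkInfo:
    """Per-handle record (field names are illustrative assumptions)."""
    version: int = 0
    servers: list = field(default_factory=list)  # chunk servers holding a replica
    primary: Optional[str] = None                # current primary, if any
    lease_expires_at: float = 0.0                # when the primary's lease ends


@dataclass
class MasterState:
    files: dict = field(default_factory=dict)   # filename -> list of chunk handles
    chunks: dict = field(default_factory=dict)  # chunk handle -> ChunkInfo

    def locate(self, filename: str, chunk_idx: int):
        """Return (handle, servers) for the chunk_idx-th chunk of a file."""
        handle = self.files[filename][chunk_idx]
        return handle, self.chunks[handle].servers
```

A client would ask the master something like `locate("/f", 0)` and then talk to the returned chunk servers directly.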

filename
content Filename
children nv
parents master_data

nv
content Non-volatile storage
parents filename

handle
content Handle
parents master_data

log_checkpoint
content log, checkpoint
children disk_storage
parents master_data

disk_storage
content Stored to Disk
parents log_checkpoint

log_better
content A log is better than something like a database or B-tree because appending to it is more efficient
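
A small sketch of an append-only log that survives a restart, assuming one JSON record per line. The `MasterLog` class, the record fields and the file name are invented for illustration. Checkpointing is left out to keep it short.

```python
import json
import os
import tempfile


class MasterLog:
    """Append-only log of metadata changes (illustrative, not the GFS format)."""

    def __init__(self, path):
        self.path = path

    def append(self, record: dict) -> None:
        # Appending is a sequential write at the end of the file.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before returning

    def replay(self) -> dict:
        """Rebuild the filename -> handles table by reading the log in order."""
        files = {}
        if not os.path.exists(self.path):
            return files
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                files.setdefault(rec["filename"], []).append(rec["handle"])
        return files


log_path = os.path.join(tempfile.mkdtemp(), "master.log")
log = MasterLog(log_path)
log.append({"filename": "/logs/a", "handle": 101})
log.append({"filename": "/logs/a", "handle": 102})
recovered = log.replay()  # what a restarted master would reconstruct
```

A checkpoint would save the rebuilt table at some point so that recovery only needs to replay the records written after it.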

record_append
content How a record is appended in GFS
children client_data_ps, last_chunk
parents GFS

last_chunk
content Where is the last chunk?
children ask_master
parents record_append

ask_master
content Ask the Master server
children no_primary, primary_dead
parents last_chunk

no_primary
content No Primary?
children find_replicate
parents ask_master

find_replicate
content Find an up-to-date replica
children pick_primary
parents no_primary

pick_primary
content Master picks a Primary
children version_bumped
parents find_replicate

version_bumped
content Version Bumped
children tells_primary_secondary
parents pick_primary

tells_primary_secondary
content Master tells the Primary and Secondary replicas (who is primary and the new version)
children lease
parents version_bumped

lease
content Lease granted to the Primary: "you are primary for 60s"
children primary_dead (this is what the lease helps with), split_brain_solution
parents tells_primary_secondary
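
The no-primary path above (find an up-to-date replica, pick a primary, bump the version, grant a lease) could be sketched roughly as follows. It assumes "up-to-date" means the replica's version equals the master's recorded version. `pick_primary` and its return shape are hypothetical.

```python
LEASE_SECONDS = 60  # "you are primary for 60s"


def pick_primary(master_version, replica_versions, now):
    """Choose a primary among up-to-date replicas and bump the chunk version.

    replica_versions maps chunk-server name -> version that server reports.
    """
    # Only replicas at the master's recorded version count as up to date.
    up_to_date = sorted(
        name for name, version in replica_versions.items()
        if version == master_version
    )
    if not up_to_date:
        raise RuntimeError("no up-to-date replica available")

    primary, secondaries = up_to_date[0], up_to_date[1:]
    return {
        "primary": primary,
        "secondaries": secondaries,
        "version": master_version + 1,            # version bumped
        "lease_expires_at": now + LEASE_SECONDS,  # lease handed to the primary
    }


result = pick_primary(3, {"cs1": 3, "cs2": 2, "cs3": 3}, now=1000.0)
```

Here `cs2` is skipped because it reports a stale version, so it cannot become primary or secondary for the new version.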

client_data_ps
content Client sends copy of data to Primary and Secondary
children primary_offset
parents record_append

primary_offset
content Primary Picks Offset
children replicas_write_to_off
parents client_data_ps

replicas_write_to_off
content All replicas told to write the data to that offset
children all_replicas_ok
parents primary_offset

all_replicas_ok
content If all replicas reply back "yes", all okay
children what_if_some_append
parents replicas_write_to_off

what_if_some_append
content What if only some append?
children nature_of_gfs, records_different_order
parents all_replicas_ok

nature_of_gfs
content An append sometimes not landing on every replica is just the nature of GFS
parents what_if_some_append, weak_consistency

records_different_order
content Records in replicas can be in different orders
parents what_if_some_append
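
A toy simulation of the append path, to show how a partial failure plus a client retry leaves replicas with different contents. It assumes offsets are simple record slots rather than byte offsets, and the `Replica` class, the `fail_next` hook and `record_append` are made up for illustration.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}          # offset -> record
        self.fail_next = False  # test hook: drop the next write

    def write_at(self, offset, record):
        if self.fail_next:
            self.fail_next = False
            return False        # the write never reached this replica
        self.data[offset] = record
        return True

    def records(self):
        """Records in offset order, skipping holes."""
        return [self.data[o] for o in sorted(self.data)]


def record_append(primary, secondaries, record, next_offset):
    """Primary picks the offset; every replica is told to write there."""
    offset = next_offset
    ok = primary.write_at(offset, record)
    for s in secondaries:
        ok = s.write_at(offset, record) and ok
    return ok, offset + 1       # the offset advances even if the append failed


p, s = Replica("primary"), Replica("secondary")
s.fail_next = True
ok_a, nxt = record_append(p, [s], "A", 0)     # secondary misses A
ok_b, nxt = record_append(p, [s], "B", nxt)   # B lands everywhere
ok_retry, nxt = record_append(p, [s], "A", nxt)  # client retries A
```

After the retry the primary holds A, B, A while the secondary holds B, A, so the replicas differ in both contents and order.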

primary_dead
content What if Master server thinks the Primary is dead?
children master_doesnt_pick, two_primaries
parents lease, ask_master

two_primaries
content Two primaries in a system is known as "split brain"
children master_doesnt_pick (Otherwise, you end up causing "Split Brain"), network_partition, split_brain_solution
parents primary_dead

network_partition
content Split brain can be caused by a network partition, where some machines can still transmit but may not receive
parents two_primaries

split_brain_solution
content The solution to Split Brain (two primaries) is to use a lease on a primary. After the lease is up, commands are no longer sent to that primary.
parents lease, two_primaries
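
A minimal sketch of the lease rule, assuming the master and the primary share a reasonably synchronized clock. Both function names are hypothetical.

```python
def primary_accepts_write(lease_expires_at: float, now: float) -> bool:
    """The primary only acts as primary while its lease is still valid."""
    return now < lease_expires_at


def may_designate_new_primary(lease_expires_at: float, now: float) -> bool:
    """The master waits until the old lease has expired before picking again."""
    return now >= lease_expires_at


lease_expires_at = 1060.0

# While the lease is live, the old primary serves and the master must wait.
during = (primary_accepts_write(lease_expires_at, 1030.0),
          may_designate_new_primary(lease_expires_at, 1030.0))

# After expiry, the old primary stops and the master may choose a new one.
after = (primary_accepts_write(lease_expires_at, 1061.0),
         may_designate_new_primary(lease_expires_at, 1061.0))
```

At no instant are both checks true, which is what rules out two primaries even when the master cannot reach the old one.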

master_doesnt_pick
content The Master should NOT designate a new primary while the old primary's lease may still be valid
parents two_primaries, primary_dead

two_phase_commit
content Two-Phase Commit. A mechanism for strong consistency
parents not_strongly_consistent, extra_bits
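
For reference, a bare-bones sketch of two-phase commit: a coordinator asks every participant to prepare, and commits only if all vote yes. This is a generic illustration, not something GFS does. The class and function names are invented, and failure handling (timeouts, coordinator crashes) is omitted.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        """Phase 1: vote yes (and promise to commit) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every vote was yes, else abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Either every participant ends up `committed` or every one ends up `aborted`, which is the all-or-nothing property a plain GFS record append does not provide.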

not_strongly_consistent
content GFS is not strongly consistent
children extra_bits, two_phase_commit
parents strong_consistency, weak_consistency

extra_bits
content GFS would need "extra bits" for strong consistency
children two_phase_commit (One of the things you'd add to GFS to make it strongly consistent)
parents not_strongly_consistent