ch08
dz / designing_data_intensive_applications / ch08Summary
Chapter 8: The Trouble With Distributed Systems
Node Tree
- knowledge_truth_and_lies
- large_scale_computing_philosophies
- rapid_feedback
- timing_assumptions
- NTP
- clocks
- faults_partial_failures
Nodes
faults_partial_failures | |
content | Faults and Partial Failures |
children | partial_failure, unreliable_networks, detecting_faults, deterministic |
deterministic | |
content | Deterministic: same opereration produces same result |
children | nondeterministic (vs) |
parents | faults_partial_failures |
partial_failure | |
content | Partial Failure: parts of distributed system broken in unpredictable way, other parts working fine |
children | makes_dist_systems_difficult, nondeterministic |
parents | faults_partial_failures |
nondeterministic | |
content | nondeterministic |
parents | deterministic, partial_failure |
makes_dist_systems_difficult | |
content | What makes distributed systems hard to work with |
parents | partial_failure |
large_scale_computing_philosophies | |
content | Large-scale computing philosophies |
children | HPC, cloud_and_super_computing |
cloud_and_super_computing | |
content | Cloud computing and supercomputing |
children | thousands_of_nodes_something_wrong, HPC (vs), cloud_computing |
parents | large_scale_computing_philosophies |
HPC | |
content | High Performance Computing (HPC) |
children | checkpoint_and_crash |
parents | cloud_and_super_computing, large_scale_computing_philosophies |
checkpoint_and_crash | |
content | Use checkpoints, let everything crash |
parents | HPC |
cloud_computing | |
content | Cloud Computing |
children | reliable_system_unreliable_components |
parents | cloud_and_super_computing |
thousands_of_nodes_something_wrong | |
content | When a system has thousands of nodes, it is safe to assume something is wrong. |
parents | cloud_and_super_computing |
reliable_system_unreliable_components | |
content | Reliable system form unreliable components |
parents | cloud_computing |
unreliable_networks | |
content | Unreliable Networks |
children | impossible_determine_why_request_not_received, network_faults_practice, APN |
parents | faults_partial_failures |
APN | |
content | Asynchronous Packet Networks (APN) |
children | impossible_determine_why_request_not_received, sync_async_networks, unbounded_delay |
parents | unreliable_networks |
impossible_determine_why_request_not_received | |
content | Impossible to determine why you don't receive request from APN |
children | timeout |
parents | unreliable_networks, APN |
timeout | |
content | Timeout: after some time, give up waiting and assume response won't arrive |
children | flow_control, how_long_to_declare_dead, ideal_timeout, jitter, network_congestion, no_correct_timeout, noisy_neighbor, unbounded_delay |
parents | impossible_determine_why_request_not_received |
network_faults_practice | |
content | Network Faults in Practice |
children | how_software_reacts_to_network_problems, network_fault, network_partition |
parents | unreliable_networks |
network_partition | |
content | Network Partition / Netsplit: One part of network cut off from the rest of network due to a network fault |
children | network_fault (AKA) |
parents | network_faults_practice |
network_fault | |
content | Network Fault |
parents | network_partition, network_faults_practice |
how_software_reacts_to_network_problems | |
content | Need to know how your software reacts to network problems and that the system can recover from them |
parents | network_faults_practice |
detecting_faults | |
content | Detecting Faults |
parents | faults_partial_failures |
rapid_feedback | |
content | Rapid Feedback about a node being down is useful, but can't count on it |
children | certain_response_success (the only way to know for sure) |
certain_response_success | |
content | Certainty that response that response was successful: positive response from application itself |
parents | rapid_feedback |
how_long_to_declare_dead | |
content | How long to wait before declaring a node is dead? |
parents | timeout |
unbounded_delay | |
content | Unbounded Delay: deliver packets as quickly as possible, no upper limit on packet arrival time. |
children | no_correct_timeout, variable_delays_dynamic_partitioning |
parents | APN, timeout |
ideal_timeout | |
content | Ideal Timeout: 2d + r, where d is time to deliver, and r is the time it takes to handle request on non-failed node. |
parents | timeout |
network_congestion | |
content | Network Congestion: packets have to wait in queue for a slot |
parents | timeout |
flow_control | |
content | flow control / congestion avboidance / backpressure: (TCP) node limits rate of sending in order to avoid overloading newtwork link or receiving node |
parents | timeout |
noisy_neighbor | |
content | Noisy Neighbor: delays caused by other customer using shared resources. |
parents | timeout |
jitter | |
content | Jitter: systems can continuously monitor response times and their variability |
parents | timeout |
sync_async_networks | |
content | Synchronous vs Asynchronous Networks |
children | synchronous, ISDN_network, bursty_traffic, circuit, emulating_circuit_switching |
parents | APN |
circuit | |
content | Circuit: (in telephone systems) fixed, guaranteed amount of bandwidth allocated for a call |
children | synchronous, ISDN_network |
parents | sync_async_networks |
synchronous | |
content | Synchronous: no queuing, 16 bits of space are already reserved. |
parents | timing_assumptions, sync_async_networks, circuit |
ISDN_network | |
content | ISDN Network |
parents | sync_async_networks, circuit |
bursty_traffic | |
content | Internet and datacenter networks optimized for bursty traffic instead of constant bitrate |
parents | sync_async_networks |
quality_of_service | |
content | Quality of Service: priority scheduling of packets |
parents | emulating_circuit_switching |
emulating_circuit_switching | |
content | Emulating circuit switching on asynchronous network or statistically bounded delay |
children | quality_of_service, admission_control |
parents | sync_async_networks |
admission_control | |
content | Admission Control: rate limit senders |
parents | emulating_circuit_switching |
variable_delays_dynamic_partitioning | |
content | More generally, you can think of variable delays as a consequence of dynamic partitioning |
children | static_partitioning (dynamic vs static partitioning), dynamic_partitioning |
parents | unbounded_delay |
static_partitioning | |
content | Static Partitioning: latency guarantees, reducied utilization (less expensive) |
children | dynamic_partitioning (vs) |
parents | variable_delays_dynamic_partitioning |
no_correct_timeout | |
content | No correct timeout, determined experimentally |
parents | unbounded_delay, timeout |
dynamic_partitioning | |
content | Dynamic Partitioning: multi-tenancy, better utilization (cheaper), but variable delays |
parents | static_partitioning, variable_delays_dynamic_partitioning |
clocks | |
content | clocks |
children | measuring_duration, monotonic_clock, node_clock_not_accurate, smearing, time_of_day_clock, clock_issues, clock_reading_confidence_interval, describing_points_in_time, drift |
measuring_duration | |
content | Measuring Duration |
children | describing_points_in_time (related) |
parents | clocks |
describing_points_in_time | |
content | Describing points in time |
parents | measuring_duration, clocks |
node_clock_not_accurate | |
content | Each node has its own hardware clock (quartz, usually), not perfectly accurate |
parents | clocks |
time_of_day_clock | |
content | Time-of-Day Clock: returns current calendar date |
children | monotonic_clock (vs), unsuitable_for_measuing_duration, clock_sync_global_snapshots |
parents | NTP, clocks |
monotonic_clock | |
content | Monotonic Clock: Clock that is guaranteed to move forward in time. Suitable for measuring durations. |
children | slew, timediff_yields_duration_absolute_meaningless, unsuitable_for_measuing_duration (used for measuring elapsed time), different_timers_per_CPU, drift |
parents | time_of_day_clock, clocks |
unsuitable_for_measuing_duration | |
content | Time jumps (backwards) and (often) ignoring leap seconds make time-of-day clock unsuitable for measuring elapsed time |
parents | time_of_day_clock, monotonic_clock |
NTP | |
content | NTP: network time protocol, syncs clocks according to time reported by group of servers |
children | slew (NTP uses slew), smearing (how NTP handles leap seconds), time_of_day_clock (synced with), NTP_accuracy_limited_by_network_roundtrip, NTP_slew_amount |
timediff_yields_duration_absolute_meaningless | |
content | Difference between times gives duration elapsed, absolute value is meaningless |
parents | monotonic_clock |
different_timers_per_CPU | |
content | Different timers per CPU: unsynchronized. OS compensates, but take monoticity guarantee with grain of salt |
parents | clock_issues, monotonic_clock |
slew | |
content | Slew: to adjust the rate a monotonic clock moves forward. |
children | NTP_slew_amount |
parents | NTP, monotonic_clock |
NTP_slew_amount | |
content | NTP slew rate up to +/- 0.05%, but no forward/backward jumps |
parents | NTP, slew |
drift | |
content | Drift: clock runs slower/faster than it should |
children | smearing, timestamps_for_ordering_can_fail, 200ppm_drift |
parents | clock_issues, clocks, monotonic_clock |
200ppm_drift | |
content | Google assumes 200ppm drift for their servers. 6ms drift for a node resynchronized every 30s, 17s for a node resynchronized once a day |
parents | drift |
smearing | |
content | Smearing: performing leap second gradually over course of day. |
parents | drift, NTP, clocks |
timestamps_for_ordering_can_fail | |
content | Timestamps for ordering events across multiple nodes can fail due to skew. |
parents | drift |
NTP_accuracy_limited_by_network_roundtrip | |
content | NTP accuracy limited by network roundtrip time |
parents | NTP |
clock_reading_confidence_interval | |
content | Clock Reading: less a point in time, more of a range with confidence interval |
children | truetime_google, uncertainy_bound |
parents | clocks |
uncertainy_bound | |
content | Uncertainty bound computed based on time source |
parents | clock_reading_confidence_interval |
truetime_google | |
content | TrueTime (Google Spanner): explicitely reports confidence interval on local clock |
children | clock_sync_global_snapshots |
parents | clock_reading_confidence_interval |
clock_sync_global_snapshots | |
content | Clock sync for global snapshots (synched time-of-day-clocks used as transaction IDs) |
parents | truetime_google, time_of_day_clock |
clock_issues | |
content | Clock Issues |
children | process_pauses, VM_suspended, different_timers_per_CPU, drift |
parents | clocks |
process_pauses | |
content | process pauses |
children | hard_realtime_systems, preempt_thread, steal_time, thrashing, GC_brief_planned_outages |
parents | clock_issues |
VM_suspended | |
content | VM Suspended |
parents | clock_issues |
preempt_thread | |
content | Preempt running thread at any moment and resume it without any noticing |
parents | process_pauses |
thrashing | |
content | thrashing: OS spends most time swapping pages in/out of memory |
parents | process_pauses |
steal_time | |
content | steal_time: CPU time spent in another thread |
parents | process_pauses |
hard_realtime_systems | |
content | Hard Realtime Systems: specified deadline by which software must respond |
children | RTOS |
parents | process_pauses |
RTOS | |
content | Real-Time Operating System (RTOS): allows processes to be scheduled with a guaranteed allocation of CPU time in specified intervals |
children | GC_brief_planned_outages (GC) |
parents | hard_realtime_systems |
GC_brief_planned_outages | |
content | Garbage collection: emerging idewa to treat GC pauses like brief planned outages, let other nodes handle requests |
parents | process_pauses, RTOS |
knowledge_truth_and_lies | |
content | Knowledge, Truth, and Lies |
children | fencing, system_model, truth_defined_by_majority, byzantine_fault |
system_model | |
content | System Model: assumptions about behavior in distrubuted systems. |
children | node_failures, correctness |
parents | knowledge_truth_and_lies |
truth_defined_by_majority | |
content | Truth defined by majority |
children | node_cant_trust_judgement_of_situation, quorum, consensus_algos |
parents | knowledge_truth_and_lies |
node_cant_trust_judgement_of_situation | |
content | A node can't trust its own judgement of a situation |
parents | truth_defined_by_majority |
quorum | |
content | Quorum: voting amongst the nodes |
parents | truth_defined_by_majority |
consensus_algos | |
content | Consensus Algorithms |
parents | truth_defined_by_majority |
fencing | |
content | Fencing: in locking/leasing to protect access to a resource, a way to ensure nodes under false belief of ownership can't disrupt rest of system. |
children | fencing_token |
parents | knowledge_truth_and_lies |
fencing_token | |
content | Fencing Token: returned every time lock or lease granted, It is a number that increses |
children | fencing_tokens_lock |
parents | fencing |
byzantine_fault | |
content | Byzantine Fault: node claims to have received a particular message when in fact it didn't |
children | relevant_p2p, byz_fault_tolerant, byz_ft_protocols_complicated, consensus_in_untrusted_environment |
parents | knowledge_truth_and_lies |
consensus_in_untrusted_environment | |
content | Reaching consensus in untrusted environment |
children | byzantine_generals_problem (terminology) |
parents | byzantine_fault |
byzantine_generals_problem | |
content | Byzantine Generals Problem |
parents | consensus_in_untrusted_environment |
byz_fault_tolerant | |
content | Byzantine Fault Tolerant: oeprates correctly even if some of the nodes are malfunctioning or if there are malicious attackers |
children | relevant_p2p, byz_ft_protocols_complicated |
parents | byzantine_fault |
byz_ft_protocols_complicated | |
content | Protocoals for Byzantine Fault Tolerant systems are usually complicated and impractical in most use server-side data systems. |
parents | byz_fault_tolerant, byzantine_fault |
relevant_p2p | |
content | Relevant in P2P systems |
parents | byz_fault_tolerant, byzantine_fault |
timing_assumptions | |
content | Timing Assumptions |
children | partially_synchronous, synchronous, asynchronous |
asynchronous | |
content | Asynchronous |
parents | timing_assumptions |
partially_synchronous | |
content | Partially Synchronous |
parents | timing_assumptions |
node_failures | |
content | Node Failures |
children | byzantine_arbitrary_faults, crash_recoevery_faults, crash_stop_faults |
parents | system_model |
crash_stop_faults | |
content | Crash-stop faults |
parents | node_failures |
crash_recoevery_faults | |
content | Crash Recovery Faults |
parents | node_failures |
byzantine_arbitrary_faults | |
content | Byzantine (arbitrary) faults |
parents | node_failures |
correctness | |
content | Correctness |
children | proving_correct_doesnt_mean_implementation_correct, correctness_properties |
parents | system_model |
correctness_properties | |
content | Corretness Properties |
children | fencing_tokens_lock, liveness, safety |
parents | correctness |
safety | |
content | Safety |
children | monotonic_sequence, nothing_bad_happens (informal description), on_violation_point_to_time, uniqueness |
parents | correctness_properties |
liveness | |
content | liveness |
children | satisfied_in_future, something_good_eventually_happens (informal description), availability |
parents | correctness_properties |
fencing_tokens_lock | |
content | Fencing tokens for lock |
children | uniqueness |
parents | correctness_properties, fencing_token |
uniqueness | |
content | Uniqueness |
parents | fencing_tokens_lock, safety |
monotonic_sequence | |
content | Monotonic Sequence |
parents | safety |
availability | |
content | Availability |
parents | liveness |
nothing_bad_happens | |
content | Nothing bad happens |
parents | safety |
something_good_eventually_happens | |
content | something good eventually happens |
parents | liveness |
on_violation_point_to_time | |
content | On violation, able to point to specific point in time. Violation can't be undone. |
parents | safety |
satisfied_in_future | |
content | May not hold at some point in time, but may be satisified in the future |
parents | liveness |
proving_correct_doesnt_mean_implementation_correct | |
content | Proving correct doesn't mean implementation will be, but it's a good first step |
parents | correctness |