Designing Data-Intensive Applications, Chapter 1: Reliable, Scalable, and Maintainable Applications
Node Tree
- describing_performance
- legacy_system
- load_cope_approaches
- maintainability
- managing_complexity
- more_efficient_than_rolling_window
- multiple_calls_high_latency_probability
- reliability
- scalability
- small_request_holdup_rest
Nodes
reliability | |
content | Reliability |
children | continue_work_correctly (definition), faults |
continue_work_correctly | |
content | Continuing to work correctly, even when things go wrong |
parents | faults, reliability |
faults | |
content | Faults |
children | anticipate_cope, continue_work_correctly, deviation_spec, hardware, human, not_equal_to_failure (Faults are not equivalent to failure), software, things_go_wrong (definition) |
parents | reliability |
things_go_wrong | |
content | Things that can go wrong |
parents | faults |
anticipate_cope | |
content | Anticipate/cope with faults |
children | fault_tolerant, resilient |
parents | faults |
fault_tolerant | |
content | Fault Tolerant |
parents | anticipate_cope |
resilient | |
content | Resilient |
parents | anticipate_cope |
not_equal_to_failure | |
content | Not equivalent to failure |
children | systems_stop_providing_service (this is what a failure is) |
parents | faults |
deviation_spec | |
content | Deviation from Spec |
parents | faults |
systems_stop_providing_service | |
content | System as a whole stops providing the required service to the user |
parents | not_equal_to_failure |
hardware | |
content | Hardware |
children | MTTF, hardware_failure_examples (examples), mean_time_to_failure_disks, tolerate_loss_entire_machine |
parents | faults |
hardware_failure_examples | |
content | Examples of hardware failure: disk crash, faulty RAM, power outage |
parents | hardware |
mean_time_to_failure_disks | |
content | Mean Time To Failure (MTTF) for hard disks: about 10 to 50 years |
parents | hardware, MTTF |
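A back-of-the-envelope sketch of why a multi-decade MTTF per disk still means frequent disk faults in a large cluster. It assumes failures are independent and occur at a constant average rate of 1/MTTF per disk; the 10,000-disk cluster size and the two MTTF values are illustrative only.

```python
def expected_disk_failures_per_day(num_disks: int, mttf_years: float) -> float:
    """Rough expected number of disk failures per day, assuming each disk
    fails independently at a constant average rate of 1 / MTTF."""
    mttf_days = mttf_years * 365
    return num_disks / mttf_days


if __name__ == "__main__":
    cluster_size = 10_000  # illustrative cluster size (an assumption)
    for mttf in (10, 50):  # illustrative MTTF values in years
        rate = expected_disk_failures_per_day(cluster_size, mttf)
        print(f"MTTF {mttf} yrs -> ~{rate:.2f} failed disks/day")
```

With these numbers the script prints roughly 2.74 and 0.55 failed disks per day, so at this scale disk faults are a routine event the system has to tolerate rather than a rare exception.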
MTTF | |
content | Mean Time To Failure (MTTF) |
children | mean_time_to_failure_disks |
parents | hardware |
tolerate_loss_entire_machine | |
content | Tolerate loss of entire machine |
children | rolling_updates |
parents | hardware |
rolling_updates | |
content | Rolling updates: patch one node at a time (no downtime for the system as a whole) |
parents | tolerate_loss_entire_machine |
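A minimal simulation of a rolling update, assuming a hypothetical `Node` class whose `is_healthy` check stands in for a real probe of the running service. Only one node is out of rotation at a time, and the rollout halts at the first node that comes back unhealthy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one machine in the cluster."""
    name: str
    version: str
    in_service: bool = True
    bad_version: Optional[str] = None  # simulate a patch that breaks this node

    def is_healthy(self) -> bool:
        # Placeholder health check; a real one would probe the running service.
        return self.in_service and self.version != self.bad_version


def rolling_update(nodes: List[Node], new_version: str) -> List[str]:
    """Patch nodes one at a time; return the names updated successfully.

    Only the node being patched is out of rotation, so the others keep
    serving requests. The rollout halts at the first unhealthy node.
    """
    updated = []
    for node in nodes:
        node.in_service = False      # drain just this one node
        node.version = new_version   # apply the patch
        node.in_service = True       # put it back into rotation
        if not node.is_healthy():
            node.in_service = False  # keep the broken node out of rotation
            break                    # remaining nodes stay on the old version
        updated.append(node.name)
    return updated
```

Halting on the first failure is a design choice for the sketch: a bad patch affects one machine instead of the whole fleet, which is the point of tolerating the loss of an entire machine.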
software | |
content | Software |
children | bugs (Kind of software fault), corrupted_service (Kind of software fault), runaway_process (Kind of software fault) |
parents | faults |
bugs | |
content | Bugs |
parents | software |
runaway_process | |
content | Runaway Process |
parents | software |
corrupted_service | |
content | Dependent service that slows down, becomes unresponsive, or returns corrupted responses |
parents | software |
human | |
content | Human |
children | decouple_mistakes, detailed_clear_monitoring, good_management, minimize_opportunities, quick_easy_recovery |
parents | faults |
decouple_mistakes | |
content | Decouple mistakes from failures |
children | sandbox |
parents | human |
minimize_opportunities | |
content | Minimize opportunities for error |
parents | human |
quick_easy_recovery | |
content | Quick and easy recovery |
parents | human |
detailed_clear_monitoring | |
content | Detailed and Clear Monitoring |
children | telemetry |
parents | human |
telemetry | |
content | Telemetry |
parents | detailed_clear_monitoring |
sandbox | |
content | Sandbox (non-production) environments |
parents | decouple_mistakes |
good_management | |
content | Good Management Practices |
parents | human |
scalability | |
content | Scalability |
children | cope_increased_load, describing_load |
cope_increased_load | |
content | System's ability to cope with increased load |
parents | scalability |
describing_load | |
content | Describing Load |
children | ex_twitter, load_params |
parents | scalability |
ex_twitter | |
content | Example: Twitter |
children | post_tweet |
parents | describing_load |
load_params | |
content | Load Parameters |
children | follows_per_user, load_param_increased |
parents | describing_load |
post_tweet | |
content | Post a Tweet |
children | approach_SQL, approach_cache |
parents | ex_twitter |
approach_SQL | |
content | Approach A: SQL Join |
children | could_keep_up, post_tweet_more_work (Compared to) |
parents | post_tweet |
approach_cache | |
content | Approach B: cache each user's home timeline |
children | could_keep_up (solution), fan_out, faster_reads, hybrid_approach, post_tweet_more_work |
parents | post_tweet |
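A toy in-memory sketch contrasting the two approaches. The dicts and list are hypothetical stand-ins for the database tables and the timeline cache; Approach A does the join when a timeline is read, while Approach B fans each tweet out to every follower's cached timeline when it is posted.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables and the timeline cache.
follows = defaultdict(set)          # follower -> accounts they follow
followers = defaultdict(set)        # followee -> their followers
tweets = []                         # global list of (timestamp, sender, text)
timeline_cache = defaultdict(list)  # user -> cached home timeline, newest first


def follow(follower, followee):
    follows[follower].add(followee)
    followers[followee].add(follower)


# Approach A: cheap write, expensive read (join at read time).
def post_tweet_a(ts, sender, text):
    tweets.append((ts, sender, text))


def home_timeline_a(user):
    # Roughly what a SQL join of tweets with follows, ordered by time, computes.
    return sorted((t for t in tweets if t[1] in follows[user]), reverse=True)


# Approach B: expensive write (fan-out), cheap read.
def post_tweet_b(ts, sender, text):
    tweet = (ts, sender, text)
    tweets.append(tweet)
    for f in followers[sender]:             # one cache write per follower
        timeline_cache[f].insert(0, tweet)  # newest first, assuming in-order posts


def home_timeline_b(user):
    return timeline_cache[user]  # just read the precomputed timeline
```

Both functions return the same timeline; the difference is where the work happens. In B, the cost of one post grows with the sender's follower count, which is why followers per user matters as a load parameter.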
could_keep_up | |
content | Initial approach; couldn't keep up with the load of home timeline queries |
parents | approach_SQL, approach_cache |
faster_reads | |
content | Faster Reads |
parents | approach_cache |
post_tweet_more_work | |
content | Posting a tweet takes more work |
parents | approach_SQL, approach_cache |
follows_per_user | |
content | Distribution of followers per user: key load parameter for scalability |
children | fan_out |
parents | load_params |
fan_out | |
content | Fan-out |
parents | follows_per_user, approach_cache |
hybrid_approach | |
content | Hybrid Approach: tweets from users with a huge number of followers (celebrities) are handled separately |
parents | approach_cache |
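A minimal sketch of the hybrid approach, assuming a hypothetical follower-count cutoff (`CELEBRITY_THRESHOLD`, set tiny here so the example stays small). Regular users' tweets are fanned out on write; celebrity tweets are stored once and merged into each follower's timeline at read time.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; a real system would use a much larger one

follows = defaultdict(set)            # user -> accounts they follow
followers = defaultdict(set)          # user -> their followers
celebrity_tweets = defaultdict(list)  # celebrity -> their own tweets
timeline_cache = defaultdict(list)    # user -> fanned-out tweets from regular users


def follow(follower, followee):
    follows[follower].add(followee)
    followers[followee].add(follower)


def is_celebrity(user):
    return len(followers[user]) >= CELEBRITY_THRESHOLD


def post_tweet(ts, sender, text):
    tweet = (ts, sender, text)
    if is_celebrity(sender):
        celebrity_tweets[sender].append(tweet)  # stored once, no fan-out
    else:
        for f in followers[sender]:
            timeline_cache[f].append(tweet)     # fan-out on write


def home_timeline(user):
    # Merge the precomputed timeline with celebrity tweets fetched at read time.
    merged = list(timeline_cache[user])
    for followee in follows[user]:
        if is_celebrity(followee):
            merged.extend(celebrity_tweets[followee])
    return sorted(merged, reverse=True)
```

The sketch ignores accounts crossing the threshold over time; it only shows the split between fan-out on write and merge on read.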
describing_performance | |
content | Describing Performance |
children | load_param_increased, response_time, throughput |
load_param_increased | |
content | When you increase a load parameter |
children | keep_resources_unchanged, maintain_performance |
parents | load_params, describing_performance |
keep_resources_unchanged | |
content | ...and keep resources unchanged, how is system performance affected? |
parents | load_param_increased |
maintain_performance | |
content | ...how much must resources increase to maintain current performance? |
parents | load_param_increased |
throughput | |
content | Throughput |
children | batch_process_system, num_records_processed_per_second |
parents | describing_performance |
num_records_processed_per_second | |
content | Number of records processed per second |
parents | throughput |
batch_process_system | |
content | Batch processing systems (throughput is the key metric here) |
parents | throughput |
response_time | |
content | Response Time |
children | distribution_of_values, latency (Latency and response time are often used interchangeably, but they measure different things.), online_systems (response time is the key metric for online systems and services), time_btwn_request_response (definition) |
parents | describing_performance |
time_btwn_request_response | |
content | Time between request and response |
parents | response_time |
online_systems | |
content | Online Systems |
parents | response_time |
latency | |
content | Latency: duration that a request is waiting to be handled |
parents | response_time |
distribution_of_values | |
content | Distribution of Values |
children | avg_mean, median, outliers, percentiles |
parents | response_time |
avg_mean | |
content | Average/mean |
parents | distribution_of_values |
median | |
content | Median |
parents | percentiles, distribution_of_values |
outliers | |
content | Outliers |
children | how_bad (Quantifying how bad the outliers are) |
parents | distribution_of_values |
how_bad | |
content | How bad are the outliers? p95, p99, p999. |
children | 50th_percentile, tail_latencies |
parents | outliers, percentiles |
percentiles | |
content | Percentiles |
children | 50th_percentile, how_bad, median, percentiles_in_practice, service_level_agreements, service_level_objectives |
parents | distribution_of_values |
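A small sketch of computing percentiles from a sorted list of response times, using the nearest-rank definition (one of several common percentile definitions). The response times are made-up sample data with one outlier, to show how the mean and the median diverge.

```python
import math
import statistics


def percentile(sorted_times, p):
    """Nearest-rank percentile: the smallest sample such that at least p%
    of samples are <= it. `sorted_times` must be sorted ascending."""
    if not sorted_times:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_times) / 100)  # 1-based rank
    return sorted_times[max(rank, 1) - 1]


# Illustrative response times in milliseconds (made-up data, one outlier).
times_ms = sorted([12, 15, 11, 14, 13, 16, 12, 900, 14, 13])

print("mean  ", statistics.mean(times_ms))
print("p50   ", percentile(times_ms, 50))
print("p95   ", percentile(times_ms, 95))
print("p99   ", percentile(times_ms, 99))
print("p99.9 ", percentile(times_ms, 99.9))
```

Here the single 900 ms request drags the mean up to 102 ms while p50 stays at 13 ms, and with only ten samples p95 and above all land on the outlier. Meaningful high percentiles need many more samples than this.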
50th_percentile | |
content | 50th percentile: p50 (the median) |
parents | how_bad, percentiles |
tail_latencies | |
content | Tail Latencies |
children | head_of_line_blocking, tail_latency_amplification |
parents | how_bad |
service_level_objectives | |
content | Service Level Objectives (SLO) |
parents | percentiles |
service_level_agreements | |
content | Service Level Agreements (SLA) |
parents | percentiles |
head_of_line_blocking | |
content | Head-of-line blocking |
parents | tail_latencies |
small_request_holdup_rest | |
content | A small number of slow requests holding up the processing of subsequent requests |
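A minimal model of head-of-line blocking, assuming a single worker that serves requests strictly in arrival order and that all requests arrive at time 0. Each request's client-side response time is its own service time plus the time spent waiting behind everything ahead of it. The workloads are hypothetical.

```python
from itertools import accumulate


def fifo_response_times(service_times_ms):
    """Response times for requests queued at t=0 and served one at a time
    in arrival order: each request waits for every request ahead of it."""
    return list(accumulate(service_times_ms))


# Hypothetical workloads: the same requests, with the slow one first or last.
slow_first = fifo_response_times([500, 5, 5, 5])
fast_first = fifo_response_times([5, 5, 5, 500])
print(slow_first)  # [500, 505, 510, 515]
print(fast_first)  # [5, 10, 15, 515]
```

The fast requests each take only 5 ms of server time, so a server-side measurement would look healthy. The client still sees over 500 ms, which is why response time should be measured on the client side.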
percentiles_in_practice | |
content | Percentiles in Practice |
children | rolling_window, tail_latency_amplification |
parents | percentiles |
tail_latency_amplification | |
content | Tail Latency Amplification |
parents | percentiles_in_practice, tail_latencies |
rolling_window | |
content | Rolling window of response times |
parents | percentiles_in_practice |
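A naive sketch of a rolling window of response times. As a simplification it keeps the last N requests (a count-based window) rather than the requests from the last few minutes, and it sorts the window on every query. This per-query sorting cost is what the more efficient alternatives avoid.

```python
import math
from collections import deque


class RollingPercentiles:
    """Keep the most recent `window_size` response times and compute
    percentiles over them by sorting on demand (O(n log n) per query)."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)  # oldest sample drops off automatically

    def record(self, response_time_ms: float) -> None:
        self.window.append(response_time_ms)

    def percentile(self, p: float) -> float:
        if not self.window:
            raise ValueError("no samples in window")
        ordered = sorted(self.window)
        rank = math.ceil(p * len(ordered) / 100)  # nearest-rank method
        return ordered[max(rank, 1) - 1]
```

For example, with `window_size=3`, recording 100, 1, 2, 3 leaves only [1, 2, 3] in the window, so the 100 ms sample no longer affects the reported percentiles.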
multiple_calls_high_latency_probability | |
content | Chance of high latency increases when end-user requires multiple backend calls |
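A quick calculation of why multiple backend calls raise the chance of a slow end-user request. It assumes each backend call is independently slow with probability `p_slow`, so the chance that at least one of n calls is slow is 1 - (1 - p_slow)^n. Setting `p_slow = 0.01` means "slower than that backend's p99".

```python
def prob_user_sees_slow_backend(p_slow: float, num_calls: int) -> float:
    """Chance that at least one of `num_calls` independent backend calls is
    slow, when each call is slow with probability `p_slow`."""
    return 1 - (1 - p_slow) ** num_calls


for n in (1, 10, 100):
    print(n, round(prob_user_sees_slow_backend(0.01, n), 3))
```

Under this independence assumption, a request that fans out to 100 backends hits at least one p99-slow call about 63% of the time (the script prints 0.01, 0.096 and 0.634). That is tail latency amplification.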
more_efficient_than_rolling_window | |
content | More efficient alternatives to rolling window: forward decay, t-digest, HdrHistogram |
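Forward decay, t-digest and HdrHistogram each have their own algorithms, and the sketch below implements none of them. It is only a simplified illustration of an idea behind histogram-based approaches: keep counts in fixed buckets instead of raw samples, so memory stays constant and percentiles become approximate. The power-of-two bucketing is an assumption chosen for brevity.

```python
import math


class Pow2Histogram:
    """Approximate percentiles from counts in power-of-two buckets.

    Bucket b holds non-negative integer values in [2**(b-1), 2**b - 1]
    (bucket 0 holds 0). Memory is fixed no matter how many samples are
    recorded, and each reported percentile is the upper edge of its bucket,
    so it is less than twice the true value (values beyond the last bucket
    are clamped into it, which breaks that bound).
    """

    def __init__(self, max_bits: int = 32):
        self.counts = [0] * (max_bits + 1)
        self.total = 0

    def record(self, value_ms: int) -> None:
        bucket = min(int(value_ms).bit_length(), len(self.counts) - 1)
        self.counts[bucket] += 1
        self.total += 1

    def percentile(self, p: float) -> int:
        if self.total == 0:
            raise ValueError("no samples recorded")
        rank = max(math.ceil(p * self.total / 100), 1)
        seen = 0
        for bucket, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return 2 ** bucket - 1  # upper edge of the bucket
        return 2 ** (len(self.counts) - 1) - 1
```

Histograms like this can also be merged by adding bucket counts, which is one reason bucketed approaches suit aggregating percentiles across machines better than averaging per-machine percentiles.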
load_cope_approaches | |
content | Approaches for coping with load |
children | scaling |
scaling_up | |
content | Scaling Up |
children | vertical |
parents | scaling |
scaling_out | |
content | Scaling Out |
children | elastic, horizontal, magic_scaling_sauce, shared_nothing_arch |
parents | scaling |
scaling | |
content | Scaling |
children | scaling_out, scaling_up |
parents | load_cope_approaches |
vertical | |
content | Vertical |
parents | scaling_up |
horizontal | |
content | Horizontal |
parents | scaling_out |
elastic | |
content | Elastic |
children | auto_add_resources, good_for_unprepared_load, manual_simpler |
parents | scaling_out |
shared_nothing_arch | |
content | Shared-nothing architecture |
parents | scaling_out |
auto_add_resources | |
content | Automatically add resources when load increases |
parents | elastic |
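A minimal sketch of an elastic-scaling rule. Every parameter here is hypothetical: per-node capacity in requests per second, a target utilization that leaves headroom, and min/max bounds. An elastic system would evaluate a rule like this periodically and add or remove machines to match the result.

```python
import math


def desired_node_count(current_load_rps: float,
                       capacity_per_node_rps: float,
                       target_utilization: float = 0.7,
                       min_nodes: int = 1,
                       max_nodes: int = 100) -> int:
    """Hypothetical elastic-scaling rule: run enough nodes that each stays
    at or below `target_utilization` of its capacity, within fixed bounds."""
    usable_per_node = capacity_per_node_rps * target_utilization
    needed = math.ceil(current_load_rps / usable_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

For example, 1,000 req/s against nodes rated at 100 req/s with a 0.5 target gives 20 nodes. The bounds stop a load spike or a bad metric from scaling without limit, which is one of the operational surprises manual scaling avoids.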
good_for_unprepared_load | |
content | Useful when load is highly unpredictable |
parents | elastic |
manual_simpler | |
content | Manually scaled systems are simpler (fewer operational surprises) |
parents | elastic |
magic_scaling_sauce | |
content | No generic, one-size-fits-all scalable architecture (informally, "magic scaling sauce") |
parents | scaling_out |
maintainability | |
content | Maintainability |
children | design_principles |
legacy_system | |
content | Legacy System |
design_principles | |
content | Design Principles |
children | evolvability, operability, simplicity |
parents | maintainability |
operability | |
content | Operability |
children | good_operability, operations_team |
parents | design_principles |
simplicity | |
content | Simplicity |
parents | design_principles |
evolvability | |
content | Evolvability |
children | agile, evolve_aka, making_change_easy |
parents | design_principles |
evolve_aka | |
content | Also known as |
children | extensibility, modifiability, plasticicity |
parents | evolvability |
extensibility | |
content | Extensibility |
parents | evolve_aka |
modifiability | |
content | Modifiability |
parents | evolve_aka |
plasticicity | |
content | Plasticity |
parents | evolve_aka |
operations_team | |
content | Operations Team |
parents | operability |
good_operability | |
content | Good operability |
children | routine_tasks_easy (characteristic of) |
parents | operability |
routine_tasks_easy | |
content | Making routine tasks easy |
parents | good_operability |
managing_complexity | |
content | Managing Complexity |
children | big_ball_of_mud, remove_accidental_complexity |
big_ball_of_mud | |
content | Big Ball of Mud |
children | mired_in_complexity (definition) |
parents | managing_complexity |
mired_in_complexity | |
content | Software Mired in Complexity |
parents | big_ball_of_mud |
remove_accidental_complexity | |
content | Removing accidental complexity |
children | abstraction (tool for), not_inherent_to_problem (What defines "accidental complexity"?) |
parents | managing_complexity |
not_inherent_to_problem | |
content | Not inherent in the problem the software solves (as seen by users); arises only from the implementation |
parents | remove_accidental_complexity |
making_change_easy | |
content | Making Change Easy |
parents | evolvability |
agile | |
content | Agile |
children | agility_data_system_level, framework_adapting_change, test_driven_development, working_pattern |
parents | evolvability |
abstraction | |
content | Abstraction |
parents | remove_accidental_complexity |
test_driven_development | |
content | Test-Driven Development (TDD) |
parents | agile |
agility_data_system_level | |
content | Agility on the data system level |
parents | agile |
working_pattern | |
content | working pattern |
parents | agile |
framework_adapting_change | |
content | Framework for adapting to change |
parents | agile |