designing_data_intensive_applications/ch01

Summary

Chapter 1: Reliable, Scalable, and Maintainable Applications

Node Tree

describing_performance
legacy_system
load_cope_approaches
- scaling
  - scaling_out
  - scaling_up
    - vertical
maintainability
- design_principles
managing_complexity
- remove_accidental_complexity
  - not_inherent_to_problem
  - abstraction
- big_ball_of_mud
  - mired_in_complexity
more_efficient_than_rolling_window
multiple_calls_high_latency_probability
reliability
- faults
- continue_work_correctly
scalability
- describing_load
  - ex_twitter
    - post_tweet
      - approach_SQL
        
        could_keep_up
        
        post_tweet_more_work
      - approach_cache
        
        could_keep_up
        
        fan_out
        
        faster_reads
        
        hybrid_approach
        
        post_tweet_more_work
  - load_params
    - follows_per_user
      - fan_out
    - load_param_increased
      - keep_resources_unchanged
      - maintain_performance
- cope_increased_load
small_request_holdup_rest

Nodes

reliability
content	Reliability
children	faults, continue_work_correctly (definition)

continue_work_correctly
content	Continuing to work correctly, even when things go wrong
parents	faults, reliability

faults
content	faults
children	deviation_spec, hardware, human, not_equal_to_failure (Faults are not equivalent to failure), software, things_go_wrong (definition), anticipate_cope, continue_work_correctly
parents	reliability

things_go_wrong
content	things that go wrong
parents	faults

anticipate_cope
content	anticipate/cope
children	fault_tolerant, resilient
parents	faults

fault_tolerant
content	Fault Tolerant
parents	anticipate_cope

resilient
content	Resilient
parents	anticipate_cope

not_equal_to_failure
content	Not equivalent to failure
children	systems_stop_providing_service (this is what a failure is)
parents	faults

deviation_spec
content	Deviation from Spec
parents	faults

systems_stop_providing_service
content	Systems as a whole stop providing service to user
parents	not_equal_to_failure

hardware
content	Hardware
children	hardware_failure_examples (examples), mean_time_to_failure_disks, tolerate_loss_entire_machine, MTTF
parents	faults

hardware_failure_examples
content	Examples of hardware failure: disk crash, faulty RAM, power outage
parents	hardware

mean_time_to_failure_disks
content	Mean Time To Failure (MTTF) for disks: about 10-50 years
parents	hardware, MTTF
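
A per-disk MTTF measured in decades still means frequent failures across a large fleet. The sketch below works through that arithmetic, assuming failures are spread evenly over time; the fleet size of 10,000 disks and the function name `expected_disk_failures_per_day` are illustrative assumptions, not part of these notes.

```python
def expected_disk_failures_per_day(num_disks: int, mttf_years: float) -> float:
    """Average number of disk failures per day across a fleet.

    Assumes each disk fails on average once per `mttf_years` and that
    failures are spread evenly over time (a simplification).
    """
    mttf_days = mttf_years * 365
    return num_disks / mttf_days


# Hypothetical fleet of 10,000 disks at both ends of a 10-50 year MTTF range.
for mttf in (10, 50):
    rate = expected_disk_failures_per_day(10_000, mttf)
    print(f"MTTF {mttf} years: ~{rate:.2f} failures/day")
```

Under these assumptions the fleet loses somewhere between roughly one disk every two days and a few disks per day, which is why tolerating the loss of an individual machine matters.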

MTTF
content	Mean Time To Failure (MTTF)
children	mean_time_to_failure_disks
parents	hardware

tolerate_loss_entire_machine
content	Tolerate loss of entire machine
children	rolling_updates
parents	hardware

rolling_updates
content	Rolling updates: patch applied one node at a time
parents	tolerate_loss_entire_machine

software
content	Software
children	runaway_process (Kind of software fault), bugs (Kind of software fault), corrupted_service (Kind of software fault)
parents	faults

bugs
content	Bugs
parents	software

runaway_process
content	Runaway Process
parents	software

corrupted_service
content	Service that slows down, becomes unresponsive, or returns corrupted responses
parents	software

human
content	Human
children	decouple_mistakes, detailed_clear_monitoring, good_management, minimize_opportunities, quick_easy_recovery
parents	faults

decouple_mistakes
content	Decouple mistakes from failures
children	sandbox
parents	human

minimize_opportunities
content	Minimize Opportunities
parents	human

quick_easy_recovery
content	Quick, Easy Recovery
parents	human

detailed_clear_monitoring
content	Detailed and Clear Monitoring
children	telemetry
parents	human

telemetry
content	Telemetry
parents	detailed_clear_monitoring

sandbox
content	sandbox
parents	decouple_mistakes

good_management
content	Good Management Practices
parents	human

scalability
content	Scalability
children	describing_load, cope_increased_load

cope_increased_load
content	System's ability to cope with increased load
parents	scalability

describing_load
content	Describing Load
children	ex_twitter, load_params
parents	scalability

ex_twitter
content	Example: Twitter
children	post_tweet
parents	describing_load

load_params
content	Load Parameters
children	follows_per_user, load_param_increased
parents	describing_load

post_tweet
content	Post a Tweet
children	approach_SQL, approach_cache
parents	ex_twitter

approach_SQL
content	Approach A: SQL Join
children	could_keep_up, post_tweet_more_work (Compared to)
parents	post_tweet

approach_cache
content	Approach B: cache each user's home timeline
children	could_keep_up (solution), fan_out, faster_reads, hybrid_approach, post_tweet_more_work
parents	post_tweet

could_keep_up
content	Initial approach (A) couldn't keep up with the load of home timeline queries
parents	approach_SQL, approach_cache

faster_reads
content	Faster Reads
parents	approach_cache

post_tweet_more_work
content	Posting a tweet takes more work
parents	approach_SQL, approach_cache

follows_per_user
content	Followers per user: key load parameter for scalability
children	fan_out
parents	load_params

fan_out
content	Fan Out
parents	approach_cache, follows_per_user

hybrid_approach
content	Hybrid Approach: tweets from users with a very large number of followers (celebrities) are handled separately
parents	approach_cache
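
The two approaches to the home timeline can be contrasted with a small in-memory sketch. It is an illustration only: the class `TimelineService`, its method names, and the in-memory data structures are assumptions made for this example and say nothing about how Twitter is actually implemented. Approach A does the work at read time (a filter standing in for the SQL join), while approach B does it at write time by fanning each tweet out to every follower's cached timeline.

```python
from collections import defaultdict


class TimelineService:
    """Toy model of the two home-timeline approaches (hypothetical names)."""

    def __init__(self):
        self.followers = defaultdict(set)        # author -> users following them
        self.following = defaultdict(set)        # user -> authors they follow
        self.tweets = []                         # global list of (seq, author, text)
        self.timeline_cache = defaultdict(list)  # user -> cached home timeline
        self._seq = 0

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post_tweet(self, author, text):
        self._seq += 1
        tweet = (self._seq, author, text)
        self.tweets.append(tweet)  # all approach A needs: one cheap write
        # Approach B: fan out, one extra write per follower.
        for follower in self.followers[author]:
            self.timeline_cache[follower].append(tweet)
        return tweet

    def home_timeline_join(self, user):
        """Approach A: scan all tweets at read time (stand-in for a SQL join)."""
        followed = self.following[user]
        return sorted((t for t in self.tweets if t[1] in followed), reverse=True)

    def home_timeline_cached(self, user):
        """Approach B: read the precomputed timeline, newest first."""
        return sorted(self.timeline_cache[user], reverse=True)


svc = TimelineService()
svc.follow("alice", "bob")
svc.follow("alice", "carol")
svc.post_tweet("bob", "hello")
svc.post_tweet("carol", "hi there")
print(svc.home_timeline_cached("alice"))
```

The cost of `post_tweet` grows with the author's follower count, which is why followers per user is the key load parameter here and why accounts with very many followers are candidates for the hybrid treatment.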

describing_performance
content	Describing Performance
children	load_param_increased, response_time, throughput

load_param_increased
content	When you increase a load parameter
children	keep_resources_unchanged, maintain_performance
parents	load_params, describing_performance

keep_resources_unchanged
content	...and keep resources unchanged, how is system performance affected?
parents	load_param_increased

maintain_performance
content	...how much of an increase in resources is needed to maintain current performance?
parents	load_param_increased

throughput
content	throughput
children	num_records_processed_per_second, batch_process_system
parents	describing_performance

num_records_processed_per_second
content	Number of records processed per second
parents	throughput

batch_process_system
content	Batch process system
parents	throughput

response_time
content	Response Time
children	distribution_of_values, latency (Latency and Response time are often used interchangeably, but they measure different things.), online_systems (response time is a metric used in the context of systems and services that are online), time_btwn_request_response (definition)
parents	describing_performance

time_btwn_request_response
content	Time between request and response
parents	response_time

online_systems
content	Online Systems
parents	response_time

latency
content	Latency: duration that a request waits to be handled
parents	response_time

distribution_of_values
content	Distribution of Values
children	median, outliers, percentiles, avg_mean
parents	response_time

avg_mean
content	Average/mean
parents	distribution_of_values

median
content	Median
parents	percentiles, distribution_of_values

outliers
content	Outliers
children	how_bad (Quantifying how bad the outliers are)
parents	distribution_of_values

how_bad
content	How bad are the outliers? p95, p99, p999.
children	tail_latencies, 50th_percentile
parents	outliers, percentiles

percentiles
content	Percentiles
children	how_bad, median, percentiles_in_practice, service_level_agreements, service_level_objectives, 50th_percentile
parents	distribution_of_values

50th_percentile
content	50th Percentile: p50
parents	how_bad, percentiles
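
To make the difference between the mean, the median (p50), and the higher percentiles concrete, here is a minimal sketch using the nearest-rank definition of a percentile. The sample response times and the helper name `percentile` are made up for illustration; other percentile definitions (for example, with interpolation) can give slightly different values.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Hypothetical response times in milliseconds, with two slow outliers.
response_times_ms = sorted([12, 15, 11, 14, 13, 250, 16, 12, 13, 900])

mean = sum(response_times_ms) / len(response_times_ms)
print("mean:", mean)                               # 125.6
print("p50: ", percentile(response_times_ms, 50))  # 13
print("p95: ", percentile(response_times_ms, 95))  # 900
print("p99: ", percentile(response_times_ms, 99))  # 900
```

In this sample the mean (125.6 ms) describes no request that actually occurred: half the requests finish within 13 ms, while the high percentiles expose the outliers.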

tail_latencies
content	Tail Latencies
children	head_of_line_blocking, tail_latency_amplification
parents	how_bad

service_level_objectives
content	Service Level Objectives (SLO)
parents	percentiles

service_level_agreements
content	Service Level Agreements
parents	percentiles

head_of_line_blocking
content	head of line blocking
parents	tail_latencies

small_request_holdup_rest
content	A small number of slow requests holding up subsequent requests

percentiles_in_practice
content	Percentiles in Practice
children	rolling_window, tail_latency_amplification
parents	percentiles

tail_latency_amplification
content	Tail Latency Amplification
parents	percentiles_in_practice, tail_latencies

rolling_window
content	Rolling window of response times
parents	percentiles_in_practice
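
A rolling window can be sketched naively as a queue of timestamped samples that is sorted whenever a percentile is requested. This is a minimal illustration, not a production design: the class name `RollingWindow`, the 10-minute default, and the assumption that timestamps arrive in non-decreasing order are all choices made for the example.

```python
import math
from collections import deque


class RollingWindow:
    """Keeps response times from the last `window_seconds` seconds."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, response_time_ms), oldest first

    def record(self, timestamp, response_time_ms):
        # Assumes timestamps are recorded in non-decreasing order.
        self._samples.append((timestamp, response_time_ms))
        self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def percentile(self, p, now):
        """Nearest-rank percentile over samples still inside the window."""
        self._evict(now)
        values = sorted(rt for _, rt in self._samples)
        if not values:
            return None
        rank = max(1, math.ceil(p * len(values) / 100))
        return values[rank - 1]


window = RollingWindow(window_seconds=60)
window.record(0, 100)
window.record(30, 20)
window.record(59, 30)
print(window.percentile(50, now=59))  # 30
print(window.percentile(50, now=70))  # 20, the sample from t=0 has expired
```

Sorting on every query costs O(n log n) in the number of samples in the window, which is the inefficiency that approximate structures are meant to avoid.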

multiple_calls_high_latency_probability
content	Chance of high latency increases when an end-user request requires multiple backend calls
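
The effect can be quantified with a short calculation. If each backend call is slow with probability p, and calls are assumed to be independent (a simplification), the chance that at least one of n calls is slow is 1 - (1 - p)^n. The function name and the sample numbers below are illustrative assumptions.

```python
def prob_request_is_slow(p_slow_call: float, num_calls: int) -> float:
    """Probability that at least one of `num_calls` independent backend
    calls is slow, given each is slow with probability `p_slow_call`."""
    return 1 - (1 - p_slow_call) ** num_calls


# Each call exceeds the p99 threshold 1% of the time.
for n in (1, 10, 100):
    print(f"{n:>3} backend calls: {prob_request_is_slow(0.01, n):.1%} chance of a slow request")
```

With 100 backend calls per user request, about 63% of requests hit at least one call slower than the per-call p99, so a rare slow call becomes the common case for end users.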

more_efficient_than_rolling_window
content	More efficient alternatives to rolling window: forward decay, t-digest, HdrHistogram

load_cope_approaches
content	Approaches for coping with load
children	scaling

scaling_up
content	Scaling Up
children	vertical
parents	scaling

scaling_out
content	Scaling Out
children	elastic, horizontal, magic_scaling_sauce, shared_nothing_arch
parents	scaling

scaling
content	Scaling
children	scaling_out, scaling_up
parents	load_cope_approaches

vertical
content	Vertical
parents	scaling_up

horizontal
content	Horizontal
parents	scaling_out

elastic
content	Elastic
children	good_for_unprepared_load, manual_simpler, auto_add_resources
parents	scaling_out

shared_nothing_arch
content	Shared nothing architecture
parents	scaling_out

auto_add_resources
content	auto-add resources on load increase
parents	elastic

good_for_unprepared_load
content	Good for unpredictable load
parents	elastic

manual_simpler
content	Manual is simpler
parents	elastic

magic_scaling_sauce
content	"Magic Scaling Sauce" (informal)
parents	scaling_out

maintainability
content	Maintainability
children	design_principles

legacy_system
content	Legacy System

design_principles
content	Design Principles
children	evolvability, operability, simplicity
parents	maintainability

operability
content	Operability
children	good_operability, operations_team
parents	design_principles

simplicity
content	Simplicity
parents	design_principles

evolvability
content	Evolvability
children	evolve_aka, making_change_easy, agile
parents	design_principles

evolve_aka
content	AKA
children	extensibility, modifiability, plasticicity
parents	evolvability

extensibility
content	Extensibility
parents	evolve_aka

modifiability
content	Modifiability
parents	evolve_aka

plasticicity
content	Plasticity
parents	evolve_aka

operations_team
content	Operations Team
parents	operability

good_operability
content	Good operability
children	routine_tasks_easy (characteristic of)
parents	operability

routine_tasks_easy
content	Making routine tasks easy
parents	good_operability

managing_complexity
content	Managing Complexity
children	remove_accidental_complexity, big_ball_of_mud

big_ball_of_mud
content	Big Ball Of Mud
children	mired_in_complexity (definition)
parents	managing_complexity

mired_in_complexity
content	Software Mired in Complexity
parents	big_ball_of_mud

remove_accidental_complexity
content	Removing accidental complexity
children	not_inherent_to_problem (What defines "accidental complexity"?), abstraction (tool for)
parents	managing_complexity

not_inherent_to_problem
content	Not inherent to problem it solves
parents	remove_accidental_complexity

making_change_easy
content	Making Change Easy
parents	evolvability

agile
content	Agile
children	framework_adapting_change, test_driven_development, working_pattern, agility_data_system_level
parents	evolvability

abstraction
content	Abstraction
parents	remove_accidental_complexity

test_driven_development
content	Test-Driven Development (TDD)
parents	agile

agility_data_system_level
content	Agility on the data system level
parents	agile

working_pattern
content	working pattern
parents	agile

framework_adapting_change
content	Framework for adapting to change
parents	agile