designing_data_intensive_applications/ch06

ch06

dz / designing_data_intensive_applications / ch06

Summary

Chapter 6: Partitioning

Node Tree

Nodes

partitions
content Partitions
children break_data_up, combined_with_replication, document_partitioned_indexes, key_range, large_datasets_high_throughput (when to use), main_reason_scalability, rebalancing, secondary_indexes, sharding (AKA), spread_data_load_evenly, term_partitioned_indexes

break_data_up
content Break data up
parents partitions

sharding
content Sharding
parents partitions

large_datasets_high_throughput
content For large datasets or very high throughput
parents partitions

main_reason_scalability
content Main reason: scalability
parents partitions

combined_with_replication
content Usually combined with replication
children leader_follower_part (example)
parents partitions

leader_follower_part
content Leader-follower: partition leader assignment to one node, followers to other
children node_more_than_one_part
parents combined_with_replication

spread_data_load_evenly
content Goal: spread data and query load evenly across nodes
children key_range_dist, partition_hash_keys, skewed_partitioning
parents partitions

key_range_dist
content Key-range distribution
children access_patterns_hotspots, compound_primary_key, keys_sorted_on_each_node, not_good_key_range, part_bounds_adapt_data, range_queries_inefficient
parents spread_data_load_evenly

part_bounds_adapt_data
content Partitions boundaries need to adapt to data
parents key_range_dist

keys_sorted_on_each_node
content Keep keys sorted on each node
parents key_range_dist

skewed_partitioning
content Skewed Partitioning
children hotspot_node, nodes_more_data_others
parents spread_data_load_evenly

nodes_more_data_others
content Some nodes have more data than others
parents skewed_partitioning

hotspot_node
content Hotspot node with high load
parents skewed_partitioning

partition_hash_keys
content Parition by hash of keys
children assign_part_hash_range, compound_primary_key, good_hash_func, range_queries_inefficient
parents spread_data_load_evenly

access_patterns_hotspots
content Certain access patterns can lead to hotspots
parents key_range_dist

node_more_than_one_part
content NOdes can store more than one partition
parents leader_follower_part

good_hash_func
content A good hash function takes skewed data and makes it uniformly distributed
children MD5 (example), fowler_noll_vo (example)
parents partition_hash_keys

MD5
content MD5 (cassandra, mongodb)
parents good_hash_func

fowler_noll_vo
content Fowler-Noll-Vo (Voldemort)
parents good_hash_func

assign_part_hash_range
content Assign each partition a range of hashes
children consistent_hashing
parents partition_hash_keys

consistent_hashing
content Consistent hashing: ranges chosen pseudorandomly
children hash_partitioning (AKA)
parents assign_part_hash_range
remarks rarely used

range_queries_inefficient
content Range Queries Inefficient
parents key_range_dist, partition_hash_keys

hash_partitioning
content Hash Partitioning
parents consistent_hashing

compound_primary_key
content Compound Primary Key: compromise between key-range hash-range distribution (Cassandra)
children several_columns
parents key_range_dist, partition_hash_keys

several_columns
content Several Columns: only first column hashed and used to determine parition. The rest are concatenated index for sorting data in SSTables
children good_for_many_to_one
parents compound_primary_key

good_for_many_to_one
content Good for many-to-one relationships
parents several_columns

secondary_indexes
content Secondary Indexes
children document_partitioning_index, doesnt_map_neatly_part
parents partitions

document_partitioning_index
content Document Partitioning Index
children document_partitioned_index (AKA), local_index (AKA)
parents secondary_indexes

local_index
content Local Index
children global_index (vs), scatter_gather
parents document_partitioning_index

global_index
content Global Index
children faster_reads_slow_complicated_writes, term_partitioned
parents local_index

doesnt_map_neatly_part
content Doesn't map neatly to partitions
parents secondary_indexes

scatter_gather
content Scatter / Gather
children prone_to_tail_latency, query_all_combine (description)
parents local_index

document_partitioned_index
content Document partitioned index
parents document_partitioning_index

query_all_combine
content Query all partitions, combine results
parents scatter_gather

prone_to_tail_latency
content Prone to tail latency amplification
parents scatter_gather

term_partitioned
content term-partitioned
children partitions_global_index, up_to_date_dist_trans
parents global_index

partitions_global_index
content Partitions global index
parents term_partitioned

up_to_date_dist_trans
content Up-to-date index requires distributed transactions
parents term_partitioned

faster_reads_slow_complicated_writes
content Reads faster, writers slower and more complicated
parents global_index

rebalancing
content Rebalancing
children dont_do_hash_mod_n, dynamic_partitioning, fixed_num_parts, move_load_from_node (description), part_growth_prop_data, partitioning_proportional_nodes
parents partitions

move_load_from_node
content Move load from one node in cluster to another
parents rebalancing

dont_do_hash_mod_n
content Don't do hash mod N
parents rebalancing

fixed_num_parts
content Fixed number of partitions
children account_mismatched_hardware, dynamic_partitioning (vs), more_parts_than_nodes, not_good_key_range, partition_number
parents rebalancing

more_parts_than_nodes
content More partitions than nodes, move/steal partitions when new noded added
parents fixed_num_parts

account_mismatched_hardware
content Can even account for mismatched hardware
parents fixed_num_parts

partition_number
content Partition Number: too large and rebalancing is expensive, too small and there's too much overhead.
parents fixed_num_parts

not_good_key_range
content Not good for key-range partitioning
children dynamic_partitioning (solution)
parents fixed_num_parts, key_range_dist

dynamic_partitioning
content Dynamic Partitioning
children empty_db_single_part (caveat), merge_part_below_thresh, partition_number_adaptive, split_part_exceeds_size, suitable_key_range_hash_part
parents not_good_key_range, fixed_num_parts, rebalancing

split_part_exceeds_size
content Split partition that exceeds certain size
children merge_part_below_thresh (related)
parents dynamic_partitioning

merge_part_below_thresh
content Merge partition when it sinks below certain threshold
parents split_part_exceeds_size, dynamic_partitioning

partition_number_adaptive
content Number of partitions adapts to the total data volume
parents dynamic_partitioning

suitable_key_range_hash_part
content Suitable for key-range and hash-partioned data
parents dynamic_partitioning

empty_db_single_part
content An initialized empty database starts with a single partition. All writes initially processed by single node while other nodes are idle.
parents dynamic_partitioning

partitioning_proportional_nodes
content Partitioning proportional to nodes
children fixed_part_num_per_node
parents rebalancing

fixed_part_num_per_node
content Fixed number of partitions per node
parents partitioning_proportional_nodes

part_growth_prop_data
content Partition size grows proportional to data size
children part_shrinks_when_new_node
parents rebalancing

part_shrinks_when_new_node
content Partition size shrink when new node is added
parents part_growth_prop_data

MPP
content Massively parallel processing (MPP)
children parallel_query_exec

parallel_query_exec
content Parallel query execution
parents MPP

key_range
content Key Range
children split_if_part_too_long (description)
parents partitions

split_if_part_too_long
content Split into two subranges if partition gets too long
parents key_range

document_partitioned_indexes
content document partitioned indexes
children global, secondary_indexes_stored_same_part, term_partitioned_indexes (related)
parents partitions

global
content global
parents document_partitioned_indexes

term_partitioned_indexes
content Term partitioned indexes
children local, partitioned_separately
parents partitions, document_partitioned_indexes

local
content Local
parents term_partitioned_indexes

partitioned_separately
content Partitioned separately
parents secondary_indexes_stored_same_part, term_partitioned_indexes

secondary_indexes_stored_same_part
content Secondary indexes stored on same partition as primary key/value
children partitioned_separately (vs)
parents document_partitioned_indexes