designing_data_intensive_applications/ch07

ch07

dz / designing_data_intensive_applications / ch07

Summary

Chapter 7: Transactions

Node Tree

Nodes

transactions
content Transactions
children ACID, abort_reduces_many_errors, group_rw_logical_unit (description), multi_object_transactions, no_partial_failure, safety_guarantees, simplify_programming_model (why transactions were created), transaction_isolation

group_rw_logical_unit
content Groups reads and writes into logical unit
parents transactions

simplify_programming_model
content Created to simplify programming model for apps accessing database
parents transactions

no_partial_failure
content No partial failure. Safe to retry.
parents transactions

safety_guarantees
content Safety Guarantees: databse handles certain potential error scenarios and concurrency issues
parents transactions

ACID
content ACID: Atomicity, Consistency, Isolation, Durability
children BASE (BASE is a less strict version of ACID), atomicity, best_effort, consistency, durability, implementations_not_equal, isolation, terminology_fault_tolerant_mechanisms
parents transactions

terminology_fault_tolerant_mechanisms
content An effort to establish precise terminology for fault-tolerant mechanisms in databases
parents ACID

implementations_not_equal
content Implementations not equal
children isolation_ambiguity
parents ACID

isolation_ambiguity
content "Isolation" ambiguity
children SQL_isolation_ambiguous
parents implementations_not_equal

BASE
content BASE: Basically available Soft-state Eventual consistency
parents ACID

atomicity
content Atomicity
children atomic, atomic_transaction, not_about_concurrency
parents ACID

atomic
content Atomic: something that cannot be broken down into smaller parts
parents atomicity

not_about_concurrency
content Not about concurrency in a DB context
parents atomicity

atomic_transaction
content Atomic Transaction: if writes in a transaction can't be completed due to fault, abort and discard writes
children abortability
parents atomicity

abortability
content "abortability"
parents atomic_transaction

consistency
content consistency
children depends_app_invariants, invariants
parents ACID

invariants
content Invariants: statements about data that must always be true
children depends_app_invariants
parents consistency

depends_app_invariants
content Depends on application's notion of invaraints
children property_of_app_not_db
parents consistency, invariants

property_of_app_not_db
content Property of application, not database
parents depends_app_invariants

isolation
content Isolation
children concurrency_problems_multiusers, concurrent_tasks_isolated, isolation_formalized_serializability, lost_update_problem, snapshot_isolation_oracle
parents ACID

concurrent_tasks_isolated
content Concurrently executing tasks are isolated from one another
parents isolation

concurrency_problems_multiusers
content Concurrency problems with multiple users
parents isolation

isolation_formalized_serializability
content Classical database textbooks formalize isolation as serializability
children perf_penalty, serializability
parents isolation

serializability
content Serializability: each transaction can pretend it is the only one running on the entire database
children two_phase_locking
parents isolation_formalized_serializability

perf_penalty
content Performance penalty: rarely used.
parents isolation_formalized_serializability

snapshot_isolation_oracle
content Snapshot Isolation (Oracle)
parents snapshot_isolation, isolation

durability
content Durability
children WAL, data_written_not_forgotten, nv_storage, repl_wait_copy_before_success
parents ACID

nv_storage
content Non-volatile storage
parents durability, single_node

WAL
content WAL (write-ahead-log)
parents durability, single_node, even_crash_hardware_fault

repl_wait_copy_before_success
content Replicated database: wait for data to copy to nodes before reporting success
parents durability

data_written_not_forgotten
content Promise: data written in successful transaction will not be forgotten
children even_crash_hardware_fault
parents durability

even_crash_hardware_fault
content Even with crashes or hardware faults
children WAL
parents data_written_not_forgotten

single_node
content Single-node
children WAL, nv_storage

multi_object_transactions
content multi-object transactions
children determine_which_ops_belong_to_transaction, ensures_foreign_keys_remain_valid, keeps_denorm_data_synced, single_object_writes
parents transactions

single_object_writes
content Single-object writes
children provides_atomicity_single_objects
parents multi_object_transactions

determine_which_ops_belong_to_transaction
content requires some way of determining which read/write operations belong to same transaction
parents multi_object_transactions

provides_atomicity_single_objects
content Usually provides atomicity for single objects
parents single_object_writes

ensures_foreign_keys_remain_valid
content In relational model, multi-object transactions ensure that foreign keys remain valid
parents multi_object_transactions

keeps_denorm_data_synced
content Helps to keep denormalized data in sync
parents multi_object_transactions

best_effort
content Best Effort: databse does as much as it can, won't undo changes it has already done if it runs into error
children leaderless_repl (useful in leaderless replication schemes (I think?))
parents ACID

leaderless_repl
content Leaderless replication
parents best_effort

transaction_isolation
content Transaction Isolation
children concurrency_bugs_difficult (reason they were created), read_committed, serializable_isolation
parents transactions

concurrency_bugs_difficult
content Created because concurrency bugs are hard to find by testing
parents transaction_isolation

serializable_isolation
content serializable isolation: database guarantees that transactions have same effect as if they ran serially
children SSI, actual_serial_execution, performance_cost, si_mantra, strongest_isolation_level
parents transaction_isolation

read_committed
content read committed
children no_dirty_reads, no_dirty_writes, read_committed_impl, read_skew
parents transaction_isolation

no_dirty_reads
content Reads: only see committed (AKA no dirty reads)
children no_dirty_writes (related), read_committed_impl
parents read_committed

no_dirty_writes
content Writes: only overwrite data that has been comitted (AKA no dirty writes)
children delay_2nd_write_until_1st_done, timing_dirty_write_LU_anomaly
parents no_dirty_reads, read_committed

performance_cost
content performance cost
children weaker_isolation_levels (alternative solution)
parents serializable_isolation

weaker_isolation_levels
content Weaker levels of isolation that protect aginst only some concurrency bugs
children subtle_bugs, throughput_and_response_worse_than_weak_isolation
parents performance_cost

subtle_bugs
content Can lead to subtle bugs
parents weaker_isolation_levels

delay_2nd_write_until_1st_done
content Usually delays second write until first write is either committed or aborted
children row_level_locks
parents no_dirty_writes

row_level_locks
content Row-level locks
children two_phase_locking (similar)
parents delay_2nd_write_until_1st_done

read_skew
content Read Skew / Non-repeatable Read
children reads_inconsistent_state, snapshot_isolation
parents read_committed

reads_inconsistent_state
content Anomaly where databse is read in an inconsistent state
parents read_skew

read_committed_impl
content For every object written, DB remembers old comitted and new value (set by transaction holding write lock). Old value is returned while transaction ongoing, new value returned when transaction committed
children impl_generalization_of_dirty_read (description of dirty read prevention mechanism)
parents no_dirty_reads, read_committed

snapshot_isolation
content Snapshot Isolation: each transaction reads from a consistent snapshot of the database
children SSI, repeatable_read, serializable_oracle (AKA), snapshot_isolation_oracle, visibility_rules
parents read_skew

visibility_rules
content Visibility Rules
parents snapshot_isolation

impl_generalization_of_dirty_read
content Implemenation is generalization of dirty read prevention mechanism
children MVCC
parents read_committed_impl

MVCC
content MVCC: multi-version concurrency control
children indices_multiversion_db, postgres_GC, stale_MVCC
parents impl_generalization_of_dirty_read

indices_multiversion_db
content Indices in multiversion DB
children append_only_b_trees, postgres_optimizations_avoid_index_updates
parents MVCC

postgres_GC
content PostgreSQL: use garbage collection on rows marked for deletion
parents MVCC

postgres_optimizations_avoid_index_updates
content PostgreSQL: optimizations for avoiding index updates if idfferent versions of same object can fit on same page
parents indices_multiversion_db

append_only_b_trees
content append-only / copy-on-write B-Trees (CouchDB, Datomic, LMDB)
parents indices_multiversion_db

serializable_oracle
content Serializable (Oracle)
parents snapshot_isolation

repeatable_read
content Repeatable Read (PostreSQL, MySQL)
children SQL_standard_repeatable_read
parents snapshot_isolation

SQL_standard_repeatable_read
content SQL Standard doesn't have concept of Snapshot Isolation (wasn't invented yet), Repeatable Read meets requirements of standard.
children SQL_isolation_ambiguous
parents repeatable_read

SQL_isolation_ambiguous
content Standard's definition of isolation ambiguous, and imprecise, different database implementations have different guarantees
parents isolation_ambiguity, SQL_standard_repeatable_read

lost_update_problem
content Lost Update Problem: if two concurrent transactions perform read-modify-write-cycle, one of the writes will be lost.
children atomic_updates (solution), auto_detecting_lost_updates, compare_and_set, explicit_locking (solution), generalization_lost_update, later_write_clobbers_earler, read_modify_write, replicated_lost_update, timing_dirty_write_LU_anomaly, write_skew
parents isolation

read_modify_write
content read-modify-write: read a value, modify the value, then write the new value back to the DB.
children allows_parallel_RMW
parents lost_update_problem

later_write_clobbers_earler
content Later write clobbers the earlier write
parents lost_update_problem

atomic_updates
content Atomic Updates
children cursor_stability
parents lost_update_problem

cursor_stability
content Cursor Stability: Take exclusive lock on object when it is being read so that no transaction can read it until the update has been applied.
parents atomic_updates

explicit_locking
content Explicit Locking
children app_explicitely_locks_objects
parents lost_update_problem

app_explicitely_locks_objects
content Application explicitely locks objects to updated
parents explicit_locking

compare_and_set
content Compare-and-set
children in_DBs_without_transactions, update_only_if_value_unchanged
parents lost_update_problem

auto_detecting_lost_updates
content auto-detecting lost updates: on detection, abort transaction and retry.
children allows_parallel_RMW
parents lost_update_problem

in_DBs_without_transactions
content in Databases that don't provide transactions
parents compare_and_set

allows_parallel_RMW
content Allows read-modify-write cycles to run in parallel
parents read_modify_write, auto_detecting_lost_updates

update_only_if_value_unchanged
content Update value only if value has not changed since it was last read.
parents compare_and_set

replicated_lost_update
content Replicated: allow concurent writes, conflicting versions (siblings), use application code or special data structures to resolve later.
parents lost_update_problem

write_skew
content Write Skew
children generalization_lost_update, phantom, premise, protects_against_write_skew_phantom, two_reads_same_object_some_updates
parents lost_update_problem

generalization_lost_update
content Generalization of lost update problem
parents write_skew, lost_update_problem

two_reads_same_object_some_updates
content Can happen if two transactions read same objects, then update some of those objects
children timing_dirty_write_LU_anomaly
parents write_skew

timing_dirty_write_LU_anomaly
content Depending on timing: it can either be a dirty write or a lost update anomaly
parents two_reads_same_object_some_updates, no_dirty_writes, lost_update_problem

phantom
content phantom
children materializing_conflicts, predicate_lock, protects_against_write_skew_phantom, write_changes_result_in_other_trans (description)
parents write_skew

write_changes_result_in_other_trans
content Write in one transactions changes the result in another transaction.
parents phantom

materializing_conflicts
content Materializing Conflicts
children phantom_to_lock_conflict (description)
parents phantom

phantom_to_lock_conflict
content Takes phantom and turns it into a lock conflict on concrete set of rows
parents materializing_conflicts

strongest_isolation_level
content Strongest Isolation Level
parents serializable_isolation

actual_serial_execution
content Actual Serial Execution
children every_trans_small_fast, interactive_multistatements_forbidden, limited_to_in_memory, low_write_throughput, stored_procedure
parents serializable_isolation

interactive_multistatements_forbidden
content Interactive multi-statement transactions not allowed in single-trhead serial transaction processing (for performance reasons)
parents actual_serial_execution

stored_procedure
content stored procedure: entire transaction code submitted to databse ahead of time
parents actual_serial_execution

every_trans_small_fast
content Every transaction must be small and fast
parents actual_serial_execution

limited_to_in_memory
content Limited to usecases where dataset fits in memory
parents actual_serial_execution

low_write_throughput
content Write throughput must be low enough for single CPU core, otherwise use partitioning
parents actual_serial_execution

two_phase_locking
content Two-Phase Locking (2PL)
children deadlock, lock_impl, many_reads_until_write_then_exclusive (description), not_two_phase_commit (not to be confused with), pessimistic_concurrency, phases (origin of name), predicate_lock, writers_block_readers_vice_versa
parents row_level_locks, serializability

not_two_phase_commit
content not to be confused with two-phase commit (2PC)
parents two_phase_locking

many_reads_until_write_then_exclusive
content Severual concurrent reads allowed until write occurs, then, exclusive access
parents two_phase_locking

writers_block_readers_vice_versa
content Writers block other readers and vice versa
parents si_mantra, two_phase_locking

si_mantra
content Serializable Isolation Mantra: readers nver block writers
children writers_block_readers_vice_versa (contrasts)
parents serializable_isolation

lock_impl
content Lock Implementation
children exclusive_mode, index_range_locking, lock_held_until_trans_end, phases, predicate_lock (similar to shared/exclusive lock), shared_mode
parents two_phase_locking

shared_mode
content Shared Mode
children transaction_reads_objects, upgrade_shared_to_exclusive
parents lock_impl

exclusive_mode
content Exclusive Mode
children transaction_writes_objects, upgrade_shared_to_exclusive
parents lock_impl

transaction_reads_objects
content Transaction reads objects
parents transaction_writes_objects, shared_mode

transaction_writes_objects
content Transaction writes objects
children transaction_reads_objects
parents exclusive_mode

upgrade_shared_to_exclusive
content When a transaction reads an object, then writes to it, it gets upgraded from shared mode to exclusive mode
parents exclusive_mode, shared_mode

lock_held_until_trans_end
content Acquired lock must be held until the end of the transaction
parents lock_impl

phases
content Phase 1: acquire lock. Phase 2: release lock
parents two_phase_locking, lock_impl

deadlock
content Deadlock: Transaction A stuck waiting for Transaction B to finish and release lock, and vice versa
children auto_deadlock_detection
parents two_phase_locking

auto_deadlock_detection
content Automatic Deadlock Detection
parents deadlock

throughput_and_response_worse_than_weak_isolation
content throughput and response times signficantly worse compared to weak isolation
children reduced_concurrency
parents weaker_isolation_levels

reduced_concurrency
content Reduced Concurrency
parents throughput_and_response_worse_than_weak_isolation

predicate_lock
content Predicate Lock: lock that bleongs to all objects that match search condition
children better_than_predicate
parents phantom, lock_impl, two_phase_locking

index_range_locking
content Index-range locking
children better_than_predicate, lock_table_as_safe_fallback, next_key_locking (AKA), simplify_predicate (description)
parents lock_impl

better_than_predicate
content Better than predicate locking
children simplify_predicate
parents predicate_lock, index_range_locking

protects_against_write_skew_phantom
content Protects aginst write skew and phantoms
parents phantom, write_skew

next_key_locking
content next-key locking
parents index_range_locking

simplify_predicate
content Simplify predicate by making it match greater set of objects
parents index_range_locking, better_than_predicate

lock_table_as_safe_fallback
content Safe (but expensive) Fallback: shared lock on entire table if not suitable range found
parents index_range_locking

SSI
content Serializable Snapshot Isolation (SSI)
children SSI_overview, abort_rate_affects_perf, full_serializability_small_penalty, granular_tracking_more_overhead, optimistic_concurrency
parents snapshot_isolation, serializable_isolation

full_serializability_small_penalty
content Provides full serializability and only a small performance penalty compared to snapshop isolation
parents SSI

pessimistic_concurrency
content Pessimistic Concurrency Control: if something goes wrong, better to wait until it's safe to do something
children mutex (similar), optimistic_concurrency (vs)
parents two_phase_locking

mutex
content Mutal Exclusion
parents pessimistic_concurrency

premise
content Premise: fact that was true at beginning of transaction
children db_detects_outdated_premise_abort
parents write_skew

optimistic_concurrency
content Optimistic Concurrency Control: keep going if things go wrong in the hope that everything willl turn out all right
children poor_perf_high_contention, usually_better_than_pessimistic
parents SSI, pessimistic_concurrency

poor_perf_high_contention
content Performs badly if there is high contention (many transactions trying to access same object)
parents optimistic_concurrency

usually_better_than_pessimistic
content Usually better than pessimistic concurrency control
parents optimistic_concurrency

SSI_overview
content Builds on top of snapshot isolation, detects serialization conflicts, determines which transactions to abort
children db_detects_outdated_premise_abort
parents SSI

db_detects_outdated_premise_abort
content Database must detect outdated premises and abort
children stale_MVCC
parents premise, SSI_overview

stale_MVCC
content Stale MVCC
parents db_detects_outdated_premise_abort, MVCC

abort_rate_affects_perf
content Abort rate significantly affects performance
parents SSI

granular_tracking_more_overhead
content More granular in tracking transactions yields more bookkeeping overhead
parents SSI

abort_reduces_many_errors
content Reduces large class of errors to do transaction abort
parents transactions