distributed_systems_MIT/lec14

Summary

FaRM, OCC

Node Tree

Nodes

farm
content FaRM
children vs_spanner, OCC, RDMA_nics, bottlenecks, commit_protocol, farm_api, forced_occ, high_performance, network_cpu_bottleneck, research_prototype, same_datacenter, server_memory_layout, sharded_primary_backup_pairs

RDMA_nics
content RDMA nics
children RDMA, clever_network_interface_card, firmware_only, forced_occ (RDMA NICs are the reason for using OCC), sequence_protocol
parents farm

vs_spanner
content vs spanner
children both_2pc, geographic_in_repl, good_performance, ro_trans_sync_time
parents farm

both_2pc
content Both use two-phase commit
parents vs_spanner

geographic_in_repl
content Spanner is geographic in replication
parents vs_spanner

ro_trans_sync_time
content Spanner runs read-only transactions using synchronized time
parents vs_spanner

bottlenecks
content Bottlenecks
children speed_of_light, cpu_time
parents farm

same_datacenter
content same datacenter
parents farm

research_prototype
content research prototype
parents farm

forced_occ
content Forced to use OCC (optimistic concurrency control)
parents RDMA_nics, farm

good_performance
content good performance
parents vs_spanner

speed_of_light
content Speed of Light
parents bottlenecks

cpu_time
content CPU time
parents bottlenecks

sharded_primary_backup_pairs
content Sharded on primary backup pairs
parents farm

high_performance
content Ways FaRM gets high performance
children transaction_code, NVRAM, RDMA, data_fits_RAM, kernel_bypass, sharding (main way FaRM gets high performance)
parents farm

sharding
content Sharding
parents high_performance

data_fits_RAM
content Data fits in RAM
children NVRAM
parents high_performance
remarks much faster than disk

transaction_code
content Transaction code
parents high_performance

RDMA
content RDMA
children LAN_only, clever_network_interface_card, one_sided_RDMA, remote_direct_memory_access (Acronym)
parents RDMA_nics, high_performance

kernel_bypass
content Kernel Bypass
children skip_stack, DMA_in_app_memory, app_code_acces_nic_without_kernel (description)
parents high_performance

clever_network_interface_card
content Clever network interface card (NIC)
parents RDMA_nics, RDMA

NVRAM
content Non-volatile RAM (NVRAM)
children multiple_servers_write_ram_enough, only_works_for_power_fail
parents data_fits_RAM, high_performance

app_code_acces_nic_without_kernel
content Application code can directly access the network card without the kernel
parents kernel_bypass

multiple_servers_write_ram_enough
content Is it enough to simply write to the RAM of multiple servers?
children site_wide_power_failure (No, a site-wide power failure will wipe it all out)
parents NVRAM

site_wide_power_failure
content A site-wide power failure will lose data
children battery_system (preventative measure against power failures)
parents multiple_servers_write_ram_enough

battery_system
content Battery System
children alert_system
parents site_wide_power_failure

alert_system
content Alert System
children server_saves_to_disk (on alert)
parents battery_system

server_saves_to_disk
content Server saves RAM to disk
parents alert_system

only_works_for_power_fail
content Only works for power failure crash
parents NVRAM

network_cpu_bottleneck
content Network CPU bottlenecks
children classic_network_stack_too_slow
parents farm

classic_network_stack_too_slow
content Classic Network Stack too slow for RPCs.
children classic_network_stack_top_down
parents network_cpu_bottleneck

classic_network_stack_top_down
content Classic Network stack order: app, buffer, TCP, NIC driver, DMA, NIC
children skip_stack
parents classic_network_stack_too_slow

skip_stack
content Skip stack
parents classic_network_stack_top_down, kernel_bypass

DMA_in_app_memory
content The NIC DMAs directly into application memory
children app_takes_tcp_responsibilities
parents kernel_bypass

app_takes_tcp_responsibilities
content Because it skips TCP, application takes on some TCP responsibilities
children sequence_protocol (NIC handles this too)
parents DMA_in_app_memory

remote_direct_memory_access
content Remote Direct Memory Access
parents RDMA

firmware_only
content Firmware only: the computer's OS doesn't know about the reads/writes
parents RDMA_nics

sequence_protocol
content Run their own reliable sequence protocol, similar to TCP
parents RDMA_nics, app_takes_tcp_responsibilities

LAN_only
content LAN only
parents RDMA

one_sided_RDMA
content One-sided RDMA
children transactions_with_only_one_sided, execute_one_sided_read, one_app_RDMA_another_RDMA (description)
parents RDMA

one_app_RDMA_another_RDMA
content One app uses RDMA to read/write the memory of another app
children append_to_log_op (the typical operation for one-sided RDMA in FaRM)
parents one_sided_RDMA

append_to_log_op
content appends to log
parents one_app_RDMA_another_RDMA

transactions_with_only_one_sided
content Can you implement transactions with only one-sided RDMA?
children farm_suggests_no (still a question to think about though)
parents one_sided_RDMA

farm_suggests_no
content FaRM suggests the answer is "no"
parents transactions_with_only_one_sided

OCC
content Optimistic Concurrency Control (OCC)
children version_lockbits_enforce_serializability, buffer_writes_locally, check_later_if_reads_okay, commit_then_validate
parents farm

buffer_writes_locally
content Buffer Writes Locally
parents OCC

check_later_if_reads_okay
content Check later if reads are okay
parents OCC

commit_then_validate
content commit then validate
children validation, abort_on_conflicts
parents OCC

abort_on_conflicts
content Abort on conflicts
children exponential_backup
parents commit_then_validate

validation
content Validation
children optimize_for_reads, refetch_object_header
parents commit_then_validate

farm_api
content API
children txcommit, txcreate, txread, txwrite, OID
parents farm

txcreate
content txCreate()
children creates_transaction
parents farm_api

txread
content txRead()
children OID (input argument)
parents farm_api

OID
content Object ID (OID)
children compound_identifier
parents txread, txwrite, farm_api

creates_transaction
content Creates Transaction
parents txcreate

txwrite
content txWrite()
children OID (input argument)
parents farm_api

exponential_backup
content Exponential backoff may be used?
parents abort_on_conflicts
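
A minimal sketch of retrying an aborted transaction with exponential backoff. The notes only raise this as a possibility, so everything here is an assumption for illustration: the `Aborted` exception, the `run_with_backoff` helper, the retry limit, and the randomized delay are all hypothetical names and choices, not FaRM's documented behaviour.

```python
import random
import time


class Aborted(Exception):
    """Hypothetical signal: the transaction hit a conflict and was aborted."""


def run_with_backoff(transaction, max_attempts=5, base_delay=0.001, sleep=time.sleep):
    """Run `transaction` until it succeeds, doubling the delay cap after each abort."""
    for attempt in range(max_attempts):
        try:
            return transaction()
        except Aborted:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Random delay in [0, base_delay * 2^attempt] so rivals do not retry in lockstep.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage: a transaction body that conflicts twice, then commits.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise Aborted()
    return "committed"


delays = []  # record the delays instead of really sleeping
outcome = run_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable; a real caller would leave the default.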

compound_identifier
content Compound Identifier
children address, region_num
parents OID

region_num
content Region Number
parents compound_identifier

address
content Address
parents compound_identifier
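
To make the compound identifier concrete, here is a small sketch of an OID built from a region number and an address. The 32/32 bit split and the names `OID`, `pack`, and `unpack` are assumptions chosen for illustration; the notes only say the identifier combines the two parts.

```python
from dataclasses import dataclass

# Assumed layout: high 32 bits hold the region number, low 32 bits hold the address.
ADDRESS_BITS = 32
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


@dataclass(frozen=True)
class OID:
    region: int   # which region holds the object
    address: int  # where the object sits inside that region

    def pack(self) -> int:
        """Combine both parts into one integer."""
        return (self.region << ADDRESS_BITS) | (self.address & ADDRESS_MASK)

    @staticmethod
    def unpack(word: int) -> "OID":
        """Split a packed integer back into region number and address."""
        return OID(region=word >> ADDRESS_BITS, address=word & ADDRESS_MASK)


oid = OID(region=7, address=0x1000)
round_trip = OID.unpack(oid.pack())
```

The region number tells a client which primary/backup pair to contact; the address locates the object inside that server's memory.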

server_memory_layout
content Server Memory Layout
children logs_for_each_server, pair_msg_queues, region
parents farm

region
content Region
children versioned_objects
parents server_memory_layout

versioned_objects
content Versioned Objects
children version_num, lock_flag
parents region

version_num
content version number
parents versioned_objects

lock_flag
content Lock flag
parents versioned_objects
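
One way to picture a versioned object is a single header word that carries both the version number and the lock flag. The sketch below assumes a 64-bit header with the lock flag in the top bit; that layout and the helper names are illustrative assumptions, not taken from the notes.

```python
LOCK_BIT = 1 << 63           # assumed: top bit of the header is the lock flag
VERSION_MASK = LOCK_BIT - 1  # assumed: the remaining 63 bits hold the version


def make_header(version: int, locked: bool) -> int:
    """Pack a version number and lock flag into one header word."""
    return (version & VERSION_MASK) | (LOCK_BIT if locked else 0)


def is_locked(header: int) -> bool:
    return bool(header & LOCK_BIT)


def version_of(header: int) -> int:
    return header & VERSION_MASK


header = make_header(version=5, locked=True)
```

Keeping both fields in one word means a single read or a single atomic update covers the lock state and the version together.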

pair_msg_queues
content Pair of Message Queues
parents server_memory_layout

logs_for_each_server
content Logs, one for each of the other servers
parents server_memory_layout

commit_protocol
content Commit Protocol
children execute_phase
parents farm

execute_phase
content Execute Phase
children txcommit_call, reads_everything_needed
parents commit_protocol

reads_everything_needed
content Reads everything it needs
parents execute_phase

txcommit_call
content txcommit call
children commit_phase (happens when all yes)
parents txcommit, execute_phase

txcommit
content txCommit
children txcommit_call
parents farm_api
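
The four API calls can be sketched with a toy, single-process store that buffers writes locally and checks read versions at commit. This is only an illustration of the OCC idea: the `Store` and `Transaction` classes and the exact argument lists are assumptions, and the sketch omits locking, logs, replication, and RDMA entirely.

```python
class Store:
    """Toy object store: oid -> (version, value)."""
    def __init__(self):
        self.objects = {}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # oid -> version seen when first read
        self.writes = {}         # oid -> new value, buffered locally


def txCreate(store):
    return Transaction(store)


def txRead(tx, oid):
    if oid in tx.writes:                      # read your own buffered write
        return tx.writes[oid]
    version, value = tx.store.objects[oid]
    tx.read_versions.setdefault(oid, version)
    return value


def txWrite(tx, oid, value):
    tx.writes[oid] = value                    # nothing reaches the store yet


def txCommit(tx) -> bool:
    # Validate: abort if any object read has changed version since it was read.
    for oid, seen in tx.read_versions.items():
        if tx.store.objects[oid][0] != seen:
            return False
    # Apply buffered writes, bumping each version.
    for oid, value in tx.writes.items():
        version = tx.store.objects.get(oid, (0, None))[0]
        tx.store.objects[oid] = (version + 1, value)
    return True


# Usage: two transactions increment the same counter; only the first commit wins.
store = Store()
store.objects["x"] = (1, 10)
t1, t2 = txCreate(store), txCreate(store)
txWrite(t1, "x", txRead(t1, "x") + 1)
txWrite(t2, "x", txRead(t2, "x") + 1)
first, second = txCommit(t1), txCommit(t2)
```

The second transaction aborts because the version it read is stale, so the lost update is prevented without taking any lock during execution.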

commit_phase
content commit phase
children trans_coord_all_yes, lock_phase
parents txcommit_call

lock_phase
content Lock Phase
children trans_coord_all_yes, send_object_id
parents commit_phase

send_object_id
content Client sends each primary server the identity of the updated object
children append_to_log
parents lock_phase

trans_coord_all_yes
content Transaction coordinator notifies primary servers "all yes"
children append_to_prim
parents commit_phase, lock_phase

append_to_log
content Append to log
children prim_active_log_process
parents send_object_id

prim_active_log_process
content Primaries actively process new log entries and send a yes/no vote
children version_changed, is_object_already_locked
parents append_to_log

is_object_already_locked
content is object already locked?
parents prim_active_log_process

version_changed
content has the version number changed?
children atomic_compare_and_swap
parents prim_active_log_process

atomic_compare_and_swap
content Atomic compare_and_swap
children multithread_race_transactions (rationale for atomic operation)
parents version_changed

multithread_race_transactions
content Multithreading can cause races between transactions
parents atomic_compare_and_swap
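
The lock-phase check ("is the object already locked?", "has the version changed?") can be done in one atomic compare-and-swap on the object header. Python has no hardware CAS, so this sketch simulates one with a `threading.Lock`; the `HeaderWord` class, `try_lock` helper, and the lock-bit-in-top-bit layout are assumptions for illustration.

```python
import threading

LOCK_BIT = 1 << 63  # assumed: top bit of the header is the lock flag


class HeaderWord:
    """A header word with a simulated atomic compare-and-swap."""
    def __init__(self, value: int = 0):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self) -> int:
        with self._guard:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_lock(header: HeaderWord, version_read: int) -> bool:
    """Vote yes only if the object is unlocked and still at `version_read`."""
    # The expected value has the lock bit clear, so a locked object or a
    # changed version both make the swap fail.
    return header.compare_and_swap(version_read, version_read | LOCK_BIT)


# Usage: eight threads race to lock the same object; exactly one succeeds.
header = HeaderWord(3)
results = []
threads = [
    threading.Thread(target=lambda: results.append(try_lock(header, 3)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the atomic step, two threads could both see "unlocked, version 3" and both vote yes, which is the race the notes point at.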

append_to_prim
content append to primary log
children commit_prim_record
parents trans_coord_all_yes

commit_prim_record
content commit primary record
children update_object_version_clear_lock_bit
parents append_to_prim

update_object_version_clear_lock_bit
content Update object and version number, clear lock bit
parents commit_prim_record
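
A sketch of what a primary might do when it processes the commit record: install the new value, advance the version, and clear the lock bit. Incrementing the version by one, the `StoredObject` class, and the header layout are assumptions made for illustration.

```python
LOCK_BIT = 1 << 63  # assumed: top bit of the header is the lock flag


class StoredObject:
    def __init__(self, value, version: int, locked: bool = False):
        self.value = value
        self.header = version | (LOCK_BIT if locked else 0)


def commit_primary(obj: StoredObject, new_value) -> None:
    """Apply a committed write to an object locked during the lock phase."""
    if not obj.header & LOCK_BIT:
        raise RuntimeError("commit record for an object that is not locked")
    version = obj.header & ~LOCK_BIT
    obj.value = new_value
    # Writing the new version without LOCK_BIT clears the lock in the same store.
    obj.header = version + 1


account = StoredObject(value=100, version=3, locked=True)
commit_primary(account, 90)
```

After this step other transactions see the new value with a higher version, so any transaction that read the old version fails validation.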

version_lockbits_enforce_serializability
content Version numbering and lock bits enforce serializability in OCC
parents OCC

optimize_for_reads
content Optimization for objects that a transaction reads but does not write
children straight_ro_transaction
parents validation

refetch_object_header
content Refetch object header
children check_versions_locks
parents validation

check_versions_locks
content Checks whether the version has changed since the read and whether the lock bit is set
parents refetch_object_header
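
The validation check can be sketched as a loop over the objects a transaction read: refetch each header, then fail if the lock bit is set or the version differs from the one seen earlier. The `validate_reads` helper, the `fetch_header` callback (standing in for a one-sided read), and the header layout are assumptions for illustration.

```python
LOCK_BIT = 1 << 63  # assumed: top bit of the header is the lock flag


def validate_reads(read_set, fetch_header) -> bool:
    """read_set maps oid -> version seen during execution.

    fetch_header(oid) returns the object's current header word.
    """
    for oid, seen_version in read_set.items():
        header = fetch_header(oid)
        if header & LOCK_BIT:
            return False  # another transaction is committing a write
        if (header & ~LOCK_BIT) != seen_version:
            return False  # the object changed after it was read
    return True


# Usage with an in-memory dict standing in for remote headers.
headers = {"a": 3, "b": 5}
ok = validate_reads({"a": 3, "b": 5}, headers.__getitem__)

headers["b"] = 5 | LOCK_BIT
locked_result = validate_reads({"a": 3, "b": 5}, headers.__getitem__)

headers["b"] = 6
stale_result = validate_reads({"a": 3, "b": 5}, headers.__getitem__)
```

Because the check only reads headers, it needs no log append or lock on the read-only objects.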

straight_ro_transaction
content Straight read-only transaction
children execute_one_sided_read, ro_valid_optimizer
parents optimize_for_reads

execute_one_sided_read
content Execute with fast one-sided read
parents straight_ro_transaction, one_sided_RDMA

ro_valid_optimizer
content read-only validation optimizer
parents straight_ro_transaction