lec14
Summary
Farm, OCC
Node Tree
- farm
  - vs_spanner
  - OCC
  - RDMA_nics
  - bottlenecks
  - commit_protocol
  - farm_api
  - forced_occ
  - high_performance
  - network_cpu_bottleneck
  - research_prototype
  - same_datacenter
  - server_memory_layout
  - sharded_primary_backup_pairs
Nodes
RDMA_nics | |
content | RDMA nics |
children | RDMA, clever_network_interface_card, firmware_only, forced_occ (RDMA NICs are the reason for using OCC), sequence_protocol |
parents | farm |
vs_spanner | |
content | vs spanner |
children | both_2pc, geographic_in_repl, good_performance, ro_trans_sync_time |
parents | farm |
both_2pc | |
content | Both use two-phase commit |
parents | vs_spanner |
geographic_in_repl | |
content | Spanner is geographic in replication |
parents | vs_spanner |
ro_trans_sync_time | |
content | Spanner uses synchronized time for read-only transactions |
parents | vs_spanner |
bottlenecks | |
content | Bottlenecks |
children | speed_of_light, cpu_time |
parents | farm |
same_datacenter | |
content | same datacenter |
parents | farm |
research_prototype | |
content | research prototype |
parents | farm |
forced_occ | |
content | Forced to use OCC (optimistic concurrency control) |
parents | RDMA_nics, farm |
good_performance | |
content | good performance |
parents | vs_spanner |
speed_of_light | |
content | Speed of Light |
parents | bottlenecks |
cpu_time | |
content | CPU time |
parents | bottlenecks |
sharded_primary_backup_pairs | |
content | Sharded on primary backup pairs |
parents | farm |
high_performance | |
content | Ways farm gets high performance |
children | transaction_code, NVRAM, RDMA, data_fits_RAM, kernel_bypass, sharding (main way farm gets high performance) |
parents | farm |
sharding | |
content | Sharding |
parents | high_performance |
data_fits_RAM | |
content | Data fits in RAM |
children | NVRAM |
parents | high_performance |
remarks | much faster than disk |
transaction_code | |
content | Transaction code |
parents | high_performance |
RDMA | |
content | RDMA |
children | LAN_only, clever_network_interface_card, one_sided_RDMA, remote_direct_memory_access (Acronym) |
parents | RDMA_nics, high_performance |
kernel_bypass | |
content | Kernel Bypass |
children | skip_stack, DMA_in_app_memory, app_code_acces_nic_without_kernel (description) |
parents | high_performance |
clever_network_interface_card | |
content | Clever network interface card (NIC) |
parents | RDMA_nics, RDMA |
NVRAM | |
content | Non-volatile RAM (NVRAM) |
children | multiple_servers_write_ram_enough, only_works_for_power_fail |
parents | data_fits_RAM, high_performance |
app_code_acces_nic_without_kernel | |
content | Application code can directly access the network card without going through the kernel |
parents | kernel_bypass |
multiple_servers_write_ram_enough | |
content | Is it enough to simply write to the RAM of multiple servers? |
children | site_wide_power_failure (No, a sitewide power failure will wipe it all out) |
parents | NVRAM |
site_wide_power_failure | |
content | A site-wide power failure will lose data |
children | battery_system (preventative measure against power failures) |
parents | multiple_servers_write_ram_enough |
battery_system | |
content | Battery System |
children | alert_system |
parents | site_wide_power_failure |
alert_system | |
content | Alert System |
children | server_saves_to_disk (on alert) |
parents | battery_system |
server_saves_to_disk | |
content | Server saves RAM to disk |
parents | alert_system |
only_works_for_power_fail | |
content | Only works for power failure crash |
parents | NVRAM |
network_cpu_bottleneck | |
content | Network CPU bottlenecks |
children | classic_network_stack_too_slow |
parents | farm |
classic_network_stack_too_slow | |
content | Classic Network Stack too slow for RPCs. |
children | classic_network_stack_top_down |
parents | network_cpu_bottleneck |
classic_network_stack_top_down | |
content | Classic Network stack order: app, buffer, TCP, NIC driver, DMA, NIC |
children | skip_stack |
parents | classic_network_stack_too_slow |
skip_stack | |
content | Skip stack |
parents | classic_network_stack_top_down, kernel_bypass |
DMA_in_app_memory | |
content | DMA is directly in application memory |
children | app_takes_tcp_responsibilities |
parents | kernel_bypass |
app_takes_tcp_responsibilities | |
content | Because it skips TCP, application takes on some TCP responsibilities |
children | sequence_protocol (NIC handles this too) |
parents | DMA_in_app_memory |
remote_direct_memory_access | |
content | Remote Direct Memory Access |
parents | RDMA |
firmware_only | |
content | Firmware only: the computer's OS doesn't know about the reads/writes |
parents | RDMA_nics |
sequence_protocol | |
content | Run their own reliable sequence protocol, similar to TCP |
parents | RDMA_nics, app_takes_tcp_responsibilities |
LAN_only | |
content | LAN only |
parents | RDMA |
one_sided_RDMA | |
content | One-sided RDMA |
children | transactions_with_only_one_sided, execute_one_sided_read, one_app_RDMA_another_RDMA (description) |
parents | RDMA |
one_app_RDMA_another_RDMA | |
content | One app uses RDMA to read/write the memory of another app |
children | append_to_log_op (the typical operation for one-sided RDMA in Farm) |
parents | one_sided_RDMA |
append_to_log_op | |
content | appends to log |
parents | one_app_RDMA_another_RDMA |
transactions_with_only_one_sided | |
content | Can you implement transactions with only one-sided RDMA? |
children | farm_suggests_no (still a question to think about though) |
parents | one_sided_RDMA |
farm_suggests_no | |
content | Farm would suggest the answer would be "no" |
parents | transactions_with_only_one_sided |
OCC | |
content | Optimistic Concurrency Control (OCC) |
children | version_lockbits_enforce_serializability, buffer_writes_locally, check_later_if_reads_okay, commit_then_validate |
parents | farm |
buffer_writes_locally | |
content | Buffer Writes Locally |
parents | OCC |
check_later_if_reads_okay | |
content | Check later if reads are okay |
parents | OCC |
commit_then_validate | |
content | On commit, validate |
children | validation, abort_on_conflicts |
parents | OCC |
abort_on_conflicts | |
content | Abort on conflicts |
children | exponential_backup |
parents | commit_then_validate |
validation | |
content | Validation |
children | optimize_for_reads, refetch_object_header |
parents | commit_then_validate |
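The OCC nodes above (buffer writes locally, check later if reads are okay, abort on conflicts) can be illustrated with a small single-process sketch. It is a toy model, not FaRM's implementation: the `Store` and `Transaction` classes and their method names are hypothetical, there is no locking or networking, and the version bump on write is an assumption.

```python
class Store:
    """Toy object store: maps an object id to a (version, value) pair."""
    def __init__(self):
        self.data = {}


class Transaction:
    """Hypothetical OCC transaction: reads record versions, writes are buffered."""
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # oid -> version seen at first read
        self.writes = {}         # oid -> buffered new value

    def read(self, oid):
        if oid in self.writes:           # read-your-own-writes
            return self.writes[oid]
        version, value = self.store.data[oid]
        self.read_versions.setdefault(oid, version)
        return value

    def write(self, oid, value):
        self.writes[oid] = value         # buffered locally, not yet visible

    def commit(self):
        # Validation: every object read must still have the version we saw.
        for oid, version in self.read_versions.items():
            if self.store.data[oid][0] != version:
                return False             # conflict: abort
        # Install buffered writes, bumping each version number.
        for oid, value in self.writes.items():
            version = self.store.data.get(oid, (0, None))[0]
            self.store.data[oid] = (version + 1, value)
        return True
```

Two transactions that read the same object and both try to write it show the conflict case: the first commit succeeds, the second fails validation and must be retried.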
farm_api | |
content | API |
children | txcommit, txcreate, txread, txwrite, OID |
parents | farm |
txcreate | |
content | txCreate() |
children | creates_transaction |
parents | farm_api |
txread | |
content | txRead() |
children | OID (input argument) |
parents | farm_api |
OID | |
content | Object ID (OID) |
children | compound_identifier |
parents | txread, txwrite, farm_api |
creates_transaction | |
content | Creates Transaction |
parents | txcreate |
txwrite | |
content | txWrite() |
children | OID (input argument) |
parents | farm_api |
exponential_backup | |
content | Exponential backoff may be used? |
parents | abort_on_conflicts |
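If aborted transactions are retried with exponential backoff, the retry loop might look like the sketch below. The function name, the retry limit, the base delay, and the use of random jitter are all assumptions for illustration; the notes only raise backoff as a possibility.

```python
import random
import time


def run_with_backoff(attempt_tx, max_retries=5, base_delay=0.001, sleep=time.sleep):
    """Call attempt_tx() until it returns True, waiting longer after each abort.

    attempt_tx is a hypothetical callable that runs one transaction attempt
    and returns True on commit, False on abort.
    """
    for attempt in range(max_retries):
        if attempt_tx():
            return True
        # The upper bound on the wait doubles after every failed attempt;
        # the random jitter keeps conflicting clients from retrying in lockstep.
        sleep(random.uniform(0, base_delay * 2 ** attempt))
    return False
```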
compound_identifier | |
content | Compound Identifier |
children | address, region_num |
parents | OID |
region_num | |
content | Region Number |
parents | compound_identifier |
address | |
content | Address |
parents | compound_identifier |
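The compound identifier can be pictured as a pair of a region number and an address within that region. The sketch below is illustrative only: the `OID` class, the `pack`/`unpack` helpers, and the 32-bit address width are assumptions, not the layout FaRM actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OID:
    region_num: int  # which region holds the object
    address: int     # location of the object inside that region


def pack(oid, addr_bits=32):
    """Pack an OID into a single integer (the bit width is an assumed value)."""
    return (oid.region_num << addr_bits) | oid.address


def unpack(word, addr_bits=32):
    """Recover the region number and address from a packed integer."""
    return OID(word >> addr_bits, word & ((1 << addr_bits) - 1))
```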
server_memory_layout | |
content | Server Memory Layout |
children | logs_for_each_server, pair_msg_queues, region |
parents | farm |
region | |
content | Region |
children | versioned_objects |
parents | server_memory_layout |
versioned_objects | |
content | Versioned Objects |
children | version_num, lock_flag |
parents | region |
version_num | |
content | version number |
parents | versioned_objects |
lock_flag | |
content | Lock flag |
parents | versioned_objects |
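One way to picture a versioned object's header is a single 64-bit word holding both the lock flag and the version number. The sketch below assumes the lock flag sits in the top bit and the version in the remaining bits; the helper names are hypothetical.

```python
LOCK_BIT = 1 << 63            # assumed position of the lock flag
VERSION_MASK = LOCK_BIT - 1   # remaining 63 bits hold the version number


def make_header(version, locked=False):
    """Build a header word from a version number and a lock flag."""
    return (version & VERSION_MASK) | (LOCK_BIT if locked else 0)


def is_locked(header):
    return bool(header & LOCK_BIT)


def version_of(header):
    return header & VERSION_MASK
```

Keeping both fields in one word means a single atomic operation can test and update them together.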
pair_msg_queues | |
content | Pair of Message Queues |
parents | server_memory_layout |
logs_for_each_server | |
content | Logs, one for each of the other servers |
parents | server_memory_layout |
commit_protocol | |
content | Commit Protocol |
children | execute_phase |
parents | farm |
execute_phase | |
content | Execute Phase |
children | txcommit_call, reads_everything_needed |
parents | commit_protocol |
reads_everything_needed | |
content | Reads everything it needs |
parents | execute_phase |
txcommit_call | |
content | txcommit call |
children | commit_phase (happens when all yes) |
parents | txcommit, execute_phase |
txcommit | |
content | txCommit |
children | txcommit_call |
parents | farm_api |
commit_phase | |
content | commit phase |
children | trans_coord_all_yes, lock_phase |
parents | txcommit_call |
lock_phase | |
content | Lock Phase |
children | trans_coord_all_yes, send_object_id |
parents | commit_phase |
send_object_id | |
content | Client sends each primary server the identity of the updated object |
children | append_to_log |
parents | lock_phase |
trans_coord_all_yes | |
content | Transaction coordinator notifies primary servers "all yes" |
children | append_to_prim |
parents | commit_phase, lock_phase |
append_to_log | |
content | Append to log |
children | prim_active_log_process |
parents | send_object_id |
prim_active_log_process | |
content | Primaries actively process new logs, and send yes/no vote |
children | version_changed, is_object_already_locked |
parents | append_to_log |
is_object_already_locked | |
content | is object already locked? |
parents | prim_active_log_process |
version_changed | |
content | has the version number changed? |
children | atomic_compare_and_swap |
parents | prim_active_log_process |
atomic_compare_and_swap | |
content | Atomic compare_and_swap |
children | multithread_race_transactions (rationale for atomic operation) |
parents | version_changed |
multithread_race_transactions | |
content | Multithreading can cause races between transactions |
parents | atomic_compare_and_swap |
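The two checks a primary makes during the lock phase (is the object already locked, has the version number changed) can be folded into a single compare-and-swap, which is why the operation must be atomic. The sketch below simulates this in one process: `ObjectHeader` and `try_lock` are hypothetical names, the lock flag in the top bit is an assumption, and a `threading.Lock` stands in for the hardware atomic instruction.

```python
import threading

LOCK_BIT = 1 << 63  # assumed position of the lock flag in the header word


class ObjectHeader:
    """Header word holding the lock flag and the version number."""
    def __init__(self, version):
        self._word = version
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._guard:
            return self._word

    def compare_and_swap(self, expected, new):
        """Atomically set the word to new only if it still equals expected."""
        with self._guard:
            if self._word != expected:
                return False
            self._word = new
            return True


def try_lock(header, version_read):
    """Return the primary's vote: True (yes) or False (no).

    The expected value is the version the transaction read with the lock
    flag clear, so the swap fails if the object is already locked or if
    its version has changed since the read.
    """
    return header.compare_and_swap(version_read, version_read | LOCK_BIT)
```

Without atomicity, two threads processing lock records for different transactions could both see the object as unlocked and both vote yes.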
append_to_prim | |
content | append to primary log |
children | commit_prim_record |
parents | trans_coord_all_yes |
commit_prim_record | |
content | commit primary record |
children | update_object_version_clear_lock_bit |
parents | append_to_prim |
update_object_version_clear_lock_bit | |
content | Update object and version number, clear lock bit |
parents | commit_prim_record |
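The last step at the primary can be sketched as follows. The dictionary representation of an object, the `commit_primary` name, the lock flag in the top bit, and the choice to increment the version by one are assumptions made for illustration.

```python
LOCK_BIT = 1 << 63  # assumed position of the lock flag in the header word


def commit_primary(obj, new_value):
    """Apply a committed write: update the object and version, clear the lock bit.

    obj is a hypothetical dict with "header" and "value" keys.
    """
    if not obj["header"] & LOCK_BIT:
        raise ValueError("object must be locked before commit")
    version = obj["header"] & ~LOCK_BIT
    obj["value"] = new_value
    # Writing the incremented version without LOCK_BIT clears the lock.
    obj["header"] = version + 1
```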
version_lockbits_enforce_serializability | |
content | Version numbering and lock bits enforce serializability in OCC |
parents | OCC |
optimize_for_reads | |
content | Optimization for objects that a transaction reads but does not write |
children | straight_ro_transaction |
parents | validation |
refetch_object_header | |
content | Refetch object header |
children | check_versions_locks |
parents | validation |
check_versions_locks | |
content | Checks for version changes since start and if the lock bit is set |
parents | refetch_object_header |
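The validation check for objects that were only read can be sketched as a loop over the read set. Here `fetch_header` is a hypothetical stand-in for refetching the object header (in FaRM this would be a one-sided read), and the header layout with the lock flag in the top bit is an assumption.

```python
LOCK_BIT = 1 << 63  # assumed position of the lock flag in the header word


def validate(read_set, fetch_header):
    """Return True if every object read is unlocked and unchanged.

    read_set maps an object id to the version seen when it was first read.
    fetch_header(oid) returns the object's current header word.
    """
    for oid, version_read in read_set.items():
        header = fetch_header(oid)
        if header & LOCK_BIT:
            return False  # another transaction holds the lock
        if (header & ~LOCK_BIT) != version_read:
            return False  # version changed since the transaction started
    return True
```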
straight_ro_transaction | |
content | Straight read-only transaction |
children | execute_one_sided_read, ro_valid_optimizer |
parents | optimize_for_reads |
execute_one_sided_read | |
content | Execute with fast one-sided read |
parents | straight_ro_transaction, one_sided_RDMA |
ro_valid_optimizer | |
content | read-only validation optimizer |
parents | straight_ro_transaction |