distributed_systems_MIT/lec04

Nodes

fault_tolerance
content Fault Tolerance
children replication (Tool Used For Fault Tolerance), vmware_ft

replication
content Replication
children limits_to, replication_schemes, worth_it, expected_failures
parents fault_tolerance

expected_failures
content Expected Failures To Address
children fail_stop_faults
parents replication

fail_stop_faults
content Fail-Stop Faults: the machine stops computing entirely
children hardware_errors
parents expected_failures

limits_to
content Limits To Replication (Not Covered)
children software_bugs, correlated_failures
parents replication

software_bugs
content Bugs in Software
parents limits_to

vmware_ft
content VMware FT. This lecture studies this particular replication design.
children full_state_detailed (This is the approach that VMware FT uses, which makes it unique.), output, primary_fails, timer_exact, unicore_processor, vmm
parents fault_tolerance

hardware_errors
content Hardware errors can sometimes be turned into fail-stop faults. The advantage is that the errors then become detectable.
parents fail_stop_faults

correlated_failures
content Correlated failures include hardware defects (such as a defective batch of servers from a single company) and natural disasters such as earthquakes.
children physical_separation
parents limits_to

worth_it
content Is replication worth it?
parents replication

depends
content Depends on value of a reliable service

physical_separation
content Physical separation (e.g., replicas in different countries)
parents correlated_failures

state_transfer
content State Transfer
children smaller_operations (more favorable than state transfer), whole_state
parents replication_schemes

replication_schemes
content Replication Schemes
children replicated_state_machine, state_transfer
parents replication

replicated_state_machine
content Replicated State Machine
children internal_deterministic, smaller_operations (This is a "pro" for using RSMs over state transfer), designing_rsm
parents replication_schemes

whole_state
content Sends whole state of primary
children just_send_external (Sending external events typically means sending less)
parents state_transfer

internal_deterministic
content Works on the assumption that most internal operations of a CPU are deterministic
children unicore_processor (single-core instructions are deterministic)
parents replicated_state_machine

just_send_external
content Just send external events (input events, packets, etc)
children nondeterministic_events (External events are the non-deterministic events)
parents whole_state

smaller_operations
content RSMs tend to send smaller operations than state transfer, which tends to make them more favorable
children ops_more_complex (Potential downside of RSMs)
parents state_transfer, replicated_state_machine

ops_more_complex
content Operations in RSMs tend to be more complex
parents smaller_operations
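
As a rough illustration of the trade-off between the two schemes, the sketch below applies the same external events to a toy key-value state and compares what each scheme would ship to the backup. The `KVMachine` class, the JSON encoding, and the event format are hypothetical and chosen only for this example.

```python
import json

class KVMachine:
    """Toy deterministic state machine: a dict updated by (key, value) events."""
    def __init__(self):
        self.state = {}

    def apply(self, event):
        key, value = event
        self.state[key] = value

events = [("x", 1), ("y", 2), ("x", 3)]

# State transfer: the primary applies each event, then ships its whole state.
primary = KVMachine()
backup_st = KVMachine()
state_transfer_bytes = 0
for ev in events:
    primary.apply(ev)
    snapshot = json.dumps(primary.state)
    state_transfer_bytes += len(snapshot)
    backup_st.state = json.loads(snapshot)

# Replicated state machine: ship only the external event; the backup replays it.
backup_rsm = KVMachine()
rsm_bytes = 0
for ev in events:
    message = json.dumps(ev)
    rsm_bytes += len(message)
    key, value = json.loads(message)
    backup_rsm.apply((key, value))

print(backup_st.state == primary.state, backup_rsm.state == primary.state)
print(state_transfer_bytes, rsm_bytes)
```

Both backups end in the same state as the primary. The snapshot size grows with the state, while each event message stays small, but the replaying backup has to execute every operation itself.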

unicore_processor
content VMware FT replication works on unicore (single-core) processors
children multicore_nondeterministic (multicore processors cannot be used with this replication scheme)
parents internal_deterministic, vmware_ft

multicore_nondeterministic
content multicore processors can't be used because the way instructions are interleaved makes them non-deterministic
children multicore_parallelism
parents unicore_processor
flashcard (front) Why can't multicore processors be used in the VMware FT replication scheme?
flashcard (back) The way multicore processors interleave instructions makes them non-deterministic and therefore unsuitable for the VMware FT replication scheme.
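
A tiny model can make the interleaving problem concrete. The sketch below is not VMware FT code; it is a hypothetical two-core program in which each core reads a shared counter and then writes back the value plus one. Enumerating the possible interleavings shows that the final value depends on the schedule.

```python
from itertools import permutations

def run(schedule):
    """Run two cores that each do: tmp = shared; shared = tmp + 1.

    `schedule` is a sequence of core ids (0 or 1); each appearance lets that
    core execute its next step (first the read, then the write).
    """
    shared = 0
    tmp = [None, None]      # per-core register
    step = [0, 0]           # next step index per core
    for core in schedule:
        if step[core] == 0:
            tmp[core] = shared          # read the shared counter
        else:
            shared = tmp[core] + 1      # write back the incremented value
        step[core] += 1
    return shared

# Every distinct ordering of two steps from core 0 and two steps from core 1.
schedules = sorted(set(permutations([0, 0, 1, 1])))
results = {s: run(s) for s in schedules}
outcomes = set(results.values())
print(outcomes)
```

Because more than one outcome is possible, a backup that only replays external events could diverge from the primary unless the interleaving itself were also recorded.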

level_of_replication
content What level of replication should be used?
children full_state_detailed
parents designing_rsm

designing_rsm
content Designing a Replicated State Machine (RSM)
children how_close_is_sync, level_of_replication, new_replica_expensive
parents replicated_state_machine
flashcard (front) What does RSM stand for?
flashcard (back) Replicated State Machine.

how_close_is_sync
content How close is synchronization? (between primary/backup)
children sync_ideal
parents designing_rsm

sync_ideal
content Ideal Synchronization: if primary fails, switch over to backup with no anomalies.
parents how_close_is_sync
remarks This never actually happens in practice; anomalies do occur.

new_replica_expensive
content Creation of a new replica is expensive
children full_state_detailed
parents designing_rsm

full_state_detailed
content Copying the full state of the machine (registers, memory) is very detailed
children application_level (more efficient than machine-level replication)
parents level_of_replication, new_replica_expensive, vmware_ft

application_level
content Most replication schemes are application-level
children replication_application
parents full_state_detailed

replication_application
content Replication needs to be a part of the application in order to work.
children existing_software (Existing software runs on top of machine and can work without modification or any knowledge of replication.)
parents application_level

existing_software
content Existing software will work as-is using machine-level replication.
parents replication_application

multicore_parallelism
content Multicore Parallelism is not covered
parents multicore_nondeterministic, nondeterministic_events

nondeterministic_events
content Examples of non-deterministic events
children inputs, multicore_parallelism
parents just_send_external

inputs
content Inputs are the most common non-deterministic event
children network_packets
parents nondeterministic_events

network_packets
content Inputs in this scope are just network packets
children data_interrupt
parents inputs

data_interrupt
content When a packet arrives, the data in the packet and the interrupt type are stored.
children timing_interrupt
parents network_packets

timing_interrupt
content The timing of the interrupt (where it falls in the instruction stream) must be identical on the primary and the backup.
parents data_interrupt
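
To see why the position of the interrupt matters, consider a toy guest whose result depends on where the interrupt handler runs. This is a minimal sketch under invented assumptions: `run_guest`, the accumulator, and the "multiply by packet data" handler are all hypothetical.

```python
def run_guest(num_instructions, interrupts):
    """Toy guest: each instruction adds 1 to an accumulator.

    `interrupts` maps an instruction count to packet data. The handler
    multiplies the accumulator by the data, so the result depends on
    exactly where in the instruction stream the interrupt lands.
    """
    acc = 0
    for count in range(num_instructions):
        if count in interrupts:
            acc *= interrupts[count]    # interrupt handler runs here
        acc += 1                        # ordinary deterministic instruction
    return acc

primary = run_guest(10, {4: 3})        # interrupt delivered before instruction 4
backup_same = run_guest(10, {4: 3})    # replayed at the same point
backup_late = run_guest(10, {5: 3})    # replayed one instruction late

print(primary == backup_same, primary == backup_late)
```

Replaying the interrupt at the same instruction count reproduces the primary's state; delivering it even one instruction later does not.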

vmm
content Virtual Machine Monitor
children packet_sends_vm_backup
parents vmware_ft

packet_sends_vm_backup
content The VMM receives network packets, delivers them to the VM, then sends a copy of each packet to the backup
children primary_outputs_only
parents vmm

primary_outputs_only
content Both primary and backup see inputs; the primary is the only one that produces output.
parents packet_sends_vm_backup

logging_channel
content Logging Channel: stream of events.
children log_entry_format, only_weird_instructions, arriving_packets
remarks Context: sending "Log events on the log channel"

primary_fails
content What if the primary fails?
children backup_stops_logs
parents vmware_ft

backup_stops_logs
content The indicator that the primary has failed is that the backup stops receiving log entries from the primary.
children backup_goes_live
parents primary_fails
remarks Apparently logs get sent quite frequently to the backup (many times a second). Some kind of "heartbeat" or timing interrupt? I forget the exact terminology

backup_goes_live
content The Backup Goes "Live"
children vm_allows_backup_to_run
parents backup_stops_logs

vm_allows_backup_to_run
content The VMM allows the backup to run freely instead of waiting on the log channel. The backup then stops discarding output.
parents backup_goes_live
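
The failover path described above can be sketched as a timeout on the log channel. This is a simplified model, not the real mechanism: `BackupMonitor`, its method names, and the timeout value are hypothetical, time is passed in explicitly to keep the example deterministic, and the Test And Set check described in these notes is omitted.

```python
class BackupMonitor:
    """Declares the primary dead if no log entry arrives within `timeout` seconds."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_log_time = now
        self.live = False

    def on_log_entry(self, now):
        # Any log traffic from the primary counts as evidence that it is alive.
        self.last_log_time = now

    def tick(self, now):
        # Go live once the log channel has been silent for longer than the timeout.
        if not self.live and now - self.last_log_time > self.timeout:
            self.live = True
        return self.live

    def emit(self, packet):
        """Return the packet if live; otherwise discard it (return None)."""
        return packet if self.live else None

monitor = BackupMonitor(timeout=1.0, now=0.0)
monitor.on_log_entry(now=0.5)
before = monitor.emit(b"reply")     # still a backup: output is discarded
monitor.tick(now=2.0)               # 1.5 s of silence exceeds the timeout
after = monitor.emit(b"reply")      # now live: output is released
print(before, after)
```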

only_weird_instructions
content Only "weird" instructions get sent to the log channel
parents logging_channel

log_entry_format
content Format of a log entry
children interrupt_type, log_entry_data
parents logging_channel
remarks They don't explicitly say what the format of a log entry is in the paper.

interrupt_type
content Interrupt Type
parents log_entry_format
remarks I just wrote "type", but I'm assuming it's interrupt type

log_entry_data
content Data (from network packet)
parents log_entry_format
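
Since the exact format is not given, the following is only a guess at how the fields named in these notes could be grouped. The `LogEntry` class is hypothetical; the `instruction_count` field is an assumption drawn from the requirement that interrupt timing be identical, and the type labels are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log-entry layout; the real format is not documented here."""
    instruction_count: int   # where in the instruction stream the event occurred
    interrupt_type: str      # e.g. "network" or "timer" (labels are assumptions)
    data: bytes = b""        # packet payload; empty for events that carry no data

entry = LogEntry(instruction_count=1042, interrupt_type="network", data=b"GET /")
timer = LogEntry(instruction_count=2000, interrupt_type="timer")
print(entry)
print(timer)
```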

timer_exact
content Assumes VM has timer in exactly the same place for both the Primary and Backup
children physical_timer_to_guest, backup_gets_ahead
parents vmware_ft

physical_timer_to_guest
content Physical timer interrupts are sent to guest
parents timer_exact

arriving_packets
content Arriving Packets
children NICS_DMA
parents logging_channel

NICS_DMA
content Some NICs use DMA (direct memory access) in their implementation.
children primary_no_DMA
parents arriving_packets

primary_no_DMA
content The primary cannot access the NIC or its DMA directly
children private_mem
parents NICS_DMA

private_mem
content Events from the NIC are DMA'd into private memory in the VMM, then they are copied over to the primary
children bounce_buffer ("Bounce Buffer" is the term for what this does)
parents primary_no_DMA

bounce_buffer
content Bounce Buffer
parents private_mem
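
A bounce buffer can be modeled as a staging area between the NIC and the guest. The sketch below is a simplification with hypothetical names (`VMM`, `nic_dma`, `deliver`); it only shows that the guest sees nothing until the copy step, which is also the point where a copy can be logged for the backup.

```python
class VMM:
    def __init__(self):
        self.bounce_buffer = []     # private memory; the guest cannot see it
        self.guest_memory = []      # memory visible to the primary guest
        self.log_channel = []       # copies destined for the backup

    def nic_dma(self, packet):
        """The NIC writes into the private buffer, not into guest memory."""
        self.bounce_buffer.append(bytes(packet))

    def deliver(self):
        """Copy buffered packets into the guest at a point the VMM controls."""
        while self.bounce_buffer:
            packet = self.bounce_buffer.pop(0)
            self.guest_memory.append(packet)
            self.log_channel.append(packet)

vmm = VMM()
vmm.nic_dma(b"hello")
seen_before = list(vmm.guest_memory)   # guest has not observed the packet yet
vmm.deliver()
seen_after = list(vmm.guest_memory)
print(seen_before, seen_after)
```

Staging the packet this way means the guest never observes memory changing at an uncontrolled moment, so the same copy can be replayed at the same point on the backup.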

backup_gets_ahead
content What if the backup gets ahead of the primary's execution? This must never be allowed to happen.
children event_buffer_nonempty (Event buffer is used to prevent the backup from getting ahead)
parents timer_exact

event_buffer_nonempty
content Event buffer: the backup VM only executes instructions while the buffer is non-empty
parents backup_gets_ahead
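
The stalling behavior can be sketched with a queue. `BackupVM` and its methods are hypothetical names; "executing" an event here just means recording it.

```python
from collections import deque

class BackupVM:
    def __init__(self):
        self.event_buffer = deque()   # log entries received from the primary
        self.executed = []            # events the backup has replayed so far

    def receive(self, event):
        self.event_buffer.append(event)

    def step(self):
        """Replay one logged event; stall (return False) if none is buffered."""
        if not self.event_buffer:
            return False
        self.executed.append(self.event_buffer.popleft())
        return True

backup = BackupVM()
stalled = backup.step()          # nothing logged yet, so the backup waits
backup.receive("packet-1")
progressed = backup.step()       # one buffered event lets it advance
print(stalled, progressed, backup.executed)
```

Because the backup only advances by consuming events the primary has already logged, it can never run past the primary.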

output
content Handling output events
children network_packets_only, awkward_failures
parents vmware_ft

network_packets_only
content In this context, the only outputs are network packets
parents output

awkward_failures
content What are the kinds of awkward failures that could happen?
children network_split_brain (example of failure), output_rules, test_and_set (Preventative Solution)
parents output

output_rules
content Output Rules: preventative measures against certain kinds of failures
children output_waits_for_backup (This prevents issues related to backup not receiving network packets over log channel)
parents awkward_failures

output_waits_for_backup
content The primary can't produce any output until the backup has received all previous events up to that point in time.
parents output_rules
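
One way to picture the output rule is a primary that holds each outgoing packet until the backup has confirmed every earlier log entry. This is a sketch under assumptions: the sequence numbers, the explicit acknowledgement call, and the `Primary` class are all hypothetical.

```python
class Primary:
    def __init__(self):
        self.next_seq = 0    # sequence number of the next log entry
        self.acked = -1      # highest sequence number the backup has confirmed
        self.pending = []    # (required_seq, packet) outputs being held back
        self.sent = []       # outputs actually released to the network

    def log_event(self):
        """Record that one event was sent on the log channel."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def output(self, packet):
        # Every event logged so far must be confirmed before this packet leaves.
        required = self.next_seq - 1
        self.pending.append((required, packet))
        self._release()

    def on_ack(self, seq):
        self.acked = max(self.acked, seq)
        self._release()

    def _release(self):
        while self.pending and self.pending[0][0] <= self.acked:
            self.sent.append(self.pending.pop(0)[1])

p = Primary()
p.log_event()            # event 0
p.log_event()            # event 1
p.output(b"reply")       # held: the backup has confirmed nothing yet
held = list(p.sent)
p.on_ack(1)              # backup confirms events up to 1
released = list(p.sent)
print(held, released)
```

Only the output is delayed; the primary itself can keep executing while the packet waits.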

test_and_set
content Test And Set: an outside authority that decides which machine (primary/backup) can be "live"
children acts_like_lock, network_split_brain ("Test and Set" server used to solve this)
parents awkward_failures

network_split_brain
content Network Issues can cause split brain
parents awkward_failures, test_and_set

acts_like_lock
content Test/Set server acts like a lock. The primary and backup send requests to this server to get write permission, which in turn sets a flag on the Test/Set server.
parents test_and_set
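
The lock-like behavior can be sketched as an atomic flag: the first caller wins and everyone after is refused. `LiveArbiter` is a hypothetical in-process stand-in for the external server, with a mutex playing the role of the server handling one request at a time.

```python
import threading

class LiveArbiter:
    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag; return True only for the first caller."""
        with self._lock:
            if self._flag:
                return False
            self._flag = True
            return True

arbiter = LiveArbiter()
# During a network partition both machines may believe the other has failed.
backup_may_go_live = arbiter.test_and_set()
primary_may_stay_live = arbiter.test_and_set()
print(backup_may_go_live, primary_may_stay_live)
```

Whichever machine asks second is refused, so at most one of them goes live and split brain is avoided.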