lec04
dz / distributed_systems_MIT / lec04
Node Tree
- fault_tolerance
- logging_channel
- depends
Nodes
fault_tolerance | |
content | Fault Tolerance |
children | replication (Tool Used For Fault Tolerance), vmware_ft |
replication | |
content | Replication |
children | limits_to, replication_schemes, worth_it, expected_failures |
parents | fault_tolerance |
expected_failures | |
content | Expected Failures To Address |
children | fail_stop_faults |
parents | replication |
fail_stop_faults | |
content | Fail-Stop Faults: the machine stops computing entirely |
children | hardware_errors |
parents | expected_failures |
limits_to | |
content | Limits To Replication (Not Covered) |
children | software_bugs, correlated_failures |
parents | replication |
software_bugs | |
content | Bugs in Software |
parents | limits_to |
vmware_ft | |
content | VMware FT: the particular replication design studied in this lecture. |
children | full_state_detailed (This is the approach that VMware FT uses, which makes it unique.), output, primary_fails, timer_exact, unicore_processor, vmm |
parents | fault_tolerance |
hardware_errors | |
content | Hardware errors can sometimes be turned into fail-stop faults. The advantage is that such errors become detectable. |
parents | fail_stop_faults |
correlated_failures | |
content | Correlated failures include hardware defects (such as a defective batch of servers from a single company) and natural disasters like earthquakes. |
children | physical_separation |
parents | limits_to |
worth_it | |
content | Is replication worth it? |
parents | replication |
depends | |
content | Depends on value of a reliable service |
physical_separation | |
content | Physical separation (different countries) |
parents | correlated_failures |
state_transfer | |
content | State Transfer |
children | smaller_operations (more favorable than state transfer), whole_state |
parents | replication_schemes |
replication_schemes | |
content | Replication Schemes |
children | replicated_state_machine, state_transfer |
parents | replication |
replicated_state_machine | |
content | Replicated State Machine |
children | internal_deterministic, smaller_operations (This is a "pro" for using RSMs over state transfer), designing_rsm |
parents | replication_schemes |
whole_state | |
content | Sends whole state of primary |
children | just_send_external (Sending external events typically means sending less data) |
parents | state_transfer |
internal_deterministic | |
content | Works on the assumption that most internal operations of a CPU are deterministic |
children | unicore_processor (single-core instructions are deterministic) |
parents | replicated_state_machine |
just_send_external | |
content | Just send external events (input events, packets, etc) |
children | nondeterministic_events (External events are the non-deterministic events) |
parents | whole_state |
smaller_operations | |
content | RSMs tend to send smaller operations than state transfer does, which tends to make them more favorable |
children | ops_more_complex (Potential downside of RSMs) |
parents | state_transfer, replicated_state_machine |
ops_more_complex | |
content | Operations in RSMs tend to be more complex |
parents | smaller_operations |
unicore_processor | |
content | VMware FT replication works on unicore (single-core) processors |
children | multicore_nondeterministic (multicore unable to be used with this replication scheme) |
parents | internal_deterministic, vmware_ft |
multicore_nondeterministic | |
content | Multicore processors can't be used because the way instructions from different cores are interleaved makes execution non-deterministic |
children | multicore_parallelism |
parents | unicore_processor |
flashcard (front) | Why can't multicore processors be used in the VMware FT replication scheme? |
flashcard (back) | The way multicore processors interleave instructions makes them non-deterministic and therefore unsuitable for the VMware FT replication scheme. |
level_of_replication | |
content | What level of replication should be used? |
children | full_state_detailed |
parents | designing_rsm |
designing_rsm | |
content | Designing a Replicated State Machine (RSM) |
children | how_close_is_sync, level_of_replication, new_replica_expensive |
parents | replicated_state_machine |
flashcard (front) | What does RSM stand for? |
flashcard (back) | Replicated State Machine. |
how_close_is_sync | |
content | How close is synchronization? (between primary/backup) |
children | sync_ideal |
parents | designing_rsm |
sync_ideal | |
content | Ideal Synchronization: if primary fails, switch over to backup with no anomalies. |
parents | how_close_is_sync |
remarks | This never actually happens in practice; anomalies do occur. |
new_replica_expensive | |
content | Creation of a new replica is expensive |
children | full_state_detailed |
parents | designing_rsm |
full_state_detailed | |
content | Copying the full state of the machine (registers, memory) is very detailed |
children | application_level (more efficient than machine-level replication) |
parents | level_of_replication, new_replica_expensive, vmware_ft |
application_level | |
content | Most replication schemes are application-level |
children | replication_application |
parents | full_state_detailed |
replication_application | |
content | Replication needs to be a part of the application in order to work. |
children | existing_software (Existing software runs on top of the machine and can work without modification or any knowledge of replication.) |
parents | application_level |
existing_software | |
content | Existing software will work as-is using machine-level replication. |
parents | replication_application |
multicore_parallelism | |
content | Multicore Parallelism is not covered |
parents | multicore_nondeterministic, nondeterministic_events |
nondeterministic_events | |
content | Examples of non-deterministic events |
children | inputs, multicore_parallelism |
parents | just_send_external |
inputs | |
content | Inputs are the most common non-deterministic event |
children | network_packets |
parents | nondeterministic_events |
network_packets | |
content | Inputs in this scope are just network packets |
children | data_interrupt |
parents | inputs |
data_interrupt | |
content | When a packet arrives, the data in the packet and the interrupt type are stored. |
children | timing_interrupt |
parents | network_packets |
timing_interrupt | |
content | The timing of the interrupt (where it falls in the instruction stream) must be identical on the primary and the backup. |
parents | data_interrupt |
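A minimal sketch of the idea in the nodes above: each logged interrupt carries its type, its data, and the point in the instruction stream where it fired, and the backup delivers it at that same point. The `LogEntry` fields (in particular `instruction_count`) and the `replay` helper are illustrative assumptions, not a format taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical field: instructions executed on the primary when the interrupt fired.
    instruction_count: int
    interrupt_type: str
    data: bytes


def replay(entries, total_instructions):
    """Simulate the backup executing `total_instructions` instructions and
    return the interrupts it delivers as (instruction_count, type, data)."""
    pending = sorted(entries, key=lambda e: e.instruction_count)
    delivered = []
    i = 0
    for executed in range(1, total_instructions + 1):
        # Deliver every interrupt that was logged for exactly this point.
        while i < len(pending) and pending[i].instruction_count == executed:
            entry = pending[i]
            delivered.append((executed, entry.interrupt_type, entry.data))
            i += 1
    return delivered


entries = [LogEntry(5, "net", b"a"), LogEntry(2, "timer", b"")]
print(replay(entries, 6))
```

Interrupts logged beyond the point the backup has reached are simply not delivered yet, which keeps the backup's view identical to the primary's at every instruction.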
vmm | |
content | Virtual Machine Monitor |
children | packet_sends_vm_backup |
parents | vmware_ft |
packet_sends_vm_backup | |
content | When a network packet arrives, the VMM delivers it to the primary VM, then sends a copy of the packet to the backup |
children | primary_outputs_only |
parents | vmm |
primary_outputs_only | |
content | Both primary and backup see inputs; the primary is the only one that produces output. |
parents | packet_sends_vm_backup |
logging_channel | |
content | Logging Channel: stream of events. |
children | log_entry_format, only_weird_instructions, arriving_packets |
remarks | Context: sending "Log events on the log channel" |
primary_fails | |
content | What if the primary fails? |
children | backup_stops_logs |
parents | vmware_ft |
backup_stops_logs | |
content | The indicator that the primary has failed is that the backup stops receiving log entries from the primary. |
children | backup_goes_live |
parents | primary_fails |
remarks | Apparently logs get sent quite frequently to the backup (many times a second). Some kind of "heartbeat" or timing interrupt? I forget the exact terminology |
backup_goes_live | |
content | The Backup Goes "Live" |
children | vm_allows_backup_to_run |
parents | backup_stops_logs |
vm_allows_backup_to_run | |
content | The VMM allows the backup to run freely. The backup then stops discarding its output. |
parents | backup_goes_live |
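The failover path above can be sketched as follows, assuming a simple silence timeout on the log channel. The `Backup` class, its method names, and the timeout value are hypothetical; the sketch also leaves out the Test-and-Set arbitration covered under test_and_set.

```python
class Backup:
    """Toy backup: discards output until the log channel goes silent."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed silence threshold, in seconds
        self.last_log_time = 0.0
        self.live = False

    def on_log_entry(self, now):
        # Any log entry (including periodic timer events) proves the primary is alive.
        self.last_log_time = now

    def tick(self, now):
        # Go live once no log entry has arrived for longer than the timeout.
        if not self.live and now - self.last_log_time > self.timeout_s:
            self.live = True
        return self.live

    def emit(self, packet):
        # Before going live the output is discarded (None); afterwards it is sent.
        return packet if self.live else None


b = Backup(timeout_s=1.0)
b.on_log_entry(now=0.0)
print(b.tick(now=0.5), b.emit(b"reply"))
print(b.tick(now=2.0), b.emit(b"reply"))
```

Timestamps are passed in explicitly so the behavior is deterministic and easy to check.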
only_weird_instructions | |
content | Only "weird" (non-deterministic) instructions get sent to the log channel |
parents | logging_channel |
log_entry_format | |
content | Format of a log entry |
children | interrupt_type, log_entry_data |
parents | logging_channel |
remarks | The paper doesn't explicitly say what the format of a log entry is. |
interrupt_type | |
content | Interrupt Type |
parents | log_entry_format |
remarks | I just wrote "type", but I'm assuming it's interrupt type |
log_entry_data | |
content | Data (from network packet) |
parents | log_entry_format |
timer_exact | |
content | Assumes the timer interrupt is delivered at exactly the same point in execution on both the primary and the backup |
children | physical_timer_to_guest, backup_gets_ahead |
parents | vmware_ft |
physical_timer_to_guest | |
content | Physical timer interrupts are sent to guest |
parents | timer_exact |
arriving_packets | |
content | Arriving Packets |
children | NICS_DMA |
parents | logging_channel |
NICS_DMA | |
content | Some NICs use DMA (direct memory access) in their implementation. |
children | primary_no_DMA |
parents | arriving_packets |
primary_no_DMA | |
content | The primary cannot access the NIC or its DMA directly |
children | private_mem |
parents | NICS_DMA |
private_mem | |
content | Packets from the NIC are DMA'd into private memory in the VMM, then copied over to the primary |
children | bounce_buffer ("Bounce Buffer" is the term for what this does) |
parents | primary_no_DMA |
bounce_buffer | |
content | Bounce Buffer |
parents | private_mem |
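A rough sketch of the bounce-buffer idea: the packet first lands in memory the guest cannot see, and is only then copied into guest memory at one well-defined moment that can be logged. The `deliver_packet` function, the `guest_memory` bytearray, and the log tuple layout are assumptions made for illustration.

```python
def deliver_packet(packet, guest_memory, offset, log):
    """Copy a packet into guest memory through a private bounce buffer."""
    # Stand-in for the NIC DMA-ing into VMM-private memory.
    bounce = bytearray(packet)
    if offset < 0 or offset + len(bounce) > len(guest_memory):
        raise ValueError("packet does not fit in guest memory")
    # The guest is assumed to be paused here, so it never observes a
    # partially written packet.
    guest_memory[offset:offset + len(bounce)] = bounce
    # Record what was delivered so the same copy can be repeated on the backup.
    log.append(("packet", offset, bytes(bounce)))


mem = bytearray(8)
log = []
deliver_packet(b"hi", mem, 2, log)
print(mem, log)
```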
backup_gets_ahead | |
content | What if the backup gets ahead of the primary's execution? This must never be allowed to happen. |
children | event_buffer_nonempty (Event buffer is used to prevent backup from getting ahead) |
parents | timer_exact |
event_buffer_nonempty | |
content | Event buffer: the backup VM only executes instructions while the buffer is non-empty |
parents | backup_gets_ahead |
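The event-buffer rule can be sketched as a queue that gates execution: the backup stalls whenever the queue is empty, so it can never run past what the primary has logged. `BackupVM` and its methods are hypothetical names for this illustration.

```python
from collections import deque


class BackupVM:
    """Toy backup that only advances while logged events are buffered."""

    def __init__(self):
        self.events = deque()   # events received over the log channel
        self.executed = []      # events the backup has already consumed

    def receive(self, event):
        self.events.append(event)

    def step(self):
        # Stall (return False) when the buffer is empty: the backup must not
        # get ahead of the primary.
        if not self.events:
            return False
        self.executed.append(self.events.popleft())
        return True


vm = BackupVM()
print(vm.step())      # nothing buffered yet, so the backup stalls
vm.receive("packet-1")
print(vm.step(), vm.executed)
```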
output | |
content | Handling output events |
children | network_packets_only, awkward_failures |
parents | vmware_ft |
network_packets_only | |
content | In this context, the only outputs are network packets |
parents | output |
awkward_failures | |
content | What are the kinds of awkward failures that could happen? |
children | network_split_brain (example of failure), output_rules, test_and_set (Preventative Solution) |
parents | output |
output_rules | |
content | Output Rule: a preventative measure against certain kinds of failures |
children | output_waits_for_backup (This prevents issues related to the backup not receiving network packets over the log channel) |
parents | awkward_failures |
output_waits_for_backup | |
content | The primary can't produce any output until the backup has received and acknowledged all log events up to that point. |
parents | output_rules |
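A minimal sketch of the output rule, assuming log events carry sequence numbers and the backup acknowledges them in order. The `Primary` class, its sequence numbering, and the method names are assumptions for illustration, not the paper's interface.

```python
class Primary:
    """Toy primary that holds output until the backup has acked prior events."""

    def __init__(self):
        self.next_seq = 0   # sequence number for the next log event
        self.acked = -1     # highest sequence number acknowledged by the backup
        self.held = []      # (required_seq, packet) waiting for an ack
        self.sent = []      # packets actually released to the network

    def log_event(self, event):
        # In a real system the event would be sent on the log channel here.
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def output(self, packet):
        # The packet depends on every event logged so far.
        self.held.append((self.next_seq - 1, packet))
        self._release()

    def on_ack(self, seq):
        self.acked = max(self.acked, seq)
        self._release()

    def _release(self):
        # Release held packets, in order, once their prerequisite is acked.
        while self.held and self.held[0][0] <= self.acked:
            self.sent.append(self.held.pop(0)[1])


p = Primary()
p.log_event("client request")
p.output(b"reply")
print(p.sent)   # still held: the backup has not acked the request
p.on_ack(0)
print(p.sent)
```

Only the output is delayed; the primary itself keeps executing, which is why the rule costs latency rather than throughput.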
test_and_set | |
content | Test And Set: an outside authority that decides which machine (primary/backup) can be "live" |
children | acts_like_lock, network_split_brain ("Test and Set" server used to solve this) |
parents | awkward_failures |
network_split_brain | |
content | Network Issues can cause split brain |
parents | awkward_failures, test_and_set |
acts_like_lock | |
content | The Test/Set server acts like a lock. The primary and backup each send a request to this server for permission to go live, which in turn sets a flag on the Test/Set server. |
parents | test_and_set |
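The lock-like behavior can be sketched as an atomic test-and-set on a single flag: whichever machine asks first sees the flag clear and may go live, and every later caller is refused. `ArbitrationServer` and `try_go_live` are hypothetical names, and the in-process lock stands in for whatever makes the real server's operation atomic.

```python
import threading


class ArbitrationServer:
    """Toy Test-and-Set server holding one flag."""

    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()

    def test_and_set(self):
        # Atomically set the flag and return its previous value.
        with self._lock:
            old = self._flag
            self._flag = True
            return old


def try_go_live(server):
    # A machine may go live only if the flag was clear before its request.
    return server.test_and_set() is False


server = ArbitrationServer()
print(try_go_live(server))   # first requester wins
print(try_go_live(server))   # second requester is refused, avoiding split brain
```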