glossary
Node Tree
- binary_schema
- bounded
- consensus
- consistency
- consistent_prefix_reads
- data_model
- deterministic
- disconnected_operation
- document_partitioned_indexes
- durable
- evolvability
- full_text_search
- functional_requirements
- hash_partitioning
- high_availability
- join
- key_range_partitioning
- latency
- leaderless_replication
- maintainability
- monotonic_reads
- multi_leader_replication
- MVCC
- nonfunctional_requirements
- normalized
- nosql
- partial_failure
- partitioning
- query_language
- read_after_write_consistency
- read_committed
- relational_database
- reliability
- replication_lag
- REST
- RPC
- scalability
- secondary_index
- sequence_similarity_search
- serializable_isolation
- serializable_snapshot_isolation
- shared_register
- single_leader_replication
- snapshot_isolation
- storage_engine
- term_partitioned_indexes
- transaction
- two_phase_locking
- wide_column_store
Nodes
functional_requirements | |
content | Functional requirements: what it should do, such as allowing data to be stored, retrieved, searched, and processed in various ways. |
children | system_design_interview/glossary/functional_requirement |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable |
location | knowledge/ddia.dz:287 |
nonfunctional_requirements | |
content | Nonfunctional requirements: general properties like security, reliability, compliance, scalability, compatibility, and maintainability. |
children | system_design_interview/glossary/non_functional_requirement |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable |
location | knowledge/ddia.dz:294 |
reliability | |
content | Reliability: making systems work correctly, even when faults occur. |
children | fault_tolerant |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable |
location | knowledge/ddia.dz:301 |
scalability | |
content | Scalability: having strategies for keeping performance good, even when load increases. |
parents | DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable, DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication |
location | knowledge/ddia.dz:307 |
maintainability | |
content | Maintainability: making life better for engineering and operations teams who need to work with the system. |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable |
location | knowledge/ddia.dz:313 |
relational_database | |
content | Relational Database: invented to solve the "many-to-many" problem |
children | system_design_interview/glossary/RDBMS |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:319 |
nosql | |
content | NoSQL Datastores |
children | graph_database, document_database |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:324 |
document_database | |
content | Document Database: targets use cases where data comes in self-contained documents and relationships between one document and another are rare. |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql |
location | knowledge/ddia.dz:329 |
graph_database | |
content | Graph Database: useful for cases where anything is potentially related to everything |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql |
location | knowledge/ddia.dz:337 |
data_model | |
content | Data Model |
children | system_design_interview/glossary/data_model |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:344 |
query_language | |
content | Query Language |
children | DDIA/tools/XSL_XPath (not a DB query language, but an interesting parallel), DDIA/tools/datalog, DDIA/tools/CSS (not a DB query language, but an interesting parallel), DDIA/tools/SQL, DDIA/tools/MapReduce, DDIA/tools/SPARQL, DDIA/tools/cypher, DDIA/tools/monogdb_aggregration_pipeline |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:349 |
sequence_similarity_search | |
content | Sequence Similarity Search: taking one long string (such as a DNA molecule), and matching it against a large database of strings that are similar, but not identical |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:402 |
full_text_search | |
content | Full Text Search: arguably a kind of data model used alongside databases. |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages |
location | knowledge/ddia.dz:409 |
OLTP | |
content | OLTP: Online Transaction Processing, optimized for transaction processing. |
children | links/bloom_filters_sqlite (SQLite is a general-purpose DB, but excels at OLTP workloads), log_structured, update_in_place |
parents | DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
flashcard (front) | What is an OLTP database? |
flashcard (back) | Online transaction processing, optimized for transaction processing |
location | knowledge/ddia.dz:415 |
OLAP | |
content | OLAP: Online analytical processing, optimized for analytical processing |
children | data_warehouse, links/bloom_filters_sqlite (Researchers at buffalo university in 2015 found that most queries are simple KV lookups and OLAP queries), column_oriented_storage |
parents | DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
flashcard (front) | What is an OLAP database? |
flashcard (back) | Online analytical processing. |
location | knowledge/ddia.dz:425 |
storage_engine | |
content | Storage Engine |
children | OLTP, OLAP |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:434 |
data_warehouse | |
content | Data Warehouse |
children | star_schema |
parents | DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:439 |
log_structured | |
content | Log-structured storage engine: only permits appending to files and deleting obsolete files, but never updates a file that has been written. |
children | SSTable, DDIA/tools/cassandra, DDIA/tools/bitcask, LSM_tree, DDIA/tools/levelDB, DDIA/tools/lucene, DDIA/tools/hbase |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP |
location | knowledge/ddia.dz:445 |
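The append-only idea in the log_structured entry can be sketched as a toy key-value store. This is a minimal illustration and not how Bitcask or LevelDB are implemented; the class name `AppendOnlyStore`, the comma-separated record format, and the in-memory offset index are all assumptions made for the example, and keys are assumed to contain no commas or newlines.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy log-structured store: every write appends, nothing is overwritten."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # create the log file if it is missing

    def put(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)  # make sure tell() reports the end of file
            offset = log.tell()
            log.write(record)
        self.index[key] = offset  # older records for this key become obsolete

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as log:
            log.seek(offset)
            line = log.readline().decode("utf-8").rstrip("\n")
        _, value = line.split(",", 1)
        return value


log_path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(log_path)
store.put("user:1", "Ada")
store.put("user:1", "Grace")  # appended; the old record stays in the file
print(store.get("user:1"))
```

A real engine would also compact the log by writing a new file without the obsolete records and then deleting the old file, which this sketch omits.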
SSTable | |
content | SSTable: Sorted String Table |
parents | DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:459 |
LSM_tree | |
content | LSM_tree: Log-Structured Merge Tree |
parents | DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:465 |
update_in_place | |
content | Update-in-place storage engine: treats disk as set of fixed-size pages that can be overwritten |
children | btree |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP |
location | knowledge/ddia.dz:495 |
btree | |
content | B-Tree |
parents | DDIA/glossary, update_in_place, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:502 |
column_oriented_storage | |
content | column oriented storage: aims to encode data very compactly, and minimize amount of data query needs to read from disk |
parents | DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:508 |
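A rough sketch of why a column layout helps analytical queries, under the assumption of a tiny hypothetical sales table: a query that needs one column touches only that column's values, and a column with many repeated values compresses well. The run-length encoding shown here is one simple choice made for the example, not a claim about any particular database.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"date": "2024-01-01", "product": "apple", "qty": 3},
    {"date": "2024-01-01", "product": "pear", "qty": 1},
    {"date": "2024-01-02", "product": "apple", "qty": 2},
]

# Column-oriented layout: one list per column, in the same row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads only that column.
total_qty = sum(columns["qty"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [(value, count) for value, count in encoded]


compressed_dates = run_length_encode(columns["date"])
print(total_qty, compressed_dates)
```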
rolling_upgrade | |
content | Rolling Upgrade: a new version of a service is gradually deployed to a few nodes at a time, rather than deploying to all nodes simultaneously. |
parents | DDIA/glossary, evolvability, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution |
location | knowledge/ddia.dz:516 |
evolvability | |
content | Evolvability: the ease of making changes in an application |
children | rolling_upgrade |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution |
location | knowledge/ddia.dz:523 |
binary_schema | |
content | binary schema driven formats |
children | DDIA/tools/thrift, DDIA/tools/avro, DDIA/tools/protocol_buffers |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution |
remarks | useful for documentation and code generation, but data needs to be decoded before it is human readable |
location | knowledge/ddia.dz:544 |
REST | |
content | REST API |
children | system_design_interview/glossary/REST |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution |
location | knowledge/ddia.dz:569 |
RPC | |
content | RPC API |
parents | DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution |
location | knowledge/ddia.dz:574 |
high_availability | |
content | High Availability: keeping the system running, even when one machine (or several machines) goes down |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication |
location | knowledge/ddia.dz:579 |
disconnected_operation | |
content | Disconnected Operation: Allowing an application to continue working when there is a network interruption |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication |
location | knowledge/ddia.dz:587 |
latency | |
content | Latency |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication |
location | knowledge/ddia.dz:595 |
single_leader_replication | |
content | Single-leader replication: Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication |
location | knowledge/ddia.dz:605 |
multi_leader_replication | |
content | Multi-leader replication: clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication |
location | knowledge/ddia.dz:614 |
leaderless_replication | |
content | Leaderless Replication: clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data. |
location | knowledge/ddia.dz:622 |
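The leaderless read and write path can be sketched with in-memory replicas. Everything here is hypothetical (the `Replica` class, the explicit version numbers, the `w` and `r_quorum` parameters), and for simplicity the read contacts every reachable replica rather than only a quorum-sized subset.

```python
class Replica:
    """In-memory stand-in for one storage node."""

    def __init__(self):
        self.data = {}  # key -> (version, value)
        self.up = True


def write(replicas, key, value, version, w):
    """Send the write to every reachable replica; succeed if at least w acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = (version, value)
            acks += 1
    return acks >= w


def read(replicas, key, r_quorum):
    """Read from reachable replicas, return the newest value, repair stale copies."""
    responses = [(rep, rep.data.get(key, (0, None))) for rep in replicas if rep.up]
    if len(responses) < r_quorum:
        raise RuntimeError("not enough replicas reachable for a quorum read")
    newest = max((resp for _, resp in responses), key=lambda pair: pair[0])
    for rep, resp in responses:
        if resp[0] < newest[0]:
            rep.data[key] = newest  # read repair: overwrite the stale copy
    return newest[1]


replicas = [Replica(), Replica(), Replica()]
write(replicas, "x", "old", version=1, w=2)
replicas[2].up = False                       # one node misses the next write
write(replicas, "x", "new", version=2, w=2)
replicas[2].up = True                        # it comes back with stale data
print(read(replicas, "x", r_quorum=2))
```

After the read, the third replica holds the newer version as well, which is the "detect and correct" step from the definition.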
replication_lag | |
content | Replication Lag: the delay between a write happening on the leader and being reflected on the follower |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models |
location | knowledge/ddia.dz:627 |
read_after_write_consistency | |
content | Read-after-write consistency: users should always see data that they submitted themselves. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models |
flashcard (front) | What is "Read After Write Consistency"? |
flashcard (back) | In replication, read-after-write consistency is the idea that users should always see data that they submitted themselves. |
location | knowledge/ddia.dz:635 |
monotonic_reads | |
content | Monotonic Reads: after users have seen the data at one point in time, they shouldn't later see the data from some earlier point in time |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models |
flashcard (front) | What are "Monotonic Reads"? |
flashcard (back) | In replication, monotonic reads are the idea that if a user sees data from some point in time, they shouldn't later see the data from an earlier point in time (time monotonically increasing) |
location | knowledge/ddia.dz:645 |
consistent_prefix_reads | |
content | Consistent prefix reads: users should see data in a state that makes causal sense: for example, seeing a question and its reply in the correct order. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models |
flashcard (front) | What are "consistent prefix reads"? |
flashcard (back) | In replication consistency models, consistent prefix reads state that users should see data in a state that makes causal sense (ex: question then answer). |
location | knowledge/ddia.dz:657 |
partitioning | |
content | Partitioning: splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines. |
children | DDIA/toc/3_derived_data/10_batch_processing/problems_solved/partitioning, sharding (AKA), vbucket (AKA), system_design_interview/glossary/partition, tablet_bigtable (AKA), region_hbase (AKA), vnode (AKA) |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning |
flashcard (front) | What is partitioning? |
flashcard (back) | Partitioning is the act of splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines. |
location | knowledge/ddia.dz:668 |
sharding | |
content | Sharding |
parents | DDIA/glossary, partitioning |
location | knowledge/ddia.dz:680 |
region_hbase | |
content | Region: a term for a partition in HBase |
parents | DDIA/glossary, DDIA/tools/hbase, partitioning |
location | knowledge/ddia.dz:686 |
tablet_bigtable | |
content | tablet: in Bigtable, a name for a partition |
parents | DDIA/tools/bigtable, partitioning |
location | knowledge/ddia.dz:694 |
vnode | |
content | vnode: a term for "partition" in Cassandra and Riak |
parents | DDIA/glossary, DDIA/tools/riak, DDIA/tools/cassandra, partitioning |
location | knowledge/ddia.dz:700 |
wide_column_store | |
content | wide-column store: A wide-column store is a type of NoSQL database that uses tables, rows, and columns but allows column names and formats to vary. It can be considered a two-dimensional key-value store. Google's Bigtable is a classic example of a wide-column store. |
parents | DDIA/glossary, DDIA/tools/bigtable |
location | knowledge/ddia.dz:716 |
vbucket | |
content | vbucket: term for a partition in Couchbase |
parents | DDIA/tools/couchbase, DDIA/glossary, partitioning |
location | knowledge/ddia.dz:739 |
key_range_partitioning | |
content | Key-range partitioning: involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other in the sorted order. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches |
flashcard (front) | What is key-range partitioning? |
flashcard (back) | Key-range partitioning involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other in the sorted order. |
location | knowledge/ddia.dz:758 |
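A minimal sketch of key-range lookup, assuming three hypothetical boundary keys that split the key space into four partitions. Because the boundaries are sorted, a binary search finds the owning partition, and a range query only needs the partitions between its two endpoints.

```python
import bisect

# Hypothetical split points: partition 0 owns keys < "g",
# partition 1 owns "g" <= key < "n", partition 2 owns "n" <= key < "t",
# and partition 3 owns everything from "t" upward.
boundaries = ["g", "n", "t"]


def range_partition_for(key):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


def partitions_for_range(low, high):
    """Return the partitions a range query over [low, high] has to contact."""
    return list(range(range_partition_for(low), range_partition_for(high) + 1))


print(range_partition_for("apple"), range_partition_for("monkey"))
print(partitions_for_range("h", "p"))
```

Keys that sort close together, such as timestamps from the same day, all land in the same partition, which is the hotspot risk the definition mentions.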
hash_partitioning | |
content | Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes. While this method can destroy the ordering of keys and make range queries inefficient, it can also help distribute load more evenly across the partitions. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches |
flashcard (front) | What is hash partitioning? |
flashcard (back) | Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes. |
location | knowledge/ddia.dz:774 |
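As a sketch of hash partitioning, assume four partitions that each own an equal slice of a 32-bit hash space. The use of MD5 and the names `stable_hash` and `hash_partition_for` are choices made for this example only; Python's built-in `hash()` is avoided because its output for strings varies between processes.

```python
import hashlib

NUM_PARTITIONS = 4      # assumed partition count for the example
HASH_SPACE = 2 ** 32    # hashes are reduced to 32 bits


def stable_hash(key):
    """Hash a string key to an integer in [0, HASH_SPACE), stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_partition_for(key):
    """Each partition owns a contiguous range of hash values."""
    return stable_hash(key) * NUM_PARTITIONS // HASH_SPACE


keys = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
assignment = {key: hash_partition_for(key) for key in keys}
print(assignment)
```

Adjacent keys generally end up scattered across partitions, so a range scan over these dates would have to ask every partition.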
document_partitioned_indexes | |
content | Document Partitioned Indexes (local indexes): involve storing secondary indexes in the same partition as the primary key and value. This approach means a write only needs to update a single partition, but a read of the secondary index requires a scatter/gather across all partitions, making reads more expensive. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes |
flashcard (front) | What are document partitioned indexes? |
flashcard (back) | Document Partitioned Indexes (local indexes) store secondary indexes in the same partition as the primary key and value, so a write only needs to update a single partition. However, reads of secondary indexes require a scatter/gather across all partitions. |
location | knowledge/ddia.dz:787 |
term_partitioned_indexes | |
content | Term-partitioned indexes: store secondary indexes separately, using the indexed value. An entry in the secondary index may include records from all partitions of the primary key. When a document is written, several partitions of the secondary index need to be updated, but a read can be served from a single partition. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes |
flashcard (front) | What is a term-partitioned index? |
flashcard (back) | Term-partitioned indexes store secondary indexes in separate partitions, using the indexed value. An entry in the secondary index may include records from multiple partitions of the primary key, and updates to a document may require updating several partitions, while reads can be served from a single partition. |
location | knowledge/ddia.dz:804 |
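The read-side difference between the two index layouts can be sketched with a tiny in-memory dataset. The two-partition setup, the modulo placement of documents, and the alphabetical split of index terms are all assumptions made for illustration.

```python
NUM_PARTITIONS = 2
docs = {1: "red", 2: "blue", 3: "red", 4: "red"}  # doc id -> color


def doc_partition(doc_id):
    """Hypothetical placement of documents by primary key."""
    return doc_id % NUM_PARTITIONS


# Document-partitioned (local) indexes: each partition indexes only its own docs.
local_indexes = [dict() for _ in range(NUM_PARTITIONS)]
for doc_id, color in docs.items():
    local_indexes[doc_partition(doc_id)].setdefault(color, []).append(doc_id)


def query_local(color):
    """Scatter the query to every partition, then gather the results."""
    result = []
    for index in local_indexes:
        result.extend(index.get(color, []))
    return sorted(result)


def term_partition(term):
    """Hypothetical split of index terms: a-l on partition 0, m-z on partition 1."""
    return 0 if term < "m" else 1


# Term-partitioned (global) index: one entry per term, covering all documents.
global_indexes = [dict() for _ in range(NUM_PARTITIONS)]
for doc_id, color in docs.items():
    global_indexes[term_partition(color)].setdefault(color, []).append(doc_id)


def query_global(color):
    """Only the partition that owns the term has to be contacted."""
    return sorted(global_indexes[term_partition(color)].get(color, []))


print(query_local("red"), query_global("red"))
```

Both queries return the same documents; they differ in how many partitions are read, and on the write side in how many index partitions a single document update touches.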
secondary_index | |
content | Secondary Index: A database index created on one or more columns that are not the primary key, designed to improve query performance by enabling faster data retrieval on non-primary key fields. Unlike the primary index, multiple secondary indexes can exist on a single table, allowing quick searches on various columns at the cost of slight overhead during data modifications. |
children | DDIA/tools/sqlite/create_index |
parents | DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes |
location | knowledge/ddia.dz:822 |
transaction | |
content | Transaction: Grouping together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues. |
children | transaction_abort, dirty_writes (Almost all transaction types prevent dirty writes.) |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions |
flashcard (front) | What is a transaction? |
flashcard (back) | A transaction groups together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues. |
location | knowledge/ddia.dz:850 |
transaction_abort | |
content | Transaction abort: occurs when a database transaction is interrupted or terminated prematurely. A large class of errors related to software and hardware can be reduced down to transaction aborts. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions, transaction |
location | knowledge/ddia.dz:861 |
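A small sketch of a transaction and an abort, using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, its `CHECK` constraint, and the `transfer` helper are hypothetical; the point is only that a failure partway through leaves no partial write behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one logical unit."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


try:
    transfer(conn, "alice", "bob", 150)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the whole transaction was aborted, including the credit to bob

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```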
read_committed | |
content | Read Committed |
children | dirty_reads (The read-committed isolation level and stronger levels prevent dirty reads, ensuring that data is consistent and accurate.) |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels |
location | knowledge/ddia.dz:870 |
snapshot_isolation | |
content | Snapshot isolation |
children | lost_updates (Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock.), read_skew (Most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time.), repeatable_read (AKA) |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels |
location | knowledge/ddia.dz:876 |
repeatable_read | |
content | Repeatable Read |
parents | DDIA/glossary, snapshot_isolation |
location | knowledge/ddia.dz:882 |
serializable_isolation | |
content | Serializable Isolation |
children | phantom_reads (Serializable isolation prevents only straightforward phantom reads), write_skew (Only serializable isolation prevents write skew) |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels |
location | knowledge/ddia.dz:888 |
dirty_reads | |
content | Dirty Reads: One client reads another client's writes before they have been committed. |
parents | DDIA/glossary, read_committed, DDIA/toc/2_distributed_data/07_transactions/race_conditions |
flashcard (front) | What are dirty reads? |
flashcard (back) | Dirty reads are a race condition that occurs when one client reads another client's writes before they have been committed. |
location | knowledge/ddia.dz:893 |
dirty_writes | |
content | Dirty Writes: one client overwrites data that another client has written, but not yet committed. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/race_conditions, transaction |
flashcard (front) | What are dirty writes? |
flashcard (back) | One client overwrites data that another client has written, but not yet committed. |
location | knowledge/ddia.dz:907 |
read_skew | |
content | Read Skew: a client sees different parts of the database at different points in time. |
children | non_repeatable_reads (AKA) |
parents | DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, MVCC, snapshot_isolation |
location | knowledge/ddia.dz:918 |
MVCC | |
content | MVCC: Multi-version concurrency control |
children | read_skew (snapshot isolation for fixing read skew is usually implemented using multi-version concurrency control) |
parents | DDIA/glossary |
location | knowledge/ddia.dz:931 |
non_repeatable_reads | |
content | non-repeatable reads |
parents | read_skew |
location | knowledge/ddia.dz:935 |
lost_updates | |
content | Lost Update: two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost. |
parents | DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, snapshot_isolation |
location | knowledge/ddia.dz:940 |
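The lost-update interleaving can be replayed step by step without any real concurrency. The `counter` dictionary and the two "clients" are purely illustrative.

```python
counter = {"likes": 10}

# Both clients read the current value before either one writes.
read_a = counter["likes"]
read_b = counter["likes"]

# Each client modifies its own copy and writes it back.
counter["likes"] = read_a + 1  # client A writes 11
counter["likes"] = read_b + 1  # client B also writes 11, clobbering A's update

print(counter["likes"])  # 11, although two increments were requested
```

Atomic increment operations, explicit locks, or compare-and-set are the usual ways to prevent this.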
write_skew | |
content | Write Skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. |
children | phantom_reads (phantoms in the context of write skew require special treatment, such as index range locks) |
parents | DDIA/toc/2_distributed_data/07_transactions/race_conditions, serializable_isolation, DDIA/glossary |
location | knowledge/ddia.dz:952 |
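Write skew can also be replayed deterministically. The sketch below assumes a hypothetical on-call roster with the rule that at least one person must stay on call; each "transaction" checks the rule against its own snapshot and then updates a different record.

```python
on_call = {"alice": True, "bob": True}


def may_go_off_call(snapshot):
    """The premise: it is safe to leave only if someone else remains on call."""
    return sum(snapshot.values()) >= 2


# Both transactions take their snapshot before either one writes.
snapshot_a = dict(on_call)
snapshot_b = dict(on_call)

if may_go_off_call(snapshot_a):
    on_call["alice"] = False  # transaction A updates Alice's record

if may_go_off_call(snapshot_b):  # still true in B's (now stale) snapshot
    on_call["bob"] = False       # transaction B updates Bob's record

print(on_call)  # nobody is on call any more
```

Neither write overwrote the other, so this is not a lost update; the problem is that each decision rested on a premise the other transaction invalidated.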
phantom_reads | |
content | Phantom Reads: a transaction reads objects that match some search condition. Another client makes a write that affects the results of that search. |
children | index_range_lock (Phantoms in the context of write skew require special treatment, such as index-range locks) |
parents | write_skew, serializable_isolation |
location | knowledge/ddia.dz:963 |
index_range_lock | |
content | Index Range Lock |
parents | phantom_reads, DDIA/glossary |
location | knowledge/ddia.dz:974 |
two_phase_locking | |
content | Two-phase locking |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions |
location | knowledge/ddia.dz:982 |
serializable_snapshot_isolation | |
content | Serializable Snapshot Isolation (SSI): a relatively new algorithm that avoids many of the drawbacks of previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction attempts to commit, it is checked, and if it is not serializable, it is aborted. |
parents | DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions |
location | knowledge/ddia.dz:991 |
partial_failure | |
content | Partial Failure: in a distributed system, parts of a system that are broken in some unpredictable way. |
children | nondeterministic (non-determinism in distributed systems is what makes distributed systems difficult), DDIA/toc/2_distributed_data/08_trouble_distributed_systems/partial_failures |
parents | DDIA/glossary |
location | knowledge/ddia.dz:1001 |
deterministic | |
content | Deterministic: Describing a function that always produces the same output if you give it the same input. This means it cannot depend on random numbers, the time of day, network communication, or other unpredictable things. |
parents | DDIA/glossary |
location | knowledge/ddia.dz:1006 |
nondeterministic | |
content | Nondeterministic: describing a function that produces unpredictable output on the same input. |
parents | partial_failure |
location | knowledge/ddia.dz:1014 |
consistency | |
content | Consistency |
children | causal_consistency, linearizability (A popular consistency model), DDIA/toc/2_distributed_data/09_consistency_consensus |
parents | DDIA/glossary |
location | knowledge/ddia.dz:1022 |
consensus | |
content | Consensus: a fundamental problem in distributed computing, concerning getting several nodes to agree on something (for example, which node should be the leader for a database cluster). |
children | DDIA/toc/2_distributed_data/09_consistency_consensus |
parents | DDIA/glossary |
flashcard (front) | What is consensus? |
flashcard (back) | In distributed computing, consensus is the problem of getting nodes to agree on something. |
location | knowledge/ddia.dz:1026 |
linearizability | |
content | Linearizability: behaving as if there was only a single copy of data in the system, which is updated by atomic operations. |
children | causal_consistency (CC does not have the coordination overhead of linearizability, and is less sensitive to network problems. It is a weaker consistency model than linearizability), DDIA/toc/2_distributed_data/09_consistency_consensus |
parents | consistency |
flashcard (front) | What is "linearizability"? |
flashcard (back) | Linearizability is a consistency model which behaves as if there was only a single copy of data in the system, updated by atomic operations. |
location | knowledge/ddia.dz:1036 |
causal_consistency | |
content | Causal Consistency: a consistency model that allows for things to be concurrent. This causes the timeline to contain branching and merging. |
children | DDIA/toc/2_distributed_data/09_consistency_consensus |
parents | linearizability, consistency |
location | knowledge/ddia.dz:1047 |
shared_register | |
content | Shared Register: fundamental type of shared data structure in distributed systems with two operations: a read operation, and write operation. This is used to build shared-memory and message-passing systems. |
children | DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers, compare_and_swap (linearizable compare-and-swap register) |
parents | DDIA/glossary |
hyperlink | https://en.wikipedia.org/wiki/Shared_register |
flashcard (front) | What is a shared register? |
flashcard (back) | A fundamental shared data structure in distributed systems that allows a read/write operation. |
location | knowledge/ddia.dz:1059 |
compare_and_swap | |
content | Compare_and_swap: An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. |
children | DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers |
parents | shared_register |
hyperlink | https://en.wikipedia.org/wiki/Compare-and-swap |
flashcard (front) | What is compare-and-swap? |
flashcard (back) | An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a value, and will only modify the contents of that memory location if the value matches. |
location | knowledge/ddia.dz:1070 |
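Python has no hardware compare-and-swap primitive in its standard library, so the sketch below emulates one with a lock. The `AtomicRegister` class and the `increment` retry loop are hypothetical names; the part that matters is that the compare and the write happen as one indivisible step.

```python
import threading


class AtomicRegister:
    """A register with read and compare_and_swap, emulated with a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def increment(register):
    """Retry until our read-modify-write applies without interference."""
    while True:
        current = register.read()
        if register.compare_and_swap(current, current + 1):
            return


register = AtomicRegister(0)


def worker():
    for _ in range(500):
        increment(register)


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(register.read())
```

Because a failed swap is retried with a fresh read, no increment is lost even when several threads race on the same register.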
systems_of_record | |
content | Systems Of Record: AKA source of truth, holds the authoritative version of your data. Each fact is represented exactly once. |
children | fact_table (fact table is a source of truth, and therefore a system of record (I think?)), derived_data_system (vs), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader |
parents | normalized, DDIA/glossary |
flashcard (front) | What are "systems of record"? |
flashcard (back) | Systems of record are systems where there is one source of truth. |
location | knowledge/ddia.dz:1106 |
derived_data_system | |
content | derived data system: system that takes some data from another system and transforms it in some way |
children | denormalize (derived data systems are typically denormalized), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader |
parents | systems_of_record, DDIA/glossary |
flashcard (front) | What is a "derived data system"? |
flashcard (back) | A derived data system is one that takes some data from another system and transforms it in some way. |
location | knowledge/ddia.dz:1115 |
normalized | |
content | normalized: structured in such a way that there is no redundancy or duplication. In a normalized database, when some piece of data changes, you only need to change it in one place, not many copies in many different places. |
children | denormalize (introduces some amount of redundancy in a normalized dataset.), DDIA/references/database_normalization, systems_of_record (systems of record represent each fact exactly once, meaning the data is usually normalized) |
flashcard (front) | What does "normalized" mean in the context of a database? |
flashcard (back) | "Normalized" refers to data structured in a way that there is no duplication. |
location | knowledge/ddia.dz:1125 |
denormalize | |
content | denormalize: to introduce some amount of redundancy or duplication in a normalized dataset, typically in the form of a cache or index, in order to speed up reads. A denormalized value is a kind of precomputed query result, similar to a materialized view. |
children | materialize (a denormalized value is similar to a materialized view) |
parents | DDIA/glossary, derived_data_system, normalized |
flashcard (front) | What does it mean to "denormalize" in the context of data? |
flashcard (back) | To denormalize a dataset is to introduce some amount of redundancy or duplication in a normalized dataset. |
location | knowledge/ddia.dz:1147 |
fault_tolerant | |
content | Fault Tolerant: able to recover automatically if something goes wrong, such as a crash or network failure. |
children | DDIA/toc/3_derived_data/10_batch_processing/problems_solved/fault_tolerance |
parents | DDIA/glossary, reliability |
flashcard (front) | What does it mean to be "fault tolerant"? |
flashcard (back) | A system that is fault tolerant is able to recover automatically if something goes wrong. |
location | knowledge/ddia.dz:1180 |
materialize | |
content | Materialize: to perform a computation eagerly, and write out its result, as opposed to calculating it on demand when requested. |
children | DDIA/references/materialized_view |
parents | DDIA/glossary, denormalize |
flashcard (front) | What does it mean to "materialize" something? |
flashcard (back) | To materialize is to perform a computation in advance and write the results. Materialization is the process used to build a materialized view. |
location | knowledge/ddia.dz:1190 |
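The contrast between computing on demand and materializing can be sketched with a hypothetical list of orders. `materialized_totals` plays the role of a precomputed result that is kept up to date on every write.

```python
orders = [("alice", 30), ("bob", 20), ("alice", 12)]


def total_on_demand(customer):
    """Scan every order each time the total is requested."""
    return sum(amount for name, amount in orders if name == customer)


# Materialized: compute all totals eagerly and store the result.
materialized_totals = {}
for name, amount in orders:
    materialized_totals[name] = materialized_totals.get(name, 0) + amount


def add_order(name, amount):
    """Writes now have to maintain the precomputed result as well."""
    orders.append((name, amount))
    materialized_totals[name] = materialized_totals.get(name, 0) + amount


add_order("bob", 5)
print(total_on_demand("bob"), materialized_totals["bob"])
```

Reads from `materialized_totals` are a single lookup, at the price of extra work on every write and a second copy of the data that has to stay in sync.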
join | |
content | Join: the process of combining rows of tables based on a common link or attribute between them. It is used to retrieve specific records that have a connection, such as a foreign key reference, and is commonly used in queries that need to retrieve related data. |
children | links/bloom_filters_sqlite (some discussion on joins), DDIA/toc/3_derived_data/mapreduce_join_algos |
parents | DDIA/glossary |
flashcard (front) | What is a join? |
flashcard (back) | A join operation brings together records that have something in common. |
location | knowledge/ddia.dz:1208 |
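One common way to execute a join is a hash join: build a lookup table on one side, then probe it with each record from the other side. The `users` and `events` records and the `hash_join` function below are hypothetical, and the sketch implements an inner join only.

```python
users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
events = [
    {"user_id": 1, "url": "/a"},
    {"user_id": 2, "url": "/b"},
    {"user_id": 1, "url": "/c"},
    {"user_id": 3, "url": "/d"},  # no matching user, dropped by an inner join
]


def hash_join(users, events):
    """Inner join of events to users on events.user_id = users.id."""
    users_by_id = {user["id"]: user for user in users}  # build phase
    joined = []
    for event in events:                                # probe phase
        user = users_by_id.get(event["user_id"])
        if user is not None:
            joined.append({"name": user["name"], "url": event["url"]})
    return joined


joined = hash_join(users, events)
print(joined)
```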
bounded | |
content | Bounded: having some known upper limit or size. |
children | DDIA/toc/3_derived_data/10_batch_processing (input data to a distributed batch processing job is bounded) |
parents | DDIA/glossary |
location | knowledge/ddia.dz:1220 |
durable | |
content | Durable: storing data in such a way that you believe it will not be lost, even if various faults occur. |
children | DDIA/toc/2_distributed_data/07_transactions/meaning_of_ACID/durability, python/docs/stdlib/persistence (persistent on-disk state is used in the context of durability) |
parents | DDIA/glossary |
location | knowledge/ddia.dz:1224 |
star_schema | |
content | star schema: typical formulaic style for how data warehouses are used |
children | DDIA/references/star_schema, fact_table, dimension_table |
parents | data_warehouse, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval |
location | knowledge/ddia.dz:1230 |
fact_table | |
content | fact table: records measurements or metrics for a specific event |
children | DDIA/references/star_schema/fact_tables, dimension_table |
parents | databases/star_schema_benchmark/star_queries, systems_of_record, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, DDIA/glossary, star_schema |
location | knowledge/ddia.dz:1235 |
dimension_table | |
content | Dimension Table: Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a large number of attributes to describe the fact data |
children | DDIA/references/star_schema/dimension_tables |
parents | DDIA/glossary, fact_table, databases/star_schema_benchmark/star_queries, star_schema |
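A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table names `fact_sales` and `dim_product`, their columns, and the sample rows are all invented for the example: the fact table records one row per sale event, and the dimension table describes the products those rows refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT
    );
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_id  INTEGER REFERENCES dim_product(product_id),
        quantity    INTEGER,
        price_cents INTEGER
    );
    INSERT INTO dim_product VALUES
        (1, 'apple', 'fruit'), (2, 'pear', 'fruit'), (3, 'soap', 'household');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 50), (2, 2, 1, 80), (3, 3, 4, 200), (4, 1, 3, 50);
    """
)

# A typical analytical query: aggregate the facts, grouped by a dimension attribute.
revenue_by_category = conn.execute(
    """
    SELECT d.category, SUM(f.quantity * f.price_cents)
    FROM fact_sales AS f
    JOIN dim_product AS d ON f.product_id = d.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(revenue_by_category)
```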