DDIA/glossary

glossary

Nodes

functional_requirements
content Functional requirements: what the system should do, such as allowing data to be stored, retrieved, searched, and processed in various ways.
children system_design_interview/glossary/functional_requirement
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location knowledge/ddia.dz:287

nonfunctional_requirements
content Nonfunctional requirements: general properties like security, reliability, compliance, scalability, compatibility, and maintainability.
children system_design_interview/glossary/non_functional_requirement
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location knowledge/ddia.dz:294

reliability
content Reliability: making systems work correctly, even when faults occur.
children fault_tolerant
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location knowledge/ddia.dz:301

scalability
content Scalability: having strategies for keeping performance good, even when load increases.
parents DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable, DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location knowledge/ddia.dz:307

maintainability
content Maintainability: making life better for engineering and operations teams who need to work with the system.
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location knowledge/ddia.dz:313

relational_database
content Relational Database: invented to solve the "many-to-many" relationship problem, which the earlier hierarchical model handled poorly.
children system_design_interview/glossary/RDBMS
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:319

nosql
content NoSQL Datastores
children graph_database, document_database
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:324

document_database
content Document Database: targets use cases where data comes in self-contained documents and relationships between one document and another are rare.
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql
location knowledge/ddia.dz:329

graph_database
content Graph Database: useful for cases where anything is potentially related to everything.
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql
location knowledge/ddia.dz:337

data_model
content Data Model
children system_design_interview/glossary/data_model
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:344

query_language
content Query Language
children DDIA/tools/XSL_XPath (not a DB query language, but interesting parallel), DDIA/tools/datalog, DDIA/tools/CSS (not a DB query language, but interesting parallel), DDIA/tools/SQL, DDIA/tools/MapReduce, DDIA/tools/SPARQL, DDIA/tools/cypher, DDIA/tools/monogdb_aggregration_pipeline
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:349

sequence_similarity_search
content Sequence Similarity Search: taking one long string (such as a DNA molecule), and matching it against a large database of strings that are similar, but not identical
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:402

full_text_search
content Full Text Search: arguably a kind of data model used alongside databases.
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location knowledge/ddia.dz:409

OLTP
content OLTP: Online Transaction Processing, optimized for transaction processing.
children links/bloom_filters_sqlite (SQLite is a general-purpose DB, but excels at OLTP workloads), log_structured, update_in_place
parents DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
flashcard (front) What is an OLTP database?
flashcard (back) Online transaction processing, optimized for transaction processing
location knowledge/ddia.dz:415

OLAP
content OLAP: Online analytical processing, optimized for analytical processing
children data_warehouse, links/bloom_filters_sqlite (Researchers at the University at Buffalo in 2015 found that most queries are simple KV lookups and OLAP queries), column_oriented_storage
parents DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
flashcard (front) What is an OLAP database?
flashcard (back) Online analytical processing.
location knowledge/ddia.dz:425

storage_engine
content Storage Engine
children OLTP, OLAP
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:434

data_warehouse
content Data Warehouse
children star_schema
parents DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:439

log_structured
content Log-structured storage engine: only permits appending to files and deleting obsolete files, but never updates a file that has been written.
children SSTable, DDIA/tools/cassandra, DDIA/tools/bitcask, LSM_tree, DDIA/tools/levelDB, DDIA/tools/lucene, DDIA/tools/hbase
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP
location knowledge/ddia.dz:445
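remarks A minimal sketch of the append-only idea, assuming an in-memory buffer stands in for the on-disk file and a plain dict serves as the index. The class name `AppendOnlyLog` and its methods are hypothetical, not taken from any of the tools listed above.

```python
import io


class AppendOnlyLog:
    """Toy log-structured key-value store: writes only ever append."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for an append-only file on disk
        self._index = {}          # key -> (offset, length) of the latest value

    def put(self, key: str, value: str) -> None:
        data = value.encode("utf-8")
        offset = self._log.seek(0, io.SEEK_END)  # always write at the end
        self._log.write(data)
        # The index now points at the new record; the old one is obsolete
        # but is never modified in place.
        self._index[key] = (offset, len(data))

    def get(self, key: str) -> str:
        offset, length = self._index[key]
        self._log.seek(offset)
        return self._log.read(length).decode("utf-8")


store = AppendOnlyLog()
store.put("cat", "meow")
store.put("cat", "purr")  # appended after "meow", which stays in the log
```

Reading `cat` returns the latest value, while the obsolete record remains in the log until some separate cleanup step (not shown here) discards it.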

SSTable
content SSTable: Sorted String Table
parents DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:459

LSM_tree
content LSM_tree: Log-Structured Merge Tree
parents DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:465

update_in_place
content Update-in-place storage engine: treats disk as set of fixed-size pages that can be overwritten
children btree
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP
location knowledge/ddia.dz:495

btree
content B-Tree
parents DDIA/glossary, update_in_place, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:502

column_oriented_storage
content column oriented storage: aims to encode data very compactly, and minimize amount of data query needs to read from disk
parents DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:508
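remarks A rough sketch of why a column layout can reduce the data a query reads and can encode compactly. The sample rows and the helper `run_length_encode` are illustrative assumptions; real column stores use more elaborate encodings.

```python
from itertools import groupby

# Row-oriented data: each record keeps all of its fields together.
rows = [
    {"date": "2024-01-01", "product": "apple", "quantity": 3},
    {"date": "2024-01-01", "product": "pear", "quantity": 1},
    {"date": "2024-01-02", "product": "apple", "quantity": 5},
]

# Column-oriented layout: one list per column, values kept in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only has to touch that column's values.
total_quantity = sum(columns["quantity"])


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Columns with many repeated values compress well.
encoded_dates = run_length_encode(columns["date"])
```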

rolling_upgrade
content Rolling Upgrade: a new version of a service is gradually deployed to a few nodes at a time, rather than deploying to all nodes simultaneously.
parents DDIA/glossary, evolvability, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location knowledge/ddia.dz:516

evolvability
content Evolvability: the ease of making changes in an application
children rolling_upgrade
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location knowledge/ddia.dz:523

binary_schema
content binary schema driven formats
children DDIA/tools/thrift, DDIA/tools/avro, DDIA/tools/protocol_buffers
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
remarks useful for documentation and code generation, but data needs to be decoded before it is human readable
location knowledge/ddia.dz:544

REST
content REST API
children system_design_interview/glossary/REST
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location knowledge/ddia.dz:569

RPC
content RPC API
parents DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location knowledge/ddia.dz:574

high_availability
content High Availability: keeping the system running, even when one machine (or several machines) goes down
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location knowledge/ddia.dz:579

disconnected_operation
content Disconnected Operation: Allowing an application to continue working when there is a network interruption
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location knowledge/ddia.dz:587

latency
content Latency
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location knowledge/ddia.dz:595

single_leader_replication
content Single-leader replication: Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale.
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication
location knowledge/ddia.dz:605

multi_leader_replication
content Multi-leader replication: clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes.
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication
location knowledge/ddia.dz:614

leaderless_replication
content Leaderless Replication: clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.
location knowledge/ddia.dz:622
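remarks A simplified sketch of reading from several nodes to detect and correct stale data. It assumes each value carries a version number and that every replica responds; the function name `quorum_read` and the data layout are hypothetical.

```python
# Each replica maps key -> (version, value). The third replica missed a write.
replicas = [
    {"x": (2, "new")},
    {"x": (2, "new")},
    {"x": (1, "old")},
]


def quorum_read(replicas, key):
    """Read a key from all replicas, return the newest value, repair stale copies."""
    missing = (0, None)  # treated as older than any real version
    responses = [replica.get(key, missing) for replica in replicas]
    latest = max(responses, key=lambda versioned: versioned[0])
    for replica in replicas:
        if replica.get(key, missing)[0] < latest[0]:
            replica[key] = latest  # write the newer value back to the stale node
    return latest[1]


value = quorum_read(replicas, "x")
```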

replication_lag
content Replication Lag: the delay between a write happening on the leader and being reflected on the follower
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
location knowledge/ddia.dz:627

read_after_write_consistency
content Read-after-write consistency: users should always see data that they submitted themselves.
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front) What is "Read After Write Consistency"?
flashcard (back) In replication, read-after-write consistency is the idea that users should always see data that they submitted themselves.
location knowledge/ddia.dz:635

monotonic_reads
content Monotonic Reads: after users have seen the data at one point in time, they shouldn't later see the data from some earlier point in time
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front) What are "Monotonic Reads"?
flashcard (back) In replication, monotonic reads are the idea that if a user sees data from some point in time, they shouldn't later see the data from an earlier point in time (time monotonically increasing)
location knowledge/ddia.dz:645

consistent_prefix_reads
content Consistent prefix reads: users should see data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.
parents DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front) What are "consistent prefix reads"?
flashcard (back) In replication consistency models, consistent prefix reads state that users should see data in a state that makes causal sense (ex: question then answer).
location knowledge/ddia.dz:657

partitioning
content Partitioning: splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines.
children DDIA/toc/3_derived_data/10_batch_processing/problems_solved/partitioning, sharding (AKA), vbucket (AKA), system_design_interview/glossary/partition, tablet_bigtable (AKA), region_hbase (AKA), vnode (AKA)
parents DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning
flashcard (front) What is partitioning?
flashcard (back) Partitioning is the act of splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines.
location knowledge/ddia.dz:668

sharding
content Sharding
parents DDIA/glossary, partitioning
location knowledge/ddia.dz:680

region_hbase
content Region: a term for a partition in HBase
parents DDIA/glossary, DDIA/tools/hbase, partitioning
location knowledge/ddia.dz:686

tablet_bigtable
content tablet: in Bigtable, a name for a partition
parents DDIA/tools/bigtable, partitioning
location knowledge/ddia.dz:694

vnode
content vnode: a term for "partition" in cassandra and riak
parents DDIA/glossary, DDIA/tools/riak, DDIA/tools/cassandra, partitioning
location knowledge/ddia.dz:700

wide_column_store
content wide-column store: A wide-column store is a type of NoSQL database that uses tables, rows, and columns but allows column names and formats to vary. It can be considered a two-dimensional key-value store. Google's Bigtable is a classic example of a wide-column store.
parents DDIA/glossary, DDIA/tools/bigtable
location knowledge/ddia.dz:716

vbucket
content vbucket: term for a partition in Couchbase
parents DDIA/tools/couchbase, DDIA/glossary, partitioning
location knowledge/ddia.dz:739

key_range_partitioning
content Key-range partitioning: involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other in the sorted order.
parents DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches
flashcard (front) What is key-range partitioning?
flashcard (back) Key-range partitioning involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other in the sorted order.
location knowledge/ddia.dz:758
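remarks A small sketch of key-range lookup using binary search over sorted partition boundaries. The boundary values and function names are made-up assumptions for illustration.

```python
import bisect

# Hypothetical boundaries: partition 0 owns keys below "g", partition 1 owns
# "g" up to (not including) "n", partition 2 owns "n" up to "t", and
# partition 3 owns everything from "t" onward.
boundaries = ["g", "n", "t"]


def range_partition_for(key: str) -> int:
    """Return the index of the partition that owns this key."""
    return bisect.bisect_right(boundaries, key)


def partitions_for_range(start: str, end: str) -> list:
    """A range query only needs the partitions between the two endpoints."""
    return list(range(range_partition_for(start), range_partition_for(end) + 1))
```

Because keys that sort near each other land on the same partition, a workload that keeps touching neighboring keys (for example, timestamps for the current day) would concentrate on one partition.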

hash_partitioning
content Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes. While this method can destroy the ordering of keys and make range queries inefficient, it can also help distribute load more evenly across the partitions.
parents DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches
flashcard (front) What is hash partitioning?
flashcard (back) Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes.
location knowledge/ddia.dz:774
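remarks A minimal sketch in which each partition owns an equal slice of a 32-bit hash space. The choice of MD5, the partition count, and the function names are assumptions made for illustration only.

```python
import hashlib

NUM_PARTITIONS = 4
HASH_SPACE = 2 ** 32  # hashes fall in the range [0, 2**32)


def key_hash(key: str) -> int:
    """Stable 32-bit hash of a key (Python's built-in hash() varies per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_partition_for(key: str) -> int:
    """Each partition owns a contiguous range of hash values."""
    return key_hash(key) * NUM_PARTITIONS // HASH_SPACE


# Adjacent keys are usually scattered, so a key-range scan has to ask
# every partition.
placement = {key: hash_partition_for(key) for key in ["user1", "user2", "user3"]}
```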

document_partitioned_indexes
content Document Partitioned Indexes (local indexes): involve storing secondary indexes in the same partition as the primary key and value. This approach means a write only needs to update a single partition, but a read of the secondary index requires a scatter/gather across all partitions, increasing the overall operation's complexity.
parents DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
flashcard (front) What are document partitioned indexes?
flashcard (back) Document Partitioned Indexes (local indexes) store secondary indexes in the same partition as the primary key and value, so that only a single partition needs to be updated on write. However, reads of secondary indexes require a scatter/gather across all partitions.
location knowledge/ddia.dz:787

term_partitioned_indexes
content Term-partitioned indexes (global indexes): partition the secondary indexes separately from the primary data, using the indexed values. An entry in the secondary index may include records from all partitions of the primary key. When a document is written, several partitions of the secondary index need to be updated, but a read can be served from a single partition.
parents DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
flashcard (front) What is a term-partitioned index?
flashcard (back) Term-partitioned indexes store secondary indexes in separate partitions, using the indexed value. An entry in the secondary index may include records from multiple partitions of the primary key, and updates to a document may require updating several partitions, while reads can be served from a single partition.
location knowledge/ddia.dz:804
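remarks A sketch contrasting term-partitioned (global) indexes with document-partitioned (local) ones on the same toy data. The car listings, the split of terms at the letter "m", and all function names are hypothetical.

```python
from collections import defaultdict

# Two primary partitions of documents, keyed by document ID.
partitions = [
    {191: {"color": "red"}, 214: {"color": "black"}},
    {515: {"color": "red"}, 768: {"color": "silver"}},
]

# Document-partitioned (local): each partition indexes only its own documents.
local_indexes = []
for docs in partitions:
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index[doc["color"]].add(doc_id)
    local_indexes.append(index)


def find_local(color):
    """Scatter/gather: every partition's local index must be consulted."""
    result = set()
    for index in local_indexes:
        result |= index.get(color, set())
    return result


def index_partition_for(color):
    """Term-partitioned: the indexed value decides where the entry lives."""
    return 0 if color < "m" else 1


# Term-partitioned (global): one entry per term, covering all primary partitions.
global_index = [defaultdict(set), defaultdict(set)]
for docs in partitions:
    for doc_id, doc in docs.items():
        color = doc["color"]
        global_index[index_partition_for(color)][color].add(doc_id)


def find_global(color):
    """A read is served by the single index partition that owns the term."""
    return set(global_index[index_partition_for(color)].get(color, set()))
```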

secondary_index
content Secondary Index: A database index created on one or more columns that are not the primary key, designed to improve query performance by enabling faster data retrieval on non-primary key fields. Unlike the primary index, multiple secondary indexes can exist on a single table, allowing quick searches on various columns at the cost of slight overhead during data modifications.
children DDIA/tools/sqlite/create_index
parents DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
location knowledge/ddia.dz:822

transaction
content Transaction: Grouping together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues.
children transaction_abort, dirty_writes (Almost all transaction types prevent dirty writes.)
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions
flashcard (front) What is a transaction?
flashcard (back) A transaction groups together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues.
location knowledge/ddia.dz:850

transaction_abort
content Transaction abort: occurs when a database transaction is interrupted or terminated prematurely. A large class of errors related to software and hardware can be reduced down to transaction aborts.
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions, transaction
location knowledge/ddia.dz:861

read_committed
content Read Committed
children dirty_reads (The read-committed isolation level and stronger levels prevent dirty reads, ensuring that data is consistent and accurate.)
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location knowledge/ddia.dz:870

snapshot_isolation
content Snapshot isolation
children lost_updates (Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock.), read_skew (Most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time.), repeatable_read (AKA)
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location knowledge/ddia.dz:876

repeatable_read
content Repeatable Read
parents DDIA/glossary, snapshot_isolation
location knowledge/ddia.dz:882

serializable_isolation
content Serializable Isolation
children phantom_reads (Serializable isolation prevents phantom reads; snapshot isolation prevents only straightforward phantom reads), write_skew (Only serializable isolation prevents write skew)
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location knowledge/ddia.dz:888

dirty_reads
content Dirty Reads: One client reads another client's uncommitted writes before they have been committed.
parents DDIA/glossary, read_committed, DDIA/toc/2_distributed_data/07_transactions/race_conditions
flashcard (front) What are dirty reads?
flashcard (back) Dirty reads are a race condition that occurs when one client reads another client's uncommitted writes before they have been committed.
location knowledge/ddia.dz:893

dirty_writes
content Dirty Writes: one client overwrites data that another client has written, but not yet committed.
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/race_conditions, transaction
flashcard (front) What are dirty writes?
flashcard (back) One client overwrites data that another client has written, but not yet committed.
location knowledge/ddia.dz:907

read_skew
content Read Skew: a client sees different parts of the database at different points in time.
children non_repeatable_reads (AKA)
parents DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, MVCC, snapshot_isolation
location knowledge/ddia.dz:918

MVCC
content MVCC: Multi-version concurrency control
children read_skew (snapshot isolation for fixing read skew is usually implemented using multi-version concurrency control)
parents DDIA/glossary
location knowledge/ddia.dz:931

non_repeatable_reads
content non-repeatable reads
parents read_skew
location knowledge/ddia.dz:935

lost_updates
content Lost Update: two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost.
parents DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, snapshot_isolation
location knowledge/ddia.dz:940
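remarks A deterministic sketch of the interleaving behind a lost update, with two clients each trying to increment a counter. The function name and the fixed ordering of steps are assumptions chosen to make the race reproducible.

```python
def run_interleaved_increments(start: int) -> int:
    """Simulate two clients whose read-modify-write cycles overlap."""
    stored = start

    client_a_seen = stored  # client A reads
    client_b_seen = stored  # client B reads the same value before A writes

    stored = client_a_seen + 1  # A writes its increment
    stored = client_b_seen + 1  # B overwrites A's write using its stale read

    return stored


final_value = run_interleaved_increments(10)
```

Two increments were attempted, but the stored value only advances by one because client B never saw client A's write.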

write_skew
content Write Skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true.
children phantom_reads (phantoms in the context of write skew require special treatment, such as index range locks)
parents DDIA/toc/2_distributed_data/07_transactions/race_conditions, serializable_isolation, DDIA/glossary
location knowledge/ddia.dz:952
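remarks An illustrative sketch of write skew using an on-call rota where at least one person must stay on call. The names, the rule, and the snapshot copies are assumptions; the snapshots stand in for two transactions reading under snapshot isolation.

```python
on_call = {"alice": True, "bob": True}


def may_go_off_call(snapshot: dict) -> bool:
    """Premise of the decision: at least two people are currently on call."""
    return sum(snapshot.values()) >= 2


# Both transactions read from a snapshot taken before either one writes.
alice_snapshot = dict(on_call)
bob_snapshot = dict(on_call)

# Each decision is valid against its own snapshot...
if may_go_off_call(alice_snapshot):
    on_call["alice"] = False
if may_go_off_call(bob_snapshot):
    on_call["bob"] = False

# ...but together the writes violate the rule they were checking.
people_on_call = sum(on_call.values())
```

Unlike a lost update, the two transactions write to different records, which is why neither write clobbers the other and the conflict goes unnoticed.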

phantom_reads
content Phantom Reads: a transaction reads objects that match some search condition. Another client makes a write that affects the results of that search.
children index_range_lock (Phantoms in the context of write skew require special treatment, such as index-range locks)
parents write_skew, serializable_isolation
location knowledge/ddia.dz:963

index_range_lock
content Index Range Lock
parents phantom_reads, DDIA/glossary
location knowledge/ddia.dz:974

two_phase_locking
content Two-phase locking
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions
location knowledge/ddia.dz:982

serializable_snapshot_isolation
content Serializable Snapshot Isolation (SSI): a relatively new algorithm that avoids many of the drawbacks of previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction attempts to commit, it is checked, and if it is not serializable, it is aborted.
parents DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions
location knowledge/ddia.dz:991

partial_failure
content Partial Failure: in a distributed system, parts of a system that are broken in some unpredictable way.
children nondeterministic (non-determinism in distributed systems is what makes distributed systems difficult), DDIA/toc/2_distributed_data/08_trouble_distributed_systems/partial_failures
parents DDIA/glossary
location knowledge/ddia.dz:1001

deterministic
content Deterministic: Describing a function that always produces the same output if you give it the same input. This means it cannot depend on random numbers, the time of day, network communication, or other unpredictable things.
parents DDIA/glossary
location knowledge/ddia.dz:1006

nondeterministic
content Nondeterministic: describing a function that may produce different output when given the same input.
parents partial_failure
location knowledge/ddia.dz:1014

consistency
content Consistency
children causal_consistency, linearizability (A popular consistency model), DDIA/toc/2_distributed_data/09_consistency_consensus
parents DDIA/glossary
location knowledge/ddia.dz:1022

consensus
content Consensus: a fundamental problem in distributed computing, concerning getting several nodes to agree on something (for example, which node should be the leader for a database cluster).
children DDIA/toc/2_distributed_data/09_consistency_consensus
parents DDIA/glossary
flashcard (front) What is consensus?
flashcard (back) In distributed computing, consensus is the problem of getting nodes to agree on something.
location knowledge/ddia.dz:1026

linearizability
content Linearizability: behaving as if there was only a single copy of data in the system, which is updated by atomic operations.
children causal_consistency (CC does not have the coordination overhead of linearizability, and is less sensitive to network problems. It is a weaker consistency model than linearizability), DDIA/toc/2_distributed_data/09_consistency_consensus
parents consistency
flashcard (front) What is "linearizability"?
flashcard (back) Linearizability is a consistency model which behaves as if there was only a single copy of data in the system, updated by atomic operations.
location knowledge/ddia.dz:1036

causal_consistency
content Causal Consistency: a consistency model that allows for things to be concurrent. This causes the timeline to contain branching and merging.
children DDIA/toc/2_distributed_data/09_consistency_consensus
parents linearizability, consistency
location knowledge/ddia.dz:1047

shared_register
content Shared Register: fundamental type of shared data structure in distributed systems with two operations: a read operation, and write operation. This is used to build shared-memory and message-passing systems.
children DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers, compare_and_swap (linearizable compare-and-swap register)
parents DDIA/glossary
hyperlink https://en.wikipedia.org/wiki/Shared_register
flashcard (front) What is a shared register?
flashcard (back) A fundamental shared data structure in distributed systems that allows a read/write operation.
location knowledge/ddia.dz:1059

compare_and_swap
content Compare-and-swap: An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
children DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers
parents shared_register
hyperlink https://en.wikipedia.org/wiki/Compare-and-swap
flashcard (front) What is compare-and-swap?
flashcard (back) An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a value, and will only modify the contents of that memory location if the value matches.
location knowledge/ddia.dz:1070
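remarks A sketch of compare-and-swap semantics and the usual retry loop built on top of it. Python has no user-level CAS instruction, so this assumes a `threading.Lock` emulates the atomicity; the `Register` class and helper names are hypothetical.

```python
import threading


class Register:
    """A single value supporting read and an emulated atomic compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new) -> bool:
        """Set the value to `new` only if it still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(register: Register) -> None:
    """Retry until no other thread changed the value between read and swap."""
    while True:
        seen = register.read()
        if register.compare_and_swap(seen, seen + 1):
            return


def worker(register: Register, times: int) -> None:
    for _ in range(times):
        increment(register)


register = Register(0)
threads = [threading.Thread(target=worker, args=(register, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because a failed swap is retried with a fresh read, no increment is lost even when the threads interleave.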

systems_of_record
content Systems Of Record: AKA source of truth, holds the authoritative version of your data. Each fact is represented exactly once.
children fact_table (fact table is a source of truth, and therefore a system of record (I think?)), derived_data_system (vs), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader
parents normalized, DDIA/glossary
flashcard (front) What are "systems of record"?
flashcard (back) Systems of record are systems where there is one source of truth.
location knowledge/ddia.dz:1106

derived_data_system
content derived data system: system that takes some data from another system and transforms it in some way
children denormalize (derived data systems are typically denormalized), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader
parents systems_of_record, DDIA/glossary
flashcard (front) What is a "derived data system"?
flashcard (back) A derived data system is one that takes some data from another system and transforms it in some way.
location knowledge/ddia.dz:1115

normalized
content normalized: structured in such a way that there is no redundancy or duplication. In a normalized database, when some piece of data changes, you only need to change it in one place, not many copies in many different places.
children denormalize (introduces some amount of redundancy in a normalized dataset.), DDIA/references/database_normalization, systems_of_record (systems of record represent each fact exactly once, meaning the data is usually normalized)
flashcard (front) What does "normalized" mean in the context of a database?
flashcard (back) "Normalized" refers to data structured in a way that there is no duplication.
location knowledge/ddia.dz:1125

denormalize
content denormalize: to introduce some amount of redundancy or duplication in a normalized dataset, typically in the form of a cache or index, in order to speed up reads. A denormalized value is a kind of precomputed query result, similar to a materialized view.
children materialize (a denormalized value is similar to a materialized view)
parents DDIA/glossary, derived_data_system, normalized
flashcard (front) What does it mean to "denormalize" in the context of data?
flashcard (back) To denormalize a dataset is to introduce some amount of redundancy or duplication in a normalized dataset.
location knowledge/ddia.dz:1147

fault_tolerant
content Fault Tolerant: able to recover automatically if something goes wrong, such as a crash or network failure.
children DDIA/toc/3_derived_data/10_batch_processing/problems_solved/fault_tolerance
parents DDIA/glossary, reliability
flashcard (front) what does it mean to be "fault tolerant"?
flashcard (back) A system that is fault tolerant is able to recover automatically if something goes wrong.
location knowledge/ddia.dz:1180

materialize
content Materialize: to perform a computation eagerly, and write out its result, as opposed to calculating it on demand when requested.
children DDIA/references/materialized_view
parents DDIA/glossary, denormalize
flashcard (front) What does it mean to "materialize" something?
flashcard (back) To materialize is to perform a computation in advance and write the results. Materialization is the process used to build a materialized view.
location knowledge/ddia.dz:1190
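remarks A small sketch contrasting on-demand computation with a materialized result. The order data and the names `total_on_demand` and `materialized_totals` are illustrative assumptions.

```python
orders = [("apple", 3), ("pear", 1), ("apple", 5)]


def total_on_demand(product: str) -> int:
    """Computed lazily: scans every order each time it is requested."""
    return sum(quantity for name, quantity in orders if name == product)


# Materialized: computed eagerly, once, and written out for later lookups.
materialized_totals = {}
for name, quantity in orders:
    materialized_totals[name] = materialized_totals.get(name, 0) + quantity
```

The materialized dict answers a lookup without rescanning the orders, at the cost of needing to be updated whenever the underlying orders change.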

join
content Join: the process of combining rows of tables based on a common link or attribute between them. It is used to retrieve specific records that have a connection, such as a foreign key reference, and is commonly used in queries that need to retrieve related data.
children links/bloom_filters_sqlite (some discussion on joins), DDIA/toc/3_derived_data/mapreduce_join_algos
parents DDIA/glossary
flashcard (front) What is a join?
flashcard (back) A join operation brings together records that have something in common.
location knowledge/ddia.dz:1208
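remarks A minimal sketch of a join on a shared attribute, implemented as a hash join: build a lookup table on one side, then probe it with the other. The tables and field names are hypothetical.

```python
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Lin"},
]
events = [
    {"user_id": 1, "url": "/a"},
    {"user_id": 2, "url": "/b"},
    {"user_id": 1, "url": "/c"},
    {"user_id": 9, "url": "/d"},  # no matching user, so it is dropped
]

# Build phase: index the smaller table by the join key.
users_by_id = {user["user_id"]: user for user in users}

# Probe phase: look up each event's foreign key and combine matching rows.
joined = [
    {"url": event["url"], "name": users_by_id[event["user_id"]]["name"]}
    for event in events
    if event["user_id"] in users_by_id
]
```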

bounded
content Bounded: having some known upper limit or size.
children DDIA/toc/3_derived_data/10_batch_processing (input data to a distributed batch processing job is bounded)
parents DDIA/glossary
location knowledge/ddia.dz:1220

durable
content Durable: storing data in such a way that you believe it will not be lost, even if various faults occur.
children DDIA/toc/2_distributed_data/07_transactions/meaning_of_ACID/durability, python/docs/stdlib/persistence (persistent on-disk state is used in the context of durability)
parents DDIA/glossary
location knowledge/ddia.dz:1224

star_schema
content star schema: a common, fairly formulaic way of structuring data in a data warehouse, with a central fact table that references surrounding dimension tables
children DDIA/references/star_schema, fact_table, dimension_table
parents data_warehouse, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location knowledge/ddia.dz:1230

fact_table
content fact table: records measurements or metrics for a specific event
children DDIA/references/star_schema/fact_tables, dimension_table
parents databases/star_schema_benchmark/star_queries, systems_of_record, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, DDIA/glossary, star_schema
location knowledge/ddia.dz:1235

dimension_table
content Dimension Table: Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a large number of attributes to describe the fact data
children DDIA/references/star_schema/dimension_tables
parents DDIA/glossary, fact_table, databases/star_schema_benchmark/star_queries, star_schema