DDIA/glossary

Nodes

functional_requirements
content	Functional requirements: what it should do, such as allowing data to be stored, retrieved, searched, and processed in various ways.
children	system_design_interview/glossary/functional_requirement
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location	knowledge/ddia.dz:287

nonfunctional_requirements
content	Nonfunctional requirements: general properties like security, reliability, compliance, scalability, compatibility, and maintainability.
children	system_design_interview/glossary/non_functional_requirement
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location	knowledge/ddia.dz:294

reliability
content	Reliability: making systems work correctly, even when faults occur.
children	fault_tolerant
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location	knowledge/ddia.dz:301

scalability
content	Scalability: having strategies for keeping performance good, even when load increases.
parents	DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable, DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location	knowledge/ddia.dz:307

maintainability
content	Maintainability: making life better for engineering and operations teams who need to work with the system.
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/01_reliable_scalable_maintainable
location	knowledge/ddia.dz:313

relational_database
content	Relational Database: invented to solve the "many-to-many" relationship problem, which the earlier hierarchical model handled poorly.
children	system_design_interview/glossary/RDBMS
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:319

nosql
content	NoSQL Datastores
children	graph_database, document_database
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:324

document_database
content	Document Database: targets use cases where data comes in self-contained documents and relationships between one document and another are rare.
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql
location	knowledge/ddia.dz:329

graph_database
content	Graph Database: useful for cases where anything is potentially related to everything.
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages, nosql
location	knowledge/ddia.dz:337

data_model
content	Data Model
children	system_design_interview/glossary/data_model
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:344

query_language
content	Query Language
children	DDIA/tools/XSL_XPath (not a DB query language, but an interesting parallel), DDIA/tools/datalog, DDIA/tools/CSS (not a DB query language, but an interesting parallel), DDIA/tools/SQL, DDIA/tools/MapReduce, DDIA/tools/SPARQL, DDIA/tools/cypher, DDIA/tools/monogdb_aggregration_pipeline
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:349

sequence_similarity_search
content	Sequence Similarity Search: taking one long string (such as a DNA molecule), and matching it against a large database of strings that are similar, but not identical
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:402

full_text_search
content	Full Text Search: arguably a kind of data model used alongside databases.
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/02_data_models_query_languages
location	knowledge/ddia.dz:409

OLTP
content	OLTP: Online Transaction Processing, optimized for transaction processing.
children	links/bloom_filters_sqlite (SQLite is a general-purpose DB, but excels at OLTP workloads), log_structured, update_in_place
parents	DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
flashcard (front)	What is an OLTP database?
flashcard (back)	Online transaction processing, optimized for transaction processing
location	knowledge/ddia.dz:415

OLAP
content	OLAP: Online analytical processing, optimized for analytical processing
children	data_warehouse, links/bloom_filters_sqlite (Researchers at Buffalo University in 2015 found that most queries are simple KV lookups and OLAP queries), column_oriented_storage
parents	DDIA/glossary, storage_engine, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
flashcard (front)	What is an OLAP database?
flashcard (back)	Online analytical processing.
location	knowledge/ddia.dz:425

storage_engine
content	Storage Engine
children	OLTP, OLAP
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:434

data_warehouse
content	Data Warehouse
children	star_schema
parents	DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:439

log_structured
content	Log-structured storage engine: only permits appending to files and deleting obsolete files, but never updates a file that has been written.
children	SSTable, DDIA/tools/cassandra, DDIA/tools/bitcask, LSM_tree, DDIA/tools/levelDB, DDIA/tools/lucene, DDIA/tools/hbase
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP
location	knowledge/ddia.dz:445
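
The append-only rule above can be illustrated with a toy key-value store. This is only a sketch under stated assumptions: the class name `AppendOnlyLog`, the tab-separated record format, and the in-memory offset index are hypothetical choices for illustration, and keys are assumed to contain no tabs or newlines. It omits compaction (deleting obsolete files) entirely.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy log-structured key-value store: writes only ever append."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # create the file if it does not exist

    def put(self, key, value):
        record = f"{key}\t{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(record)  # append; earlier records are never modified
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split("\t", 1)[1]


def demo_log():
    with tempfile.TemporaryDirectory() as d:
        log = AppendOnlyLog(os.path.join(d, "data.log"))
        log.put("a", "1")
        log.put("a", "2")  # the old record stays in the file; only the index moves
        size = os.path.getsize(log.path)
        return log.get("a"), size
```

After two writes to the same key the file holds both records, and the index decides which one a read returns.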

SSTable
content	SSTable: Sorted String Table
parents	DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:459

LSM_tree
content	LSM_tree: Log-Structured Merge Tree
parents	DDIA/glossary, log_structured, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:465

update_in_place
content	Update-in-place storage engine: treats the disk as a set of fixed-size pages that can be overwritten.
children	btree
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, OLTP
location	knowledge/ddia.dz:495

btree
content	B-Tree
parents	DDIA/glossary, update_in_place, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:502

column_oriented_storage
content	Column-oriented storage: aims to encode data very compactly and to minimize the amount of data a query needs to read from disk.
parents	DDIA/glossary, OLAP, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:508

rolling_upgrade
content	Rolling Upgrade: a new version of a service is gradually deployed to a few nodes at a time, rather than deploying to all nodes simultaneously.
parents	DDIA/glossary, evolvability, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location	knowledge/ddia.dz:516

evolvability
content	Evolvability: the ease of making changes in an application
children	rolling_upgrade
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location	knowledge/ddia.dz:523

binary_schema
content	binary schema driven formats
children	DDIA/tools/thrift, DDIA/tools/avro, DDIA/tools/protocol_buffers
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
remarks	useful for documentation and code generation, but data needs to be decoded before it is human readable
location	knowledge/ddia.dz:544

REST
content	REST API
children	system_design_interview/glossary/REST
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location	knowledge/ddia.dz:569

RPC
content	RPC API
parents	DDIA/glossary, DDIA/toc/1_foundations_of_data_systems/04_encoding_evolution
location	knowledge/ddia.dz:574

high_availability
content	High Availability: keeping the system running, even when one machine (or several machines) goes down.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location	knowledge/ddia.dz:579

disconnected_operation
content	Disconnected Operation: Allowing an application to continue working when there is a network interruption
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location	knowledge/ddia.dz:587

latency
content	Latency
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication, DDIA/toc/2_distributed_data/05_replication/reasons_for_replication
location	knowledge/ddia.dz:595

single_leader_replication
content	Single-leader replication: Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication
location	knowledge/ddia.dz:605

multi_leader_replication
content	Multi-leader replication: clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/approaches_to_replication
location	knowledge/ddia.dz:614

leaderless_replication
content	Leaderless Replication: clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.
location	knowledge/ddia.dz:622
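
A rough sketch of the read path described above, assuming each value carries a version number so that stale copies can be recognized (the version scheme and the names `Replica`, `write`, and `read` are illustrative assumptions, not taken from any particular database):

```python
class Replica:
    """Toy replica holding key -> (version, value)."""

    def __init__(self):
        self.data = {}


def write(replicas, reachable, key, version, value):
    """Send the write to every reachable replica; return the number of acks."""
    for i in reachable:
        replicas[i].data[key] = (version, value)
    return len(reachable)


def read(replicas, reachable, key):
    """Read from several replicas, keep the newest version, repair stale ones."""
    responses = {i: replicas[i].data.get(key, (0, None)) for i in reachable}
    newest = max(responses.values(), key=lambda versioned: versioned[0])
    for i, versioned in responses.items():
        if versioned[0] < newest[0]:
            replicas[i].data[key] = newest  # correct the stale replica
    return newest[1]


def demo_leaderless():
    replicas = [Replica(), Replica(), Replica()]
    write(replicas, [0, 1, 2], "k", 1, "old")
    write(replicas, [0, 1], "k", 2, "new")  # replica 2 is unreachable and stays stale
    value = read(replicas, [1, 2], "k")  # overlaps the write set at replica 1
    return value, replicas[2].data["k"]
```

Because the read set overlaps the write set in at least one replica, the read sees the newer version and brings the stale replica up to date.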

replication_lag
content	Replication Lag: the delay between a write happening on the leader and being reflected on the follower
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
location	knowledge/ddia.dz:627

read_after_write_consistency
content	Read-after-write consistency: users should always see data that they submitted themselves.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front)	What is "Read After Write Consistency"?
flashcard (back)	In replication, read-after-write consistency is the idea that users should always see data that they submitted themselves.
location	knowledge/ddia.dz:635

monotonic_reads
content	Monotonic Reads: after users have seen the data at one point in time, they shouldn't later see the data from some earlier point in time
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front)	What are "Monotonic Reads"?
flashcard (back)	In replication, monotonic reads are the idea that if a user sees data from some point in time, they shouldn't later see the data from an earlier point in time (time monotonically increasing)
location	knowledge/ddia.dz:645
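
One common way to get monotonic reads is to have each user always read from the same replica, chosen from a hash of the user ID. A minimal sketch, assuming hypothetical replica names and ignoring what happens when the chosen replica fails:

```python
import hashlib

REPLICAS = ["replica-a", "replica-b", "replica-c"]  # hypothetical replica names


def replica_for_user(user_id, replicas=REPLICAS):
    """Pick a replica deterministically so one user never jumps between replicas."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]
```

Since a single replica applies changes in order, a user pinned to it cannot observe time moving backwards.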

consistent_prefix_reads
content	Consistent prefix reads: users should see data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/05_replication/consistency_models
flashcard (front)	What are "consistent prefix reads"?
flashcard (back)	In replication consistency models, consistent prefix reads state that users should see data in a state that makes causal sense (ex: question then answer).
location	knowledge/ddia.dz:657

partitioning
content	Partitioning: splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines.
children	DDIA/toc/3_derived_data/10_batch_processing/problems_solved/partitioning, sharding (AKA), vbucket (AKA), system_design_interview/glossary/partition, tablet_bigtable (AKA), region_hbase (AKA), vnode (AKA)
parents	DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning
flashcard (front)	What is partitioning?
flashcard (back)	Partitioning is the act of splitting up a large dataset or computation that is too big for a single machine into smaller parts and spreading them across several machines.
location	knowledge/ddia.dz:668

sharding
content	Sharding
parents	DDIA/glossary, partitioning
location	knowledge/ddia.dz:680

region_hbase
content	Region: a term for a partition in HBase
parents	DDIA/glossary, DDIA/tools/hbase, partitioning
location	knowledge/ddia.dz:686

tablet_bigtable
content	tablet: in Bigtable, a name for a partition
parents	DDIA/tools/bigtable, partitioning
location	knowledge/ddia.dz:694

vnode
content	vnode: a term for "partition" in Cassandra and Riak
parents	DDIA/glossary, DDIA/tools/riak, DDIA/tools/cassandra, partitioning
location	knowledge/ddia.dz:700

wide_column_store
content	wide-column store: A wide-column store is a type of NoSQL database that uses tables, rows, and columns but allows column names and formats to vary. It can be considered a two-dimensional key-value store. Google's Bigtable is a classic example of a wide-column store.
parents	DDIA/glossary, DDIA/tools/bigtable
location	knowledge/ddia.dz:716

vbucket
content	vbucket: term for a partition in Couchbase
parents	DDIA/tools/couchbase, DDIA/glossary, partitioning
location	knowledge/ddia.dz:739

key_range_partitioning
content	Key-range partitioning: involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other in the sorted order.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches
flashcard (front)	What is key-range partitioning?
flashcard (back)	Key-range partitioning involves sorting keys, with a partition owning all keys between a minimum and maximum value. This method enables efficient range queries, but it can lead to hotspots if the application frequently accesses keys near each other
location	knowledge/ddia.dz:758
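
As a sketch of the lookup, assume a sorted list of hypothetical boundary keys; a binary search then finds the partition that owns a key, and a range query only touches the partitions between its two endpoints:

```python
import bisect

# Hypothetical boundaries: four partitions covering [..g), [g..n), [n..t), [t..)
BOUNDARIES = ["g", "n", "t"]


def partition_for(key):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(BOUNDARIES, key)


def partitions_for_range(start, end):
    """Partitions that a range query over [start, end] must touch."""
    return list(range(partition_for(start), partition_for(end) + 1))
```

Keys that sort next to each other land in the same partition, which is what makes range queries cheap and also what creates hotspots.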

hash_partitioning
content	Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes. While this method can destroy the ordering of keys and make range queries inefficient, it can also help distribute load more evenly across the partitions.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/main_approaches
flashcard (front)	What is hash partitioning?
flashcard (back)	Hash Partitioning: involves applying a hash function to each key, resulting in a partition owning a range of hashes.
location	knowledge/ddia.dz:774
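
A minimal sketch of the idea, assuming a 32-bit hash space split into equal contiguous ranges (the choice of SHA-256, the 32-bit truncation, and the partition count are all illustrative assumptions):

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative value
HASH_SPACE = 2 ** 32


def key_hash(key):
    """Stable 32-bit hash of the key's bytes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_partition(key, num_partitions=NUM_PARTITIONS):
    """Each partition owns one contiguous range of the hash space."""
    return key_hash(key) * num_partitions // HASH_SPACE
```

Adjacent keys such as `user:1` and `user:2` hash to unrelated values, so a range scan over keys has to consult every partition.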

document_partitioned_indexes
content	Document-Partitioned Indexes (local indexes): store secondary indexes in the same partition as the primary key and value. A write therefore only needs to update a single partition, but a read of the secondary index requires a scatter/gather across all partitions, which makes reads more expensive.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
flashcard (front)	What are document partitioned indexes?
flashcard (back)	Document-partitioned indexes (local indexes) store secondary indexes in the same partition as the primary key and value, so a write only needs to update a single partition. However, reads of secondary indexes require a scatter/gather across all partitions.
location	knowledge/ddia.dz:787

term_partitioned_indexes
content	Term-partitioned indexes (global indexes): store secondary indexes separately, partitioned by the indexed value. An entry in the secondary index may include records from all partitions of the primary key. When a document is written, several partitions of the secondary index need to be updated, but a read can be served from a single partition.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
flashcard (front)	What is a term-partitioned index?
flashcard (back)	Term-partitioned indexes store secondary indexes in separate partitions, using the indexed value. An entry in the secondary index may include records from multiple partitions of the primary key, and updates to a document may require updating several partitions, while reads can be served from a single partition.
location	knowledge/ddia.dz:804
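
The trade-off between the two index styles can be sketched side by side. Everything here is a toy assumption: two partitions, documents that have a single `color` attribute, and a first-letter rule for assigning terms to partitions.

```python
from collections import defaultdict

NUM_PARTITIONS = 2

# Hypothetical documents: doc_id -> color
DOCS = {1: "red", 2: "blue", 3: "red", 4: "red"}


def doc_partition(doc_id):
    return doc_id % NUM_PARTITIONS


def term_partition(color):
    # Toy rule: split terms by first letter.
    return 0 if color[0] < "m" else 1


# Local (document-partitioned) index: each partition indexes only its own documents.
local_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]
# Global (term-partitioned) index: each partition owns some terms, across all documents.
global_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

for doc_id, color in DOCS.items():
    local_index[doc_partition(doc_id)][color].add(doc_id)
    global_index[term_partition(color)][color].add(doc_id)


def query_local(color):
    """Scatter/gather: ask every partition, then merge the results."""
    result = set()
    for part in local_index:
        result |= part.get(color, set())
    return result


def query_global(color):
    """Single-partition read: only the partition that owns the term is consulted."""
    return set(global_index[term_partition(color)].get(color, set()))
```

Both queries return the same documents; they differ in how many partitions a read or a write has to touch.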

secondary_index
content	Secondary Index: A database index created on one or more columns that are not the primary key, designed to improve query performance by enabling faster data retrieval on non-primary key fields. Unlike the primary index, multiple secondary indexes can exist on a single table, allowing quick searches on various columns at the cost of slight overhead during data modifications.
children	DDIA/tools/sqlite/create_index
parents	DDIA/toc/2_distributed_data/06_partitioning/secondary_indexes
location	knowledge/ddia.dz:822
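
A small illustration using Python's built-in `sqlite3` module. The `users` table, its columns, and the index name `idx_users_city` are hypothetical; the query plan text is inspected only to check that the index is mentioned.

```python
import sqlite3


def secondary_index_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO users (name, city) VALUES (?, ?)",
        [("ada", "London"), ("grace", "New York"), ("alan", "London")],
    )
    # Secondary index on a non-primary-key column.
    conn.execute("CREATE INDEX idx_users_city ON users (city)")

    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM users WHERE city = ?", ("London",)
    ).fetchall()
    plan_text = " ".join(str(row[-1]) for row in plan)

    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM users WHERE city = ? ORDER BY name", ("London",)
        )
    ]
    conn.close()
    return plan_text, names
```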

transaction
content	Transaction: Grouping together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues.
children	transaction_abort, dirty_writes (Almost all transaction types prevent dirty writes.)
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions
flashcard (front)	What is a transaction?
flashcard (back)	A transaction groups together several reads and writes into a logical unit, in order to simplify error handling and concurrency issues.
location	knowledge/ddia.dz:850
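
As a sketch of how a transaction simplifies error handling, the following uses `sqlite3` with a hypothetical `accounts` table. The connection's context manager commits when the block succeeds and rolls back when an exception escapes, so a failed transfer leaves no partial debit behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def demo_transaction():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)
    failed = transfer(conn, "alice", "bob", 500)  # aborted: the debit is rolled back
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, failed, balances
```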

transaction_abort
content	Transaction abort: occurs when a database transaction is interrupted or terminated prematurely. A large class of errors related to software and hardware can be reduced down to transaction aborts.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions, transaction
location	knowledge/ddia.dz:861

read_committed
content	Read Committed
children	dirty_reads (The read-committed isolation level and stronger levels prevent dirty reads, so a transaction only ever sees committed data.)
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location	knowledge/ddia.dz:870

snapshot_isolation
content	Snapshot isolation
children	lost_updates (Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock.), read_skew (Most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time.), repeatable_read (AKA)
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location	knowledge/ddia.dz:876

repeatable_read
content	Repeatable Read
parents	DDIA/glossary, snapshot_isolation
location	knowledge/ddia.dz:882

serializable_isolation
content	Serializable Isolation
children	phantom_reads (Serializable isolation prevents phantom reads, including those that lead to write skew; snapshot isolation prevents only straightforward phantom reads), write_skew (Only serializable isolation prevents write skew)
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/isolation_levels
location	knowledge/ddia.dz:888

dirty_reads
content	Dirty Reads: one client reads another client's writes before they have been committed.
parents	DDIA/glossary, read_committed, DDIA/toc/2_distributed_data/07_transactions/race_conditions
flashcard (front)	What are dirty reads?
flashcard (back)	Dirty reads are a race condition that occurs when one client reads another client's writes before they have been committed.
location	knowledge/ddia.dz:893

dirty_writes
content	Dirty Writes: one client overwrites data that another client has written, but not yet committed.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/race_conditions, transaction
flashcard (front)	What are dirty writes?
flashcard (back)	One client overwrites data that another client has written, but not yet committed.
location	knowledge/ddia.dz:907

read_skew
content	Read Skew: a client sees different parts of the database at different points in time.
children	non_repeatable_reads (AKA)
parents	DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, MVCC, snapshot_isolation
location	knowledge/ddia.dz:918

MVCC
content	MVCC: Multi-version concurrency control
children	read_skew (snapshot isolation for fixing read skew is usually implemented using multi-version concurrency control)
parents	DDIA/glossary
location	knowledge/ddia.dz:931

non_repeatable_reads
content	non-repeatable reads
parents	read_skew
location	knowledge/ddia.dz:935

lost_updates
content	Lost Update: two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost.
parents	DDIA/toc/2_distributed_data/07_transactions/race_conditions, DDIA/glossary, snapshot_isolation
location	knowledge/ddia.dz:940
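
The anomaly can be reproduced with a single-threaded simulation that interleaves two clients by hand. The `Counter` class is a hypothetical stand-in for a database row; the second function shows one possible remedy, a compare-and-set write that is retried when the value has changed since it was read.

```python
class Counter:
    """Toy single-value store with a plain write and a compare-and-set write."""

    def __init__(self):
        self.value = 0

    def read(self):
        return self.value

    def write(self, new):
        self.value = new

    def compare_and_set(self, expected, new):
        if self.value != expected:
            return False
        self.value = new
        return True


def lost_update_demo():
    c = Counter()
    a = c.read()  # client A reads 0
    b = c.read()  # client B reads 0
    c.write(a + 1)  # A writes 1
    c.write(b + 1)  # B also writes 1, clobbering A's increment
    return c.value


def compare_and_set_demo():
    c = Counter()
    a = c.read()
    b = c.read()
    c.compare_and_set(a, a + 1)  # A succeeds
    if not c.compare_and_set(b, b + 1):  # B's premise is stale, so the write is refused
        b = c.read()
        c.compare_and_set(b, b + 1)  # retry from the current value
    return c.value
```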

write_skew
content	Write Skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true.
children	phantom_reads (phantoms in the context of write skew require special treatment, such as index-range locks)
parents	DDIA/toc/2_distributed_data/07_transactions/race_conditions, serializable_isolation, DDIA/glossary
location	knowledge/ddia.dz:952
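
A hand-interleaved sketch of write skew, using a hypothetical on-call roster where at least one doctor must stay on call. Each transaction checks the rule against its own snapshot, both checks pass, and the two writes together break the rule.

```python
def request_leave(snapshot, doctor):
    """Decide using a snapshot; return the write to apply, or None."""
    on_call = [name for name, flag in snapshot.items() if flag]
    if len(on_call) >= 2:  # premise: someone else will still be on call
        return (doctor, False)
    return None


def write_skew_demo():
    db = {"alice": True, "bob": True}
    # Both transactions take their snapshot before either one writes.
    snapshot_a = dict(db)
    snapshot_b = dict(db)
    writes = [request_leave(snapshot_a, "alice"), request_leave(snapshot_b, "bob")]
    for write in writes:
        if write is not None:
            name, flag = write
            db[name] = flag
    return sum(db.values())  # number of doctors still on call
```

Neither transaction overwrote the other's row, so this is not a lost update; the damage comes from a decision made on a premise that stopped being true.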

phantom_reads
content	Phantom Reads: a transaction reads objects that match some search condition. Another client makes a write that affects the results of that search.
children	index_range_lock (Phantoms in the context of write skew require special treatment, such as index-range locks)
parents	write_skew, serializable_isolation
location	knowledge/ddia.dz:963

index_range_lock
content	Index Range Lock
parents	phantom_reads, DDIA/glossary
location	knowledge/ddia.dz:974

two_phase_locking
content	Two-phase locking
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions
location	knowledge/ddia.dz:982

serializable_snapshot_isolation
content	Serializable Snapshot Isolation (SSI): a relatively new algorithm that avoids many of the drawbacks of previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction attempts to commit, it is checked, and if it is not serializable, it is aborted.
parents	DDIA/glossary, DDIA/toc/2_distributed_data/07_transactions/serializable_transactions
location	knowledge/ddia.dz:991

partial_failure
content	Partial Failure: in a distributed system, some parts of the system may be broken in some unpredictable way while other parts keep working.
children	nondeterministic (non-determinism is what makes distributed systems difficult), DDIA/toc/2_distributed_data/08_trouble_distributed_systems/partial_failures
parents	DDIA/glossary
location	knowledge/ddia.dz:1001

deterministic
content	Deterministic: Describing a function that always produces the same output if you give it the same input. This means it cannot depend on random numbers, the time of day, network communication, or other unpredictable things.
parents	DDIA/glossary
location	knowledge/ddia.dz:1006

nondeterministic
content	Nondeterministic: describing a function that may produce different outputs when given the same input.
parents	partial_failure
location	knowledge/ddia.dz:1014

consistency
content	Consistency
children	causal_consistency, linearizability (A popular consistency model), DDIA/toc/2_distributed_data/09_consistency_consensus
parents	DDIA/glossary
location	knowledge/ddia.dz:1022

consensus
content	Consensus: a fundamental problem in distributed computing, concerning getting several nodes to agree on something (for example, which node should be the leader for a database cluster).
children	DDIA/toc/2_distributed_data/09_consistency_consensus
parents	DDIA/glossary
flashcard (front)	What is consensus?
flashcard (back)	In distributed computing, consensus is the problem of getting nodes to agree on something.
location	knowledge/ddia.dz:1026

linearizability
content	Linearizability: behaving as if there was only a single copy of data in the system, which is updated by atomic operations.
children	causal_consistency (CC does not have the coordination overhead of linearizability, and is less sensitive to network problems. It is a weaker consistency model than linearizability), DDIA/toc/2_distributed_data/09_consistency_consensus
parents	consistency
flashcard (front)	What is "linearizability"?
flashcard (back)	Linearizability is a consistency model which behaves as if there was only a single copy of data in the system, updated by atomic operations.
location	knowledge/ddia.dz:1036

causal_consistency
content	Causal Consistency: a consistency model that allows for things to be concurrent. This causes the timeline to contain branching and merging.
children	DDIA/toc/2_distributed_data/09_consistency_consensus
parents	linearizability, consistency
location	knowledge/ddia.dz:1047

shared_register
content	Shared Register: fundamental type of shared data structure in distributed systems with two operations: a read operation, and write operation. This is used to build shared-memory and message-passing systems.
children	DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers, compare_and_swap (linearizable compare-and-swap register)
parents	DDIA/glossary
hyperlink	https://en.wikipedia.org/wiki/Shared_register
flashcard (front)	What is a shared register?
flashcard (back)	A fundamental shared data structure in distributed systems that allows a read/write operation.
location	knowledge/ddia.dz:1059

compare_and_swap
content	Compare-and-swap: An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
children	DDIA/toc/2_distributed_data/09_consistency_consensus/equivalent_consensus_problems/linearizable_compare_and_set_registers
parents	shared_register
hyperlink	https://en.wikipedia.org/wiki/Compare-and-swap
flashcard (front)	What is compare-and-swap?
flashcard (back)	An atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a value, and will only modify the contents of that memory location if the value matches.
location	knowledge/ddia.dz:1070
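
Python has no user-level compare-and-swap instruction, so the sketch below emulates one with a lock standing in for hardware atomicity (the `CasRegister` class and its method names are hypothetical). It shows the usual retry loop: read the current value, attempt the swap, and try again if another thread got there first.

```python
import threading


class CasRegister:
    """Register whose compare_and_swap is made atomic by a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(register, times):
    for _ in range(times):
        while True:  # retry until our swap wins
            current = register.load()
            if register.compare_and_swap(current, current + 1):
                break


def run_threads(n_threads=4, times=1000):
    register = CasRegister()
    threads = [
        threading.Thread(target=increment, args=(register, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return register.load()
```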

systems_of_record
content	Systems Of Record: AKA source of truth, holds the authoritative version of your data. Each fact is represented exactly once.
children	fact_table (fact table is a source of truth, and therefore a system of record (I think?)), derived_data_system (vs), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader
parents	normalized, DDIA/glossary
flashcard (front)	What are "systems of record"?
flashcard (back)	Systems of record are systems where there is one source of truth.
location	knowledge/ddia.dz:1106

derived_data_system
content	derived data system: system that takes some data from another system and transforms it in some way
children	denormalize (derived data systems are typically denormalized), DDIA/toc/2_distributed_data/09_consistency_consensus/if_leader_fails/automatically_choose_new_leader
parents	systems_of_record, DDIA/glossary
flashcard (front)	What is a "derived data system"?
flashcard (back)	A derived data system is one that takes some data from another system and transforms it in some way.
location	knowledge/ddia.dz:1115

normalized
content	normalized: structured in such a way that there is no redundancy or duplication. In a normalized database, when some piece of data changes, you only need to change it in one place, not many copies in many different places.
children	denormalize (introduces some amount of redundancy in a normalized dataset.), DDIA/references/database_normalization, systems_of_record (systems of record represent each fact exactly once, meaning the data is usually normalized)
flashcard (front)	What does "normalized" mean in the context of a database?
flashcard (back)	"Normalized" refers to data structured in a way that there is no duplication.
location	knowledge/ddia.dz:1125

denormalize
content	denormalize: to introduce some amount of redundancy or duplication in a normalized dataset, typically in the form of a cache or index, in order to speed up reads. A denormalized value is a kind of precomputed query result, similar to a materialized view.
children	materialize (a denormalized value is similar to a materialized view)
parents	DDIA/glossary, derived_data_system, normalized
flashcard (front)	What does it mean to "denormalize" in the context of data?
flashcard (back)	To denormalize a dataset is to introduce some amount of redundancy or duplication in a normalized dataset.
location	knowledge/ddia.dz:1147

fault_tolerant
content	Fault Tolerant: able to recover automatically if something goes wrong, such as a crash or network failure.
children	DDIA/toc/3_derived_data/10_batch_processing/problems_solved/fault_tolerance
parents	DDIA/glossary, reliability
flashcard (front)	What does it mean to be "fault tolerant"?
flashcard (back)	A system that is fault tolerant is able to recover automatically if something goes wrong.
location	knowledge/ddia.dz:1180

materialize
content	Materialize: to perform a computation eagerly, and write out its result, as opposed to calculating it on demand when requested.
children	DDIA/references/materialized_view
parents	DDIA/glossary, denormalize
flashcard (front)	What does it mean to "materialize" something?
flashcard (back)	To materialize is to perform a computation in advance and write the results. Materialization is the process used to build a materialized view.
location	knowledge/ddia.dz:1190
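
The difference between computing on demand and materializing can be sketched with a running total. The `ORDERS` data and the `MaterializedTotals` class are hypothetical; the point is only that the materialized version does its work at write time and answers reads with a lookup.

```python
ORDERS = [("alice", 30), ("bob", 20), ("alice", 5)]  # hypothetical (customer, amount) rows


def total_on_demand(customer, orders=ORDERS):
    """Scan every order each time the total is requested."""
    return sum(amount for name, amount in orders if name == customer)


class MaterializedTotals:
    """Keeps per-customer totals precomputed and updates them on each write."""

    def __init__(self, orders):
        self.totals = {}
        for name, amount in orders:
            self.apply(name, amount)

    def apply(self, customer, amount):
        self.totals[customer] = self.totals.get(customer, 0) + amount

    def total(self, customer):
        return self.totals.get(customer, 0)  # no scan needed at read time
```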

join
content	Join: the process of combining rows of tables based on a common link or attribute between them. It is used to retrieve specific records that have a connection, such as a foreign key reference, and is commonly used in queries that need to retrieve related data.
children	links/bloom_filters_sqlite (some discussion on joins), DDIA/toc/3_derived_data/mapreduce_join_algos
parents	DDIA/glossary
flashcard (front)	What is a join?
flashcard (back)	A join operation brings together records that have something in common.
location	knowledge/ddia.dz:1208
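
One way to implement a join is a hash join: build a hash table on one input, then probe it with each row of the other. The sketch below is an inner join over lists of dicts; the function name and the sample field names are illustrative assumptions.

```python
def hash_join(left, right, key):
    """Inner join of two lists of dicts on `key`, using a hash table on `right`."""
    buckets = {}
    for row in right:
        buckets.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in buckets.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined
```

For example, joining click events to user records on `user_id` attaches each user's attributes to that user's events and drops events whose user is unknown.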

bounded
content	Bounded: having some known upper limit or size.
children	DDIA/toc/3_derived_data/10_batch_processing (input data to a distributed batch processing job is bounded)
parents	DDIA/glossary
location	knowledge/ddia.dz:1220

durable
content	Durable: storing data in such a way that you believe it will not be lost, even if various faults occur.
children	DDIA/toc/2_distributed_data/07_transactions/meaning_of_ACID/durability, python/docs/stdlib/persistence (persistent on-disk state is used in the context of durability)
parents	DDIA/glossary
location	knowledge/ddia.dz:1224

star_schema
content	star schema: the fairly formulaic style in which many data warehouses are modeled, with a central fact table that references surrounding dimension tables.
children	DDIA/references/star_schema, fact_table, dimension_table
parents	data_warehouse, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval
location	knowledge/ddia.dz:1230

fact_table
content	fact table: records measurements or metrics for a specific event
children	DDIA/references/star_schema/fact_tables, dimension_table
parents	databases/star_schema_benchmark/star_queries, systems_of_record, DDIA/toc/1_foundations_of_data_systems/03_storage_and_retrieval, DDIA/glossary, star_schema
location	knowledge/ddia.dz:1235

dimension_table
content	Dimension Table: Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a large number of attributes to describe the fact data
children	DDIA/references/star_schema/dimension_tables
parents	DDIA/glossary, fact_table, databases/star_schema_benchmark/star_queries, star_schema
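
To make the fact/dimension split concrete, here is a tiny star schema in `sqlite3`. The table and column names (`fact_sales`, `dim_product`, and so on) are hypothetical; the fact table holds one row per sale event and refers to the dimension table by key.

```python
import sqlite3


def star_query():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT,
            category TEXT
        );
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER,
            amount_cents INTEGER
        );
        INSERT INTO dim_product VALUES
            (1, 'apple', 'fruit'), (2, 'pear', 'fruit'), (3, 'soap', 'household');
        INSERT INTO fact_sales (product_id, quantity, amount_cents) VALUES
            (1, 2, 100), (2, 1, 80), (3, 1, 250), (1, 4, 200);
        """
    )
    # Typical analytic query: aggregate facts, grouped by a dimension attribute.
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount_cents)
        FROM fact_sales AS f
        JOIN dim_product AS p USING (product_id)
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    conn.close()
    return rows
```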