Quiz Review PDF - Database Concepts
Document Details

Uploaded by CharitableMoose3939
University of Virginia, Charlottesville
Tags
Summary
This document is a review of database concepts, covering definitions, purposes, and usages related to database management systems. It includes topics like data models, SQL operations, database architecture, and relational algebra, which are important for understanding and working with databases. The content is designed to help students prepare for a quiz on database concepts.
Full Transcript
The following content is considered "Fair game" for the Quiz. Database - definition, purpose, usages, ideal qualities, users and use cases Definition: an organized collection of related data that can be persisted beyond the program’s active running memory Purposes: data integrity,...
The following content is considered "Fair game" for the Quiz. Database - definition, purpose, usages, ideal qualities, users and use cases Definition: an organized collection of related data that can be persisted beyond the program’s active running memory Purposes: data integrity, security, collaboration, and backup and recovery capabilities Usages: ○ Operational: Collect modify maintain data Dynamic data ○ Analytical: store and track historical data Static data over a long period of time Ideal qualities: Queryable, Durable, Efficient, Changeable, Distributable, Failure tolerant, Secure, Minimal Redundancy Users + Use Cases: ○ Administrator (DBA): Supervisor of the system ○ Designers: Design schema for data in DB ○ Developers: Write programs that access and interact with database ○ Data Entry: Clean up or correct data ○ Analysts: Mine data for insights and usages Definitional terms from the Jan 15 and 17 slide deck including, but not limited to DBA: Database administrator, supervisor of the system Data model: a collection of concepts or notations for describing the data in a database ○ Structure: Definition of relations and contents ○ Integrity: Constraints on the DB’s contents ○ Manipulation: Actions that can be done on the DB’s contents Schema: the structure of the data (typically tables) including the names and data types of attributes in the data model Instance: a snapshot of the data stored in the database at a given time DDL vs DML ○ DDL: Data Definition Language Enables creation and modification of a database schema, but not data CREATE, DROP, RENAME, ALTER ○ DML: Data Manipulation Language Manipulate data itself, not the structure (SELECT, INSERT, UPDATE, DELETE) Physical and Logical data independence ○ Physical Data independence: the ability to change how data is stored physically without affecting the applications that use it ○ Logical Data independence; the ability to change the schema of a data without affecting the applications that use it Transaction: a series of one or more operations on the database that describe a single "high-level" action ACID ○ Atomic All commands in a transaction are treated as a single unit One command fails, entire transaction fails ○ Consistency Database made by a transaction must remain in a consistent state If transaction leads to inconsistency, then transaction is rejected ○ Isolation Concurrent transactions would result in the same database state as the same transactions in sequence ○ Durability Changes due to a transaction are persistent, and won’t be lost if system failure occurs DBMS: Database management system, manages data, engine, schema ○ Engine: Means of accessing/Modifying data ○ Schema: definition of the structure of data Database Architecture What "levels" of the architecture exist in a typical DBMS ○ External level: “the UI” (created by external schema); different users see different views Different contents, or same contents, different representations External view: a collection of external records External record: a record seen by a particular user ○ Logical level: includes description of all data available to be shared Logical schema: a complete description of the information content of the database Logical record interface: conceptual level and internal level (defines what is visible or not to external and physical level) Logical model: a collection of logical records ○ Internal + Physical level: physical implementation of the database, responsible by DBMS Internal schema: complete description of the internal model ○ Which do our users primarily interact with External level ○ Which do developers typically interact with Logical level What is the Relational model ○ A single way to represent data as a two dimensional table called a relation ○ Composed of structure, integrity, manipulation ○ Schema: logical design of the database, doesn’t typically change - attribs order NoSQL models will not be on the quiz, but you should be familiar with what NoSQL entails ○ Data stored in a non-tabular format Relational Model terms Mathematical names - Relation, attribute, tuple Relation: an unordered set of tuples that contain the relationship of attributes Attribute: corresponds to a category or property of the relation Tuple:set of attribute values in the relation Typical SQL names - Table, column, row ○ Which ties to which, and how are real SQL tables different from mathematical relationships ○ Table = relation, attribute = column, row = tuple ○ SQL allow duplicates unless primary key or unique are applied ○ SQL has more flexibility - like not null Types of keys ○ Super, Candidate, and Primary, and how they are different. Super: Any attribute that can uniquely identify a tuple (can be a combination of multiple attributes) Candidate: a minimal super key - minimal meaning a single attribute Primary key: a candidate key that is the most important key ○ Foreign keys: specifies that an attribute from one relation has to map to a tuple in another relation - attributes that uniquely identify a row in another table ○ What elements describe a "good" primary key? What could indicate a bad primary key? Good: Unique, not NULL, meaningful, every tuple has it, non changing Bad primary keys could be repetitive, non unique, etc. SQLite Be familiar with the following ○ CREATE TABLE CREATE TABLE (if not exists) _table name_ ( _column name_ _data type_ _PRIMARY KEY_, _column name_ _data type_ etc, _optional UNIQUE_ (column names) ) _optional STRICT_ ; Be familiar with basic datatypes (TEXT, INTEGER, REAL, BLOB, NULL) Text: text Integer: int Real: floating point Blob: sequence of bytes (images, files, etc) Be familiar with the following constraints - PRIMARY KEY, UNIQUE, NOT NULL PRIMARY KEY indicates the primary key, typically an integer, Unique AND not null UNIQUE: indicates that the attribute must be unique NOT NULL: indicates the attribute must have something in it ○ SELECT SELECT * FROM _table name_; - gets all data SELECT _attibute_, FROM _table name_; SELECT * FROM _table name_ WHERE _attribute_ = _constraint; ○ INSERT INTO INSERT INTO _table name_ (column names) VALUES (values), (values); ○ UPDATE UPDATE _table name_ SET _attribute_ = _new value_ WHERE _constraint_ Changes the value of the field of the attribute where criteria is met ○ DELETE DELETE FROM _table name_ WHERE _constraint Removes a row from the table ○ DROP TABLE Deletes an entire table (including the schema) ○ BEGIN Type begin to start a transaction ○ ROLLBACK Type rollback to revert to last saved commit ○ COMMIT Creates a save point of all the transactions previously done ○ You *don't* need to know the Foreign key syntax or JOIN syntax Any SQLite stuff will be single table only ER Diagrams Be familiar with ER Diagrams using the style covered in class, including ○ Entity-sets What is an Entity vs. Entity Set Entity: an individual object or thing within the domain that is distinguishable from other objects Entity set: a collection of similar types of entities Entity is to entity set as instance is to class Strong and weak Strong entity set: Entities can be identified by the values of their attributes (a primary key) Weak entity set: has a composite primary key, foreign key from strong entity set gives the weak entity set meaning ○ Always has total participation in relationship with strong entity ○ Relationships (binary, unary, ternary) Cardinality (many-to-many, many-to-one, one-to-one) Arrow always points to the one Total participation Every entity participates in the relationship - marked by 2 lines Unary vs Binary vs Ternary (Or n-ary as a generalization) - which is most common? Binary relations are typically the most common Converting Ternary to Binary ○ Attributes How do we represent multivalue, composite, and derived Multivalue: marked by a double circle Derived: marked by a dotted circle Composite: attributes stemmed off of another attribute ○ Subclassing When is it useful? Useful when entities has special properties not associated with all members of the set (ex. Residential vs business) Be familiar with the meaning of Generalization and Specialization Generalization: The triangle upwards from the specialization to the more general entity (the parent class) Specialization: the subclass itself, the triangle downwards from the generalization ER Diagram to Relational Schema ○ For a given ER Diagram, design a database schema Relational Algebra Be ready to both read and write the following relational expressions ○ For example: "Given this relational model…. "...write a relational algebra expression to implement this query" "...and this relational algebra expression, what is the output?" Operations to be familiar with (including the syntax/symbols use) ○ Selection σp (R) ○ Projection πA1,A2,...An(R) ○ Renaming ρA1, A2,...An(R) ○ Union R ∪S ○ Intersection R ∩ S ○ Difference R - S ○ Cross product (aka Cartesian Product) R × S ○ Natural Join (and its relation to Cross Product) First a cross product, then check equality of all common attributes, then eliminate duplicate common attributes With natural join, be familiar with INNER, LEFT, RIGHT, and OUTER R ⋈ S - natural join. For outer joins, use ⟕ (left), ⟖ (right), and ⟗ (full outer) ○ Division R ÷ S Eliminate everything other than what’s leftover from what you’re dividing ○ Assignment R ←S ○ Aggregation G1, G2, G3…GmGF1(A1), F2(A2)...Fn(An) -> name (R) Assume you'll only see "sum", "max", "min", "average" and "count" Functional Independence- note this section will change depending on how much is covered on Monday and Wednesday - class content from Monday (Feb 10) and Wednesday (Feb 12) are "fair game". What are some signs of poorly implemented functional independence? What can go wrong if we don't fix it? ○ Redundancy, tight coupling, and lack of clear ownership can be signs of poorly implemented functional dependence ○ If not fixed, you could delete the entire tuple and lose valuable data, be left with inconsistencies or slow updates, and have data redundancies Types of anomalies ○ Redundancy: part of data can be derived from other parts ○ Update: Slow updates or inconsistent updates could occur ○ Insertion: what if you want to insert something, but you’re missing information ○ Deletion: To delete something you might have to delete the entire tuple when you only want to clear one attribute Functional Dependencies ○ How they can be identified: Look for patterns - if one value in one column consistently determines the value in another column across all rows ○ How we can "handle" them in database design: Normalization, Decomposition (break large tables into smaller ones based on FDs), preserve dependencies ○ Expression mathematically (using ➔ symbol) i.e. A ➔ BD A determines/implies BD Mathematical rules around FDs ○ Reflexivity if b subset a, a -> b ○ Augmentation if a -> b, then ac -> bc ○ Transitivity if a -> b and b -> c, a -> c ○ Psuedo-Transitivity if a -> b and cb ->d, ac->d ○ Union if a->b and a->c, then a -> bc ○ Decomposition if a -> bc, then a -> b and a -> c Attribute Closure F+ (find all FDs that are implied by F) Attribute Closure (α+) = all FDs a particular attribute can imply Closure of F (F+) = a set of all FDs that are implied by F ○ LHS and remaining, copy FDs as is, apply reflexivity, apply transivity Determine Super Keys, Candidate Keys ○ Super key: a key that implies all of the possible attributes ○ Candidate key: a minimal super key Data normalization - note this section will change depending on how much is covered on Monday and Wednesday - class content from Monday (Feb 10) and Friday (Feb 14) are "fair game". "Database design is about characterizing data and organizing data" ○ Why do the bold portions mean? ○ Characterizing data: how to describe properties we know or see in the data ○ Organizing data: how to organize data to promote ease of use and efficiency Normalization: a technique of organizing data in a database Relation to functional independence? ○ Normalization tries to eliminate redundant data, and ensure data dependencies make sense ○ Refines schema How to use decomposition to break up dependencies ○ Replace ABCD with [AB and BCD] or [ACD and ABD] ○ What is the trade-off to this? Some queries become more expensive, maybe you can’t reconstruct the original relation, checking dependencies may require joining the decomposed relations Lossless join: After decomposing tables and adding them back together, no information is lost ○ Intersection of decomposed tables must contain a candidate key Dependency preserving: every dependency is in the same relation Normal Forms ○ 1nf, 2nf, 3nf, BCNF Don't worry about "higher order" (4nf+) normal forms ○ 1nf: every attribute has a single, atomic value Values stored in a column should be of the same type (domain) Order doesn’t matter Ex. “department” can’t have value “Computer Science, Math”, instead gets split up into two entries of “computer science” and “math” ○ 2nf: 1nf + no partial dependency (FDs) Decompose to get rid of partial dependencies A partial dependency occurs when a non-prime attribute depends on only part of a candidate key, rather than the whole candidate key May still be redundant ○ 3nf: 2nf + lossless join + dependency preserving Left hand side is a superkey or the right consists of prime attribs only Get rid of transitive dependencies, but ensure it’s still in the same relation *Note, may have to break up “AB” to ensure no possible transitive* ○ BCNF: 1NF + lossless-join + redundant free For every non-trivial dependency, X->A, X is a superkey Remove any non-key (AB -> C, C->B), ensure C is a superkey for B by decomposing it to its own table Can lose functional dependencies, but ensure lossless