Summary

These notes provide a basic overview of database systems. They cover topics like database architecture, query languages (DDL, DML, DCL), and database administration tasks. Database taxonomy and DBMS functions are also discussed.

Full Transcript

NOTATKI ZE SLAJDÓW database system​= external storage + DBMS + query language DBMS ​ - software that manages access to databases // not only manages the access but mainly manages the database system overall Query Language = DDL + DML + DCL D...

NOTATKI ZE SLAJDÓW database system​= external storage + DBMS + query language DBMS ​ - software that manages access to databases // not only manages the access but mainly manages the database system overall Query Language = DDL + DML + DCL DDL ​(​Data Definition Language​) – defines data structures to be manipulated on by DML DML ​(​Data Manipulation Language​) – performs operations (fetching and changing) on data defined by DDL DCL ​(​Data Control Language​)​– ​controls access to data (e.g. security, concurrency, …) Isolating the database system architecture levels (logical, internal, external) allows to achieve independence of a program from data: Logical data independence - resistance of external schemas (views) to changes made in the conceptual schemas (like ERD) Physical data independence - resistance of a conceptual schema to changes made in the internal (physical) schema (like disk files). Architecture of a database system: External (user) level ​- A set of external views (external schemas) through which users see the contents of the database. The view is an abstracted fragment of the database, defined in the terms of the query language. Logical (conceptual, model) level ​ - Contains a conceptual model (conceptual scheme) of the database. This model covers the entire contents of the database (relations, objects, …) as seen by the database administrator or database owner. Internal (physical) level ​- The physical representation of the database in the form of disk files and algorithms to access them. It is a low-level representation (level of memory units, e.g. bytes or disk pages). When creating a new table in the database you need to specify: name of the table (unique in the database) column names (unique in the table) data type of each column. Database taxonomy​- grouped by number of nodes or data model. Number of nodes: centralized distributed federated Data model: relational object-oriented, object-relational semi-structural NoSQL DBMS functions: data access data integrity concurrency control (resolving conflicts during simultaneous access to the same data) data security recovery of data (performing restart, restoring data from backups) data distribution (if one database is physically placed on different computer systems) parallel data processing Database Dictionary ​- a special database containing information about data (metadata) stored in other (utility) databases. It contains: descriptions of the structure of database objects (e.g. table and views schemas,...) user descriptions (identifiers, permissions,...) definitions of data formats used in the database statistical data on data values in the database, used for query optimization Access to it: An ​ordinary user has only ​read-only permissions, restricted to certain parts of Data Dictionary DBA can have ​wider access to the Data Dictionary, but its modification can perform only through system procedures DBMS has ​full access to the Data Dictionary (e.g. it modifies DD when creating a new object in the database) DBA (DataBase Administrator) ​- ​A person that controls the access to the database and its functioning. DBA Tasks: giving users and groups of users access to the database (permissions to create database objects, fetching and modify them, etc.) monitoring the performance of the database defining the strategy of archiving and recovering data after failures launching new data access strategies DBA Tools: programs to load a database from external files data archiving programs programs for restoring data from the archive programs for collecting statistics and analyzing them procedures for defining users and user groups programs for operating on a Data Dictionary programs for analyzing the database log From ERD to relational database schema I. Representing entity sets ​- each entity set is represented as a relation with the primary key which is the same as the primary key of the entity set. II. Representing relationships between entity sets one-to-one relationship - Represented by a foreign key inserted to any of the two relations. one-to-many relationship - Represented by a foreign key inserted to the relation on the n side of the relationship. many-to-many relationship - Represented by a separate relation with the primary key composed of primary keys of the related entity sets. SQL interactive ​- Instructions are input interactively; results are immediately presented to the user in a format determined by DBMS (SQL does not have any I/O or formatting instructions). SQL embedded - Instructions are included in the program written in another language; results can be passed to variables of the program for further processing. Aggregate functions​: ​(są też przykładowe w selectach below) COUNT() - number of rows returned by the query AVG() - arithmetic average of values in a numerical column SUM() - sum of values in a numerical column MIN() - minimal value in a character, numerical or date/ datetime column MAX() - maximal value in a character, numerical or date / datetime column other statistical functions (optionally) DBMS-defined functions for: date / datetime values, strings, text fields, large binary objects, etc. Selfjoin ​- ​joining a table with itself. It enables us ​to correlate data that are stored in different rows of the same table. To perform a selfjoin you must use ​table aliases (correlation names). In a ​subquery​, the SELECT statement can be nested in the WHERE or HAVING clause. A nested SELECT statement, called an internal query, provides values for the WHERE or HAVING condition. The SELECT statement containing the subquery is an external query. Unioning Tables​: Summing tables treated as sets of rows (the UNION operator of relational algebra). The tables must be compatible (the same number of columns with matching datatypes). Notes: 1. Duplicate rows are eliminated from the result table unless ALL is specified. 2. If the names of the columns of the unioned tables are different, the result table assumes the names of the columns from the first table. 3. If there is an ORDER BY clause, the numbers must be specified instead of column names Subtracting tables ​treated as sets of rows (the MINUS operator of relational algebra). The tables must be compatible (the same number of columns with matching datatypes). Notes: 1. Duplicate rows are eliminated from the result table unless ALL is specified. 2. If the names of the columns of the subtracted tables are different, the result table assumes the names of the columns from the first table. 3. If there is an ORDER BY clause, the numbers must be specified instead of column names Intersecting tables ​treated as sets of rows (the INTERSECT operator of relational algebra). The tables must be compatible (the same number of columns with matching datatypes). What is the index? It is a table that contains the key values in one column and in the other column disk addresses (record numbers) of rows in the indexed table that contain these values. What is its purpose? 1. It allows you to increase the efficiency of data access from the table (SELECT statement). 2. Organizes the rows of the table in a specific order. 3. Prevents duplicate values in the rows of the table. When to create an index? 1. When the table is often processed sequentially according to the values of a certain column (ascendingly or descendingly). 2. When it is necessary to ensure the uniqueness of the column(s) values. 3. When access to the table is often carried out according to values of a column or columns (associative access). What is the cost of indices? 1. Storage cost (keeping additional data). 2. Time cost (the need to update the index while updating the table by INSERT, UPDATE, DELETE statements. View (virtual table) A combination of rows and columns selected from one or several tables. After creating a view, you can perform SELECT statements on it as on a regular table. In the database (in the data dictionary) only the definition of the view is stored. The view is materialized when it is needed. The benefits of using views​: The same data can be seen differently by different users. They allow you to focus on relevant data and ignore others. They provide an additional level of independence from data: If something changes in the database schema, it is often enough to change the view definition without changing the program. They introduce data protection by hiding data: Users see only views, i.e. only those data to which they have the right of access. Differences between views and tables: Normally, views do not contain data; they contain only information on how to collect data if they are needed (i.e. from which tables, from which columns and how). Views can not be indexed (but you can index tables included in the view definition). In general, you cannot update data through a view. SELECTY podstawowe ZAPYTANIE DZIAŁANIE SELECT​“text” wypisuje text SELECT​2+2 wypisuje 4 SELECT​* ​FROM​table wyświetla całą tabelę SELECT​col1,col2 ​FROM​table wyświetla kolumnę col1 i col2 w określonej kolejności SELECT​​DISTINCT​col ​FROM​table wyświetla kolumnę col bez powtarzania wielokrotnych wystąpień wartości SELECT​col1,col2 ​FROM​table ​WHERE wyświetla podane kolumny z podanej tabeli z wierszy, col3=’cośtam’ ​AND​col4=123 które spełniają warunek (w col3 mają wpisane costam oraz 123 w col4); (analogicznie tak jak AND może być też O ​R warunek może też być nierównością (np. year>=1998) albo nawet O ​ R NOT) SELECT​col1,col2 ​FROM​table ​WHERE -||-; warunek dodatkowy: w col4 musi znajdować się col3=’cośtam’ ​AND​col4 ​IN​​(‘bułka’,’makaron’) bułka lub makaron (to skrócona wersja zapisu “col4=’bułka’ OR col4=’makaron’) (...) ​AND​col4 ​NOT IN​(‘oliwki’,’rodzynki’) w col4 musi być coś innego niż oliwki lub rodzynki SELECT​col1,col2 ​FROM​table (...) gdzie treści w kolumnie col1 zaczynają się literą A WHERE​col1 ​LIKE​‘A%’ (“%” to pozostała część treści i jest nieistotna) (może być też ​NOT LIKE​) (...) ​LIKE​‘A___D’ to znaczy, że szukamy wyrazu, który zaczyna się na A, a na piątym miejscu ma D (między nimi jest odpowiednia ilość powtórzeń podkreślnika) SELECT​* ​FROM​table operator BETWEEN sprawdza, czy podany numer WHERE​number ​BETWEEN​nr1 ​AND​nr2 należy do podanego przedziału SELECT​col1, col2 ​FROM​table ​WHERE wyświetla wspólnie wyniki dla 2 różnych zapytań. col3=cośtam Muszą być koniecznie 2 te same kolumny, UNION​​SELECT​col1, col2 ​FROM​table niekoniecznie tablice i warunki. (warunki mogą być WHERE​col4=cośtam innego nierównościami) UNION ALL zezwala na duplikaty (bo sam UNION zwraca tylko distinct wartości) SELECT​costam ​FROM​table wyświetla costam z tabeli uporządkowane malejąco ORDER BY​year ​DESC​, name względem roku, a następnie normalnie alfabetycznie względem imienia SELECT​col1 ​AS​“costam”, col2 ​AS​kolumna wyświetla kolumnę col1 pod nową nazwą costam, a FROM​table col2 pod nazwą kolumna GROUP BY​col wyświetla dane pogrupowane względem zawartości kolumny col SELECT​* ​FROM​costam1 łączy dwie tabele (costam1 i costam2) na podstawie ID, JOIN​costam2 ​ON​costam1.id = costam2.id które jest wspólne w obu tabelach alternatywnie:​JOIN​costam2 ​USING​(id) WHERE​number = (SELECT MIN(number) SELECT MIN - szuka najmniejszego numeru FROM table) SELECT​AVG(col1) ​FROM​table AVG - wyświetla średnią wartości z podanej kolumny SELECT ​SUM(col1) ​FROM​table SUM - zwraca sumę wartości z podanej kolumny SELECT​​COUNT​(col1) count(*) zwraca ilość wierszy w całej tabeli, lub count(*) count([nazwa_kolumny]) zwraca ilość nie-NULL wierszy w danej kolumnie SELECT ​COUNT​(DISTINCT col1) zlicza ilość wierszy w kolumnie col1 bez powtórzeń SELECT​​*​​FROM​table tu ogólnie chodzi o to, że on sobie sortuje wszystko w ORDER​​BY taki sposób, że najpierw rok, potem nazwa, ale jeśli w CASE ​ danym wierszu w kolumnie “col” znajduje się x lub y, to ​WHEN​col ​IN​​(​'x'​,​’y'​)​​THEN​​1 wtedy te wiersze są wyświetlane na samym końcu (bo ​ELSE​​0 do tych wierszy przypisaliśmy cyferkę 1, a że 1>0 oraz END​,​year​, ​name chcemy sortować wszystko rosnąco, to tak to jest) HAVING ​ MAX​ (number) < 2000 “pod warunkiem, że największy z tabeli number jest mniejszy od 2000” - HAVING to takie WHERE, ale użyte z aggregate functions (bo WHERE nie można z nimi używać) albo po GROUP BY LIKE​“J$?%%” ​ESCAPE​“?” ​ denotes the string that begins with the letter “J” followed by one dollar sign followed by one percent sign followed by any string of 0 or more characters.

Use Quizgecko on...
Browser
Browser