ilovepdf_merged.pdf
Document Details
Uploaded by Deleted User
Tags
Full Transcript
Lecture 1 Introduction to DBMS. Data Modeling. Senior Lecturer: Tulebayev Yersultan Introduction to DBMS. Data Modeling: PART I. What is a database? Properties, Characteristics, Database Management System, Application, Example. PART II. Types of database models. Advantages and...
Lecture 1 Introduction to DBMS. Data Modeling. Senior Lecturer: Tulebayev Yersultan Introduction to DBMS. Data Modeling: PART I. What is a database? Properties, Characteristics, Database Management System, Application, Example. PART II. Types of database models. Advantages and disadvantages. PART III. ER Diagram. Entity, Attributes, Relationships. PART I. What is a database? Properties, Characteristics, Database Management System, Application, Example. What is a database? A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. Properties of a database: A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database. A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database. A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested. Potential problems: Size of data Ease of updating Accuracy of data Security of data Redundancy of data Importance of data What is a database management system? A database management system (DBMS) is system software for creating and managing databases. A DBMS makes it possible for end users to create, protect, read, update and delete data in a database. Example of a DBMS - The STUDENT file stores data of each student - The COURSE file stores contain data on each course - The SECTION stores the information about sections in a particular course - The GRADE_REPORT file stores the grades which students receive in the various sections - The PREREQUISITE file contains information about pre- courses Characteristics of Database Management System: - Provides security and removes redundancy - Self-describing nature of a database system - Insulation between programs and data abstraction - Support of multiple views of the data - Sharing of data and multiuser transaction processing - DBMS allows entities and relations among them to form tables. - It follows the ACID concept ( Atomicity, Consistency, Isolation, and Durability). - DBMS supports multi-user environment that allows users to access and manipulate data in parallel. Three-Schema Architecture: Users in a DBMS environment Application of DBMS: PART II. Types of database models. Advantages and disadvantages. Types of database models: 1. Hierarchical Database 2. Network Database 3. Relational Database 4. Object-Oriented Model Type 1. Hierarchical Database A hierarchical database is a data model in which data is stored in the form of records and organized into a tree-like structure, or parent-child structure, in which one parent node can have many child nodes connected through links. Example of hierarchical database Advantages of the Hierarchical DBMS : Data can be retrieved easily due to the explicit links present between the table structures. Referential integrity is always maintained i.e. any changes made in the parent table are automatically updated in a child table. Promotes data sharing. It is conceptually simple due to the parent-child relationship. Database security is enforced. Efficient with 1: N relationships. A clear chain of command or authority. Increases specialization. High performance. Clear results. Disadvantages of the Hierarchical DBMS : If the parent table and child table are unrelated then adding a new entry in the child table is difficult because additional entry must be added in the parent table. Complex relationships are not supported. Redundancy which results in inaccurate information. Change in structure leads to change in all application programs. M: N relationship is not supported. No data manipulation or data definition language. Lack of standards. Poor flexibility Organizational Disunity and Rigid structure. Type 2. Network database The network database model was a progression from the hierarchical database model and was designed to solve some of that model's problems, specifically the lack of flexibility. Instead of only allowing each child to have one parent, this model allows each child to have multiple parents Example of Network DBMS: Advantages of the Network DBMS: fast data access. It also allows users to create queries that are more complex than those they created using a hierarchical database. So, a variety of queries can be run over this model. Disadvantages of the Network DBMS: A user must be very familiar with the structure of the database to work through the set structures. Updating inside this database is a tedious task. One cannot change a set structure without affecting the application programs that use this structure to navigate through the data. If you change a set structure, you must also modify all references made from within the application program to that structure. Type 3. Relational Database A relational model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples. Columns are also called attributes. Example of the Relational DBMS: Advantages of using RDBMS: Ease of Use Network Access Language Performance Prevents Data Redundancy Privileges and Data Security Disadvantages of RDBMS: Cost Lack of Speed Memory Space POPULAR RELATIONAL DATABASE MANAGEMENT SYSTEMS: Oracle SQL Server MySQL PostgreSQL SQLite Type 4. Object-Oriented Model An object-oriented database (OOD) is a database system that can work with complex data objects — that is, objects that mirror those used in object-oriented programming languages. In object- oriented programming (OOP), everything is an object. Elements of Object-Oriented data model: Object Attributes and Method Class Inheritance Example of Object-Oriented data model: Advantages of Object-Oriented data model: Reusability: generic objects can be defined and then reused in numerous application. Complex data types: Can manage complex data such as document, graphics, images, voice messages, etc. Distributed databases: Due to mode of communication between objects, OODBMS can support distribution of data across networks more easily. Disadvantages of Object Oriented Databases: - Competition - Complexity - Lack of support for views - Lack of support for security - Lack of standards - Lack of experience - Lack of universal data model - Query Optimization compromises encapsulation PART III. ER Diagram. Entity, Attributes, Relationships. ENTITY RELATIONAL (ER) MODEL is a high-level conceptual data model diagram. ER modeling helps you to analyze data requirements systematically to produce a well-designed database. The Entity-Relation model represents real-world entities and the relationship between them. ER DIAGRAMS ENTITY-RELATIONSHIP DIAGRAM (ERD) displays the relationships of entity set stored in a database. In other words, we can say that ER diagrams help you to explain the logical structure of databases. Why use ER Diagrams? Helps you to define terms related to entity relationship modeling Provide a preview of how all your tables should connect, what fields are Helps to describe entities, attributes, relationships ER diagrams are translatable into relational tables which allows you to build databases quickly ER diagrams can be used by database designers as a blueprint for implementing data in specific software applications The database designer gains a better understanding of the information to be contained in the database with the help of ERP diagram ERD is allowed you to communicate with the logical structure of the database to users Components of the ER Diagram Entities Attributes Relationships WHAT IS ENTITY? A real-world thing either living or non-living that is easily recognizable and nonrecognizable. An entity can be place, person, object, event or a concept, which stores data in the database. Examples of entities: Person: Employee, Student, Patient Place: Store, Building Object: Machine, product, and Car Event: Sale, Registration, Renewal Concept: Account, Course ATTRIBUTES Entities are represented by their properties, which also called attributes. For example, a student entity may have a name, age, class, as attributes. Relationships There are three types of relationships between entities (tables) in data modeling: One-to-many relationships (also denoted as 1:M). Many-to-many relationships (M:N). One-to-one relationships (1:1). Let's study them with an example: Lecture 2 Attributes in DBMS. Normalization. Types of Normal Forms. Senior Lecturer: Tulebayev Yersultan Attributes in DBMS. Normalization. Types of Normal Forms: PART I. Attributes in DBMS. Types of the attributes. Super key, Primary key, Foreign key. PART II. Functional Dependency. Types of the functional dependencies. PART III. Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database. PART I. Attributes in DBMS. Types of the attributes. Super key, Primary key, Foreign key. Attributes in DBMS Super key Candidate key Primary key Foreign key Non-prime attribute Super key A super key is a set of one or more attributes (columns), which can uniquely identify a row in a table. Example: Table Employee Super keys: The above table has following super keys. All of the following sets of super key are able to uniquely identify a row of the employee table: - {Emp_SSN} - {Emp_Number} - {Emp_SSN, Emp_Number} - {Emp_SSN, Emp_Name} - {Emp_SSN, Emp_Number, Emp_Name} - {Emp_Number, Emp_Name} Candidate key A candidate key is a set of one or more attributes (columns), which can uniquely identify a row in a table with no redundant attribute. Example: Table Employee Candidate Keys: As mentioned in the beginning, a candidate key is a minimal super key with no redundant attributes. The following two set of super keys are chosen from the above sets as there are no redundant attributes in these sets. - {Emp_SSN} - {Emp_Number} Primary key A primary key is a column or a group of columns used to identify a row uniquely in a table. Rules For Defining the Primary Key Minimal Accessible NON NULL Value Time Invariant Unique Example: Table Employee A Primary key is selected from a set of candidate keys. This is done by database admin or database designer. We can say that either {Emp_SSN} or {Emp_Number} can be chosen as a primary key for the table Employee. Foreign key A foreign key is a column or a group of columns in a table that reference the primary key of another table. Example: The table that contains the foreign key is called the referencing table or child table. And the table referenced by the foreign key is called the referenced table or parent table. PART II. Functional Dependency. Types of the functional dependencies. Functional dependency in DBMS Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a Database Management System (DBMS). A functional dependency is denoted by an arrow “→”. The functional dependency of X on Y is represented by X → Y. Example of the functional dependency In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc. By this, we can say that the city, Employee Name, and salary are functionally depended on Employee number: Employee Number → Employee Name Employee Number → Salary Employee Number → City Types of Functional Dependencies in DBMS Multivalued Dependency Trivial Functional Dependency Non-Trivial Functional Dependency Transitive Dependency Multivalued Dependency Multivalued Dependency (MVD) is a form of data dependency where two or more attributes, other than the key attribute, are functionally dependent on each other, but not on the key itself. Example This dependence can be represented like this: car_model -> maf_year car_model -> colour Trivial Functional Dependency The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are included in that attribute. So, X -> Y is a trivial functional dependency if Y is a subset of X. Example {Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}. Non Trivial Functional Dependency In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency. Example {Company} -> {CEO} (if we know the Company, we knows the CEO name) But CEO is not a subset of Company, and hence it’s non-trivial functional dependency. Transitive Dependency A Transitive dependency in a database is an indirect relationship between values in the same table that causes a functional dependency. Example {Company} -> {CEO} (if we know the compay, we know its CEO’s name) {CEO } -> {Age} If we know the CEO, we know the Age Therefore according to the rule of rule of transitive dependency: { Company} -> {Age} should hold, that makes sense because if we know the company name, we can know his age. Advantages of Functional Dependency Functional Dependency avoids data redundancy. Therefore same data do not repeat at multiple locations in that database It helps you to maintain the quality of data in the database It helps you to defined meanings and constraints of databases It helps you to identify bad designs It helps you to find the facts regarding the database design PART II. Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database. Normalization in DBMS Normalization is a process of organizing the data in database to avoid data redundancy, insertion anomaly, update anomaly & deletion anomalies. Anomalies in DBMS There are three types of anomalies that occur when the database is not normalized. These are: - insertion, - update - and deletion anomaly. Example This table is not normalized. We will see the problems that we face when a table in database is not normalized. Normalization: Normal forms. First normal form(1NF) Second normal form(2NF) Third normal form(3NF) Boyce & Codd normal form (BCNF) First normal form (1NF) A relation is said to be in 1NF (first normal form), if it doesn’t contain any multi-valued attribute. In other words you can say that a relation is in 1NF if each attribute contains only atomic(single) value only. Example Let’s say a company wants to store the names and contact details of its employees. It creates a table in the database that looks like this: Solution Second normal form (2NF) A table is said to be in 2NF if both the following conditions hold: - Table is in 1NF (First normal form) - No non-prime attribute is dependent on the proper subset of any candidate key of table. Example Let’s say a school wants to store the data of teachers and the subjects they teach. They create a table Teacher that looks like this: Since a teacher can teach more than one subjects, the table can have multiple rows for a same teacher. Solution Teacher_Details table: Teacher_Subject table: Third normal form (3NF) A table design is said to be in 3NF if both the following conditions hold: - Table must be in 2NF - Transitive functional dependency of non-prime attribute on any super key should be removed. Example Let’s say a company wants to store the complete address of each employee, they create a table named Employee_Details that looks like this: Solution Employee Table: Employee_Zip table: Boyce Codd normal form (BCNF) It is an advance version of 3NF that’s why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table. Example Suppose there is a company wherein employees work in more than one department. They store the data like this: Solution Emp_Nationality Table: Emp_Dept_Mapping table Emp_Dept table: Lecture 3 SQL. Types of SQL. Data Definition Language. Managing Tables. Constraints. Senior Lecturer: Tulebayev Yersultan SQL. Types of SQL. Data Definition Language. Managing Tables. Constraints: PART I. Structured Query Language. History, Advantages, Types of SQL. PART II. Data Definition Language. Managing Tables. Constraints. PART I. Structured Query Language. History, Advantages, Types of SQL. What is SQL? Structured query language (SQL) is a standard language for database creation and manipulation. SQL is the standard language for database management. All the RDBMS systems like MySQL, MS Access, Oracle, Sybase, Postgres, and SQL Server use SQL as their standard database language. SQL programming language uses various commands for different operations. History: 1970 - Dr. E. F. "Ted" of IBM is known as the father of relational databases. He described a relational model for databases. 1974 - Structured Query Language appeared. 1978 - IBM worked to develop Codd's ideas and released a product named System/R. 1986 - IBM developed the first prototype of relational database and standardized by ANSI. The first relational database was released by Relational Software and its later becoming Oracle. What are the SQL? SQL follows the following rules: Structure query language is not case sensitive. Generally, keywords of SQL are written in uppercase. Statements of SQL are dependent on text lines. We can use a single SQL statement on one or multiple text line. Using the SQL statements, you can perform most of the actions in a database. SQL depends on tuple relational calculus and relational algebra. What is SQL Process? When an SQL command is executing for any RDBMS, then the system figure out the best way to carry out the request and the SQL engine determines that how to interpret the task. In the process, various components are included. These components can be optimization Engine, Query engine, Query dispatcher, classic, etc. All the non-SQL queries are handled by the classic query engine, but SQL query engine won't handle logical files. What is SQL Process? Why Use SQL? It helps users to access data in the RDBMS system. It helps you to describe the data. It allows you to define the data in a database and manipulate that specific data. With the help of SQL commands in DBMS, you can create and drop databases and tables. SQL offers you to use the function in a database, create a view, and stored procedure. You can set permissions on tables, procedures, and views. What are the Advantages of SQL? High speed No coding needed Well defined standards Portability Interactive language Multiple data view Types of SQL Here are five types of widely used SQL queries: Data Definition Language (DDL) Data Manipulation Language (DML) Data Control Language(DCL) Transaction Control Language(TCL) Data Query Language (DQL) PART II. Data Definition Language. Managing Tables. Constraints. Data Definition Language Data definition language (DDL) refers to the set of SQL commands that can create and manipulate the structures of a database. Using DDL statements, you can perform powerful commands in your database such as creating, modifying, and dropping objects. DDL commands are usually executed in a SQL browser or stored procedure. Types of DDL commands: CREATE DROP ALTER TRUNCATE The CREATE Statement To create a new table in PostgreSQL, you use the CREATE TABLE statement. The following illustrates the syntax of the CREATE TABLE statement: PostgreSQL column constraints NOT NULL – the value of the column cannot be NULL. UNIQUE – the value of the column must be unique across the whole table. However, the column can have many NULL values because PostgreSQL treats each NULL value to be unique. Notice that SQL standard only allows one NULL value in the column that has the UNIQUE constraint. PRIMARY KEY – this constraint is the combination of NOT NULL and UNIQUE constraints. You can define one column as PRIMARY KEY by using column-level constraint. In case the primary key contains multiple columns, you must use the table-level constraint. REFERENCES – constrains the value of the column that exists in a column in another table. You use REFERENCES to define the foreign key constraint. PostgreSQL table constraints UNIQUE (column_list) – to force the value stored in the columns listed inside the parentheses to be unique. PRIMARY KEY (column_list) – to define the primary key that consists of multiple columns. REFERENCES - to constrain the value stored in the column that must exist in a column in another table. PostgreSQL CREATE TABLE example CREATE TABLE account( user_id serial PRIMARY KEY, username VARCHAR (50) UNIQUE NOT NULL, password VARCHAR (50) NOT NULL, email VARCHAR (355) UNIQUE NOT NULL, created_on TIMESTAMP NOT NULL, last_login TIMESTAMP ); PostgreSQL CREATE TABLE example CREATE TABLE account_role ( CREATE TABLE role( user_id integer NOT NULL, role_id serial PRIMARY KEY, role_id integer NOT NULL, role_name VARCHAR (255) UNIQUE NOT grant_date timestamp, NULL FOREIGN KEY (role_id) REFERENCES role (role_id) ); ); PostgreSQL CREATE TABLE example If you have forgotten to add Primary and Foreign keys, you can modify your table using ALTER TABLE command: ALTER TABLE account_role ADD CONSTRAINT constraint_fkey FOREIGN KEY (user_id) REFERENCES account (user_id); The ALTER Statement To change the existing table structure, you use PostgreSQL statement. The syntax of the ALTER TABLE is as follows: Example ALTER TABLE account DROP COLUMN email; PostgreSQL provides many actions that allow you to: 1)Add a column, drop a column, rename a column, or change a column’s datatype; 2) Set a default value for the column; 3) Rename a table. The following illustrates the ALTER TABLE statement variants: The following illustrates the ALTER TABLE statement variants: The DROP Statement The DROP statement—to round out our trio of basic DDL statements—drops, removes, deletes, obliterates, cancels, blows away, and/or destroys the object it is dropping. After the DROP statement has been run, the object is gone. The syntax is as simple as it can be: DROP TABLE teams; PostgreSQL DROP TABLE syntax: DROP TABLE [IF EXISTS] table_name [CASCADE | RESTRICT]; The TRUNCATE Statement To remove all data from a table, you use the DELETE statement. However, when you use the DELETE statement to delete all data from a table that has a lot of data, it is not efficient. In this case, you need to use the TRUNCATE TABLE statement: TRUNCATE TABLE table_name; Remove all data from multiple tables To remove all data from multiple tables at once, you separate each table by a comma (,) as follows: TRUNCATE TABLE table_name1, table_name2,...; Lecture 4 Data Types in PostgreSQL. DML. DQL. Functions. Filtering Data. Conditional Expressions & Operators. Senior Lecturer: Tulebayev Yersultan Data Types in PostgreSQL. DML. DQL. Functions. Filtering Data. Conditional Expressions & Operators. PART I. Data Types in PostgreSQL. PART II. Data Manipulation Language. DQL. PART III. String Functions. Filtering Data. Conditional Expressions & Operators. PART I. Data Types in PostgreSQL. PostgreSQL Data Types Boolean Character types such as char, varchar, and text. Numeric types such as integer and floating-point number. Temporal types such as date, time, timestamp, and interval UUID for storing Universally Unique Identifiers Array for storing array strings, numbers, etc. JSON stores JSON data hstore stores key-value pair Special types such as network address and geometric data. Boolean A Boolean data type can hold one of three possible values: true, false and NULL. You use boolean or bool keyword to declare a column with the Boolean data type. Character The following table illustrate the character types in PostgreSQL: Numeric PostgreSQL provides two distinct types of numbers: - integers - floating-point numbers Integer The following table illustrates the specification of each integer type: Floating-point number There three main types of floating-point numbers: - float(n) is a floating-point number whose precision, at least, n, up to a maximum of 8 bytes. - Real or float8 is a 4-byte floating-point number. - numeric or numeric(p,s) is a real number with p digits with s number after the decimal point. The numeric(p,s) is the exact number. Example of Numeric(p,s) DROP TABLE IF EXISTS products; CREATE TABLE products ( id SERIAL KEY, name VARCHAR(100) NOT NULL, price NUMERIC(5,2) ); INSERT INTO products (name, price) VALUES ('Phone',500.215), ('Tablet',500.214); Temporal data types DATE stores the dates only. TIME stores the time of day values. TIMESTAMP stores both date and time values. TIMESTAMPTZ is a time zone-aware timestamp data type. It is the abbreviation for timestamp with the time zone. INTERVAL stores periods of time. The following table illustrates the ISO 8601 interval unit abbreviations: PART II. Data Manipulation Language. DQL. Data Manipulation Language A data manipulation language (DML) is a family of computer languages including commands permitting users to manipulate data in a database. This manipulation involves inserting data into database tables, retrieving existing data, deleting data from existing tables and modifying existing data. DML is mostly incorporated in SQL databases. Commands for DML: UPDATE: This command modifies data of one or more records. INSERT: This command adds one or more records to a database table. DELETE: This command removes one or more records from a table according to specified conditions. PostgreSQL INSERT The PostgreSQL statement allows you to insert a new row into a table. The following illustrates the most basic syntax of the INSERT statement: INSERT INTO table_name(column1, column2,...) VALUES (value1, value2, …); PostgreSQL INSERT Multiple Rows To insert multiple rows into a table using a single INSERT statement, you use the following syntax: INSERT INTO table_name (column_list) VALUES (value_list_1), (value_list_2),... (value_list_n); PostgreSQL INSERT statement examples DROP TABLE IF EXISTS links; CREATE TABLE links ( Inserting a single row into a table id SERIAL PRIMARY KEY, INSERT INTO links (url, name) url VARCHAR(255) NOT NULL, VALUES('https://www.postgresqltutorial.com','PostgreSQL Tutorial'); name VARCHAR(255) NOT NULL, description VARCHAR (255), last_update DATE ); Inserting multiple rows and returning inserted rows INSERT INTO links (url, name) VALUES ('https://www.google.com','Google'), ('https://www.yahoo.com','Yahoo'), ('https://www.bing.com','Bing'); PostgreSQL UPDATE The PostgreSQL UPDATE statement allows you to modify data in a table. The following illustrates the syntax of the UPDATE statement: UPDATE table_name SET column1 = value1, column2 = value2, … WHERE condition; PostgreSQL UPDATE examples DROP TABLE IF EXISTS courses; CREATE TABLE courses ( course_id serial primary key, course_name VARCHAR(255) NOT NULL, description VARCHAR(500), published_date date ); INSERT INTO courses(course_name, description, published_date) VALUES ('PostgreSQL for Developers','A complete PostgreSQL for Developers','2020-07-13'), ('PostgreSQL Admininstration','A PostgreSQL Guide for DBA',NULL), 'PostgreSQL High Performance',NULL,NULL), ('PostgreSQL Bootcamp','Learn PostgreSQL via Bootcamp','2013-07-11'), ('Mastering PostgreSQL','Mastering PostgreSQL in 21 Days','2012-06-30'); PostgreSQL UPDATE – updating one row UPDATE courses SET published_date = ’2020-08-01’ WHERE course_id = 3; PostgreSQL DELETE The PostgreSQL DELETE statement allows you to delete one or more rows from a table. The following shows basic syntax of the DELETE statement: DELETE FROM table_name WHERE condition; PostgreSQL DELETE statement examples DROP TABLE IF EXISTS links; CREATE TABLE links ( id serial PRIMARY KEY, url varchar(255) NOT NULL, name varchar(255) NOT NULL, description varchar(255), last_update date DEFAULT now() ); PostgreSQL DELETE statement examples INSERT INTO links VALUES ('1', 'https://www.postgresqltutorial.com', 'PostgreSQL Tutorial', 'Learn PostgreSQL fast and easy' , '2013-06-02'), ('2', 'http://www.oreilly.com', 'O''Reilly Media', 'O''Reilly Media', '2013-06-02'), ('3', 'http://www.google.com', 'Google', 'Google' , '2013-06-02'), ('4', 'http://www.yahoo.com', 'Yahoo', 'Yahoo' , '2013-06-02'), ('5', 'http://www.bing.com', 'Bing', 'Bing' , '2013-06-02'), ('6', 'http://www.facebook.com', 'Facebook', 'Facebook' , '2013-06-01'), ('7', 'https://www.tumblr.com/', 'Tumblr', 'Tumblr' , '2013-06-02'), ('8', 'http://www.postgresql.org', 'PostgreSQL', 'PostgreSQL', '2013-06-02'); 1) Using PostgreSQL DELETE to delete one row from the table: DELETE FROM links WHERE id = 8; 2) Using PostgreSQL DELETE to delete a row and return the deleted row: DELETE FROM links WHERE id = 7 RETURNING *; 3) Using PostgreSQL DELETE to delete all rows from the table: DELETE FROM links; PostgreSQL DQL (SELECT statement) One of the most common tasks, when you work with the database, is to query data from tables by using the SELECT statement. The SELECT statement is one of the most complex statements in PostgreSQL. It has many clauses that you can use to form a flexible query. Let’s start with the basic form of the SELECT statement that retrieves data from a single table. The following illustrates the syntax of the SELECT statement: SELECT select_list FROM table_name; PostgreSQL SELECT examples 1) Using PostgreSQL SELECT statement to query data from one column example: SELECT first_name FROM customer; 2) Using PostgreSQL SELECT statement to query data from multiple columns example: SELECT first_name, last_name, email FROM customer; 3) Using PostgreSQL SELECT statement to query data from all columns of a table example: SELECT * FROM customer; PostgreSQL SELECT examples 4) Using PostgreSQL SELECT statement with expressions example: SELECT first_name || ‘ ’ || last_name, email FROM customer; 5) Using PostgreSQL SELECT statement with expressions example: SELECT 5 * 3; SELECT now(); PostgreSQL Column Alias A column alias allows you to assign a column or an expression in the select list of a SELECT statement a temporary name. The column alias exists temporarily during the execution of the query. The following illustrates the syntax of using a column alias: SELECT column_name AS alias_name FROM table_name; PostgreSQL column alias examples 1) Assigning a column alias to a column example: SELECT first_name, last_name AS surname FROM customer; 2) Assigning a column alias to an expression example: SELECT first_name || ' ' || last_name AS full_name FROM customer; PostgreSQL Table Aliases Table aliases temporarily assign tables new names during the execution of a query. The following illustrates the syntax of a table alias: table_name AS alias_name; PostgreSQL ORDER BY The ORDER BY clause allows you to sort rows returned by a SELECT clause in ascending or descending order based on a sort expression. The following illustrates the syntax of the ORDER BY clause: SELECT select_list FROM table_name ORDER BY sort_expression1 [ASC | DESC], … sort_expressionN [ASC | DESC]; PostgreSQL ORDER BY examples 1) Using PostgreSQL ORDER BY clause to sort rows by one column: SELECT first_name, last_name FROM customer ORDER BY first_name ASC; 2) Using PostgreSQL ORDER BY clause to sort rows by multiple columns: SELECT first_name, last_name FROM customer ORDER BY first_name ASC, last_name DESC; PostgreSQL ORDER BY examples The LENGTH() function accepts a string and returns the length of that string. The following statement selects the first names and their lengths. It sorts the rows by the lengths of the first names: SELECT first_name, LENGTH(last_name) as len FROM customer ORDER BY len DESC; PostgreSQL SELECT DISTINCT The DISTINCT clause is used in the SELECT statement to remove duplicate rows from a result set. The following illustrates the syntax of the clause: SELECT DISTINCT column1 FROM table_name; PostgreSQL SELECT DISTINCT examples CREATE TABLE distinct_demo ( id serial NOT NULL PRIMARY KEY, bcolor VARCHAR, fcolor VARCHAR ); INSERT INTO distinct_demo (bcolor, fcolor) VALUES 'red', 'red'), 'red', 'red'), 'red', NULL), NULL, 'red'), 'red', 'green'), 'red', 'blue'), 'green', 'red'), 'green', 'blue'), 'green', 'green'), 'blue', 'red'), 'blue', 'green'), 'blue', 'blue'); PostgreSQL SELECT DISTINCT examples SELECT DISTINCT bcolor FROM distinct_demo ORDER BY bcolor; PART III. String Functions. Filtering Data. Conditional Expressions & Operators. PostgreSQL String Functions Introduction to PostgreSQL CONCAT function PostgreSQL introduced a built-in string function named CONCAT to concatenate two or more strings into one. The following illustrates the syntax of the CONCAT function: CONCAT(str_1, str_2,...) PostgreSQL CONCAT function examples: 1) SELECT CONCAT ('CONCAT',' ', 'function’); 2) SELECT CONCAT (first_name, ' ', last_name) AS "Full name” FROM customer 3) SELECT first_name, CONCAT (‘Your first name ‘, length(first_name), ‘ characters’) FROM customer PostgreSQL FORMAT Function PostgreSQL FORMAT() function formats arguments based on a format string. The syntax of the PostgreSQL FORMAT() function is as follows: FORMAT(format_string [, format_arg [,...] ]) Examples 1) SELECT FORMAT('Hello, %s','PostgreSQL’); 2) SELECT FORMAT('%s, %s’, last_name, first_name) full_name FROM customer ORDER BY full_name; PostgreSQL Letter Case Functions PostgreSQL LOWER function PostgreSQL UPPER function PostgreSQL INITCAP function Examples 1) LOWER() FUNCTION SELECT LOWER(last_name) FROM customer ORDER BY last_name; 2) UPPER() FUNCTION SELECT UPPER(last_name) FROM customer ORDER BY last_name; 3)INITCAP() FUNCTION SELECT INITCAP(CONCAT (first_name, ‘ ’, last_name)) FROM customer ORDER BY first_name; PostgreSQL LEFT Function The PostgreSQL LEFT() function returns the first n characters in the string. The following illustrates the syntax of the PostgreSQL LEFT() function: LEFT(string, n) Examples 1)SELECT LEFT('ABC',1); 2)SELECT LEFT('ABC',2); 3)SELECT LEFT('ABC',-2); PostgreSQL RIGHT Function The PostgreSQL RIGHT() function returns the last n characters in a string. The following shows the syntax of the PostgreSQL RIGHT() function: RIGHT(string, n) Examples 1)SELECT RIGHT('XYZ', 2); 2)SELECT RIGHT('XYZ', - 1); 3) SELECT last_name FROM customer WHERE RIGHT(last_name,3) = ‘son’; PostgreSQL TRIM Function The LTRIM() function removes all characters, spaces by default, from the beginning of a string. The RTRIM() function removes all characters, spaces by default, from the end of a string. The BTRIM() function is the combination of the LTRIM() and RTRIM() functions. Examples 1)SELECT LTRIM('enterprise', 'e’); 2)SELECT RTRIM('enterprise', 'e’); 3)SELECT BTRIM('enterprise', 'e'); PostgreSQL POSITION Function The PostgreSQL POSITION() function returns the location of a substring in a string. The following illustrates the syntax of the PostgreSQL POSITION() function: POSITION(substring in string) Example 1) SELECT POSITION('Tutorial' IN 'PostgreSQL Tutorial’); 2) SELECT POSITION('tutorial' IN 'PostgreSQL Tutorial'); PostgreSQL REPLACE Function To search and replace all occurrences of a string with a new one, you use the REPLACE() function. The following illustrates the syntax of the PostgreSQL REPLACE() function: REPLACE(source, old_text, new_text ); Examples 1) SELECT REPLACE ('ABC AA', 'A', 'Z’); 2) UPDATE customer SET email = REPLACE ( email, ‘qwerty1.org’, ‘postgressqltutorial.com’ ); PostgreSQL Substring Function The substring function returns a part of string. The following illustrates the syntax of the substring function: SUBSTRING ( string ,start_position , length ) Examples 1)SELECT SUBSTRING ('PostgreSQL', 1, 8); -- PostgreS 2)SELECT SUBSTRING ('PostgreSQL', 8); -- SQL 3) SELECT SUBSTRING ('PostgreSQL' FROM 1 FOR 8); -- PostgreS PostgreSQL Filtering Data The SELECT statement returns all rows from one or more columns in a table. To select rows that satisfy a specified condition, you use a WHERE clause. The syntax of the PostgreSQL WHERE clause is as follows: SELECT select_list FROM table_name WHERE condition ORDER BY sort_expression; Comparison and logical operators PostgreSQL WHERE clause examples PostgreSQL WHERE clause examples PostgreSQL WHERE clause examples Lecture 5 PostgreSQL JOINS. Inner Join. Full Join. Left Join. Right Join. Cross Join. Natural Join. Self Join. Senior Lecturer: Tulebayev Yersultan PostgreSQL JOINS. PART I. PostgreSQL JOINS. Inner Join. Full Join. Left Join. Right Join. PART II. Cross Join. Natural Join. Self Join. PART I. PostgreSQL JOINS. Inner Join. Full Join. Left Join. Right Join. PostgreSQL Joins The SQL Join clause is used to combine data from two or more tables in a database. When the related data is stored across multiple tables, joins help you to retrieve records combining the fields from these tables using their foreign keys. Following is the basic syntax of a the SQL JOIN CLAUSE: SELECT column_name(s) FROM table1 JOIN table2; Example CREATE TABLE CUSTOMERS ( INSERT INTO CUSTOMERS VALUES ID INT NOT NULL, (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 ), (2, 'Khilan', 25, 'Delhi', 1500.00 ), NAME VARCHAR (20) NOT NULL, (3, 'Kaushik', 23, 'Kota', 2000.00 ), AGE INT NOT NULL, (4, 'Chaitali', 25, 'Mumbai', 6500.00 ), ADDRESS CHAR (25), (5, 'Hardik', 27, 'Bhopal', 8500.00 ), (6, 'Komal', 22, 'Hyderabad', 4500.00 ), SALARY DECIMAL (18, 2), (7, 'Muffy', 24, 'Indore', 10000.00 ); PRIMARY KEY (ID) ); Example CREATE TABLE ORDERS ( INSERT INTO ORDERS VALUES OID INT NOT NULL, (102, '2009-10-08 00:00:00', 3, 3000.00), (100, '2009-10-08 00:00:00', 3, 1500.00), DATE VARCHAR (20) NOT NULL, (101, '2009-11-20 00:00:00', 2, 1560.00), CUSTOMER_ID INT NOT NULL, (103, '2008-05-20 00:00:00', 4, 2060.00); AMOUNT DECIMAL (18, 2) ); Example SELECT ID, NAME, AGE, AMOUNT FROM CUSTOMERS JOIN ORDERS ON CUSTOMER.ID = ORDERS.CUSTOMER_ID; The SQL Inner Join The SQL Inner Join is a type of join that combines multiple tables by retrieving records that have matching values in both tables (in the common column). Following is the basic syntax of SQL Inner Join: SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name; Explanation of Inner Join The following Venn diagram illustrates the inner join: Example SELECT ID, NAME, AGE, AMOUNT FROM CUSTOMERS INNER JOIN ORDERS ON CUSTOMER.ID = ORDERS.CUSTOMER_ID; Joining Multiple Tables Using Inner Join Until now, we have only learnt how to join two tables using Inner Join. However, we can also join as many tables as possible, using Inner Join, by specifying the condition (with which these tables are to be joined). Following is the syntax to join more than two tables using Inner Join: SELECT column1, column2, column3... FROM table1 INNER JOIN table2 ON condition_1 INNER JOIN table3 ON condition_2.... INNER JOIN tableN ON condition_N; Example CREATE TABLE EMPLOYEE ( EID INT NOT NULL, EMPLOYEE_NAME VARCHAR (30) NOT NULL, SALES_MADE DECIMAL (20) ); INSERT INTO EMPLOYEE VALUES (102, 'SARIKA', 4500), (100, 'ALEKHYA', 3623), (101, 'REVATHI', 1291), (103, 'VIVEK', 3426); EXAMPLE SELECT OID, DATE, AMOUNT, EMPLOYEE_NAME FROM CUSTOMERS INNER JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID INNER JOIN EMPLOYEE ON ORDERS.OID = EMPLOYEE.EID; Inner Join with WHERE Clause Inner Join uses WHERE clause to apply more constraints on the data to be retrieved. The syntax of Inner Join when used with WHERE clause is given below: SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name WHERE condition; Example SELECT ID, NAME, DATE, AMOUNT FROM CUSTOMERS INNER JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID WHERE ORDERS.AMOUNT > 2000.00; The SQL OUTER Join An Outer Join retrieves all the records in two tables even if there is no counterpart row of one table in another table, unlike Inner Join. Following are the different types of outer Joins: - LEFT JOIN − returns all rows from the left table, even if there are no matches in the right table. - RIGHT JOIN − returns all rows from the right table, even if there are no matches in the left table. - FULL JOIN − returns rows when there is a match in one of the tables. The SQL Left Join Left Join or Left Outer Join in SQL combines two or more tables, where the first table is returned wholly; but, only the matching record(s) are retrieved from the consequent tables. Following is the basic syntax of Left Join in SQL: SELECT column_name(s) FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name; The following Venn diagram illustrates the left join: Example SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID; The SQL Right Join The Right Join or Right Outer Join query in SQL returns all rows from the right table, even if there are no matches in the left table. Following is the basic syntax of Right Join in SQL: SELECT table1.column1, table2.column2... FROM table1 RIGHT JOIN table2 ON table1.common_field = table2.common_field; The following Venn diagram illustrates the right join: Example SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID; The SQL Full Join SQL Full Join creates a new table by joining two tables as a whole. The joined table contains all records from both the tables and fills NULL values for missing matches on either side. Following is the basic syntax of Full Join in SQL: SELECT column_name(s) FROM table1 FULL JOIN table2 ON table1.column_name = table2.column_name; The following Venn diagram illustrates the full outer join: Example SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS FULL JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID; Outer join explanation PART II. Cross Join. Natural Join. Self Join. SQL NATURAL JOIN A natural join is a join that creates an implicit join based on the same column names in the joined tables. The following shows the syntax of the PostgreSQL natural join: SELECT select_list FROM T1 NATURAL [INNER, LEFT, RIGHT] JOIN T2; SQL NATURAL JOIN examples DROP TABLE IF EXISTS categories; CREATE TABLE categories ( category_id serial PRIMARY KEY, category_name VARCHAR (255) NOT NULL ); DROP TABLE IF EXISTS products; CREATE TABLE products ( product_id serial PRIMARY KEY, product_name VARCHAR (255) NOT NULL, category_id INT NOT NULL, FOREIGN KEY (category_id) REFERENCES categories (category_id) ); SQL NATURAL JOIN examples INSERT INTO categories(category_name) VALUES ('Smart Phone’), ('Laptop’), ('Tablet’); INSERT INTO products (product_name, category_id) VALUES ('iPhone', 1), ('Samsung Galaxy', 1), ('HP Elite', 2), ('Lenovo Thinkpad', 2), ('iPad', 3), ('Kindle Fire', 3); SQL NATURAL JOIN examples SELECT * FROM products NATURAL JOIN categories; SQL Self-Join A self-join is a regular join that joins a table to itself. In practice, you typically use a self-join to query hierarchical data or to compare rows within the same table. The following query uses an INNER JOIN that joins the table to itself: SELECT select_list FROM table_name t1 INNER JOIN table_name t2 on join_predicate; SQL SELF JOIN examples CREATE TABLE employee ( employee_id INT PRIMARY KEY, first_name VARCHAR (255) NOT NULL, last_name VARCHAR (255) NOT NULL, manager_id INT, FOREIGN KEY (manager_id) REFERENCES employee (employee_id) ON DELETE CASCADE ); INSERT INTO employee ( employee_id, first_name, last_name, manager_id ) VALUES (1, 'Windy', 'Hays', NULL), (2, 'Ava', 'Christensen', 1), (3, 'Hassan', 'Conner', 1), (4, 'Anna', 'Reeves', 2), (5, 'Sau', 'Norman', 2), (6, 'Kelsie', 'Hays', 3), (7, 'Tory', 'Goff', 3), (8, 'Salley', 'Lester', 3); SQL SELF JOIN examples SELECT e.first_name || ‘ ’ || e.last_name employee, m.first_name || ‘ ’ || m.last_name manager FROM employee e INNER JOIN employee m ON m.employee_id = e.manager_id ORDER BY manager; SQL Cross Join A CROSS JOIN clause allows you to produce a Cartesian Product of rows in two or more tables. The following illustrates the syntax of the CROSS JOIN syntax: SELECT select_list FROM T1 CROSS JOIN T2; SQL CROSS JOIN example DROP TABLE IF EXISTS T1; CREATE TABLE T1 (label CHAR(1) PRIMARY KEY); DROP TABLE IF EXISTS T2; CREATE TABLE T2 (score INT PRIMARY KEY); INSERT INTO T1 (label) VALUES ('A’), ('B’); INSERT INTO T2 (score) VALUES (1), (2), (3); SQL CROSS JOIN example SELECT * FROM T1 CROSS JOIN T2; Lecture 6 Grouping Data, Subquery Senior Lecturer: Tulebayev Yersultan Grouping Data, Subquery PART I. Grouping Data. GROUP BY clause. PART II. Subquery. Types of Subqueries. PART I. Grouping Data. GROUP BY clause. Introduction to PostgreSQL GROUP BY clause The GROUP BY clause divides the rows returned from the SELECT statement into groups. For each group, you can apply an aggregate function e.g., SUM() to calculate the sum of items or COUNT() to get the number of items in the groups. The following statement illustrates the basic syntax of the GROUP BY clause: SELECT column_1, column_2,..., aggregate_function(column_3) FROM table_name GROUP BY column_1, column_2,...; Order of execution of the SQL statement PostgreSQL GROUP BY clause examples Using PostgreSQL GROUP BY without an aggregate function example SELECT customer_id FROM payment GROUP BY customer_id; Using PostgreSQL GROUP BY with SUM() function example SELECT customer_id, SUM (amount) FROM payment GROUP BY customer_id; Using PostgreSQL GROUP BY clause with the JOIN clause SELECT first_name || ' ' || last_name full_name, SUM (amount) amount FROM payment INNER JOIN customer USING (customer_id) GROUP BY full_name ORDER BY amount DESC; Using PostgreSQL GROUP BY with multiple columns SELECT customer_id, staff_id, SUM(amount) FROM payment GROUP BY staff_id, customer_id ORDER BY customer_id; Example INSERT INTO COMPANY DROP TABLE COMPANY; (ID,NAME,AGE,ADDRESS,SALARY) CREATE TABLE COMPANY( VALUES (1, 'Paul', 32, 'California', 20000.00 ), ID INT PRIMARY KEY NOT NULL, (2, 'Allen', 25, 'Texas', 15000.00 ), NAME TEXT NOT NULL, (3, 'Teddy', 23, 'Norway', 20000.00 ), AGE INT NOT NULL, (4, 'Mark', 25, 'Rich-Mond ', 65000.00 ), ADDRESS CHAR(50), (5, 'David', 27, 'Texas', 85000.00 ), SALARY REAL (6, 'Kim', 22, 'South-Hall', 45000.00 ), ); (7, 'James', 24, 'Houston', 10000.00 ), (8, 'Paul', 24, 'Houston', 20000.00), (9, 'James', 44, 'Norway', 5000.00), (10, 'James', 45, 'Texas', 5000.00); Example SELECT NAME, SUM(SALARY) FROM COMPANY GROUP BY NAME ORDER BY NAME; PART II. Subquery. Types of Subqueries. PostgreSQL Subqueries A subquery is a SQL query nested inside a larger query. Subquery Syntax: The subquery (inner query) executes once before the main query (outer query) executes. The main query (outer query) use the subquery result. There are a few rules that subqueries must follow: - Subqueries must be enclosed within parentheses. - A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns. - An ORDER BY cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY can be used to perform the same function as the ORDER BY in a subquery. - Subqueries that return more than one row can only be used with multiple value operators, such as the IN, EXISTS, NOT IN, ANY/SOME, ALL operator. - The BETWEEN operator cannot be used with a subquery; however, the BETWEEN can be used within the subquery. PostgreSQL Subquery Example SELECT first_name,last_name, salary FROM employees WHERE salary > (SELECT max(salary) FROM employees WHERE first_name='Alexander'); Types of Subqueries The Subquery as Scalar Operand Comparisons using Subqueries Subqueries with ALL, ANY, IN, or SOME Row Subqueries Subqueries with EXISTS or NOT EXISTS Correlated Subqueries PostgreSQL Subquery as Scalar Operand A scalar subquery is a subquery that returns exactly one column value from one row. The SELECT query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or more than one column as a scalar subquery. During a particular execution, if the subquery returns no rows, that is not an error; the scalar result is taken to be null. The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery. Example SELECT employee_id, last_name FROM employees WHERE department_id = (SELECT department_id from departments WHERE location_id=2500); PostgreSQL Subqueries: Using Comparisons A subquery can be used before or after any of the comparison operators. Example SELECT employee_id,first_name,last_name,salary FROM employees WHERE salary > (SELECT AVG(SALARY) FROM employees); PostgreSQL Subqueries with ALL operator The ALL operator compares value to every value returned by the subquery. The result of ALL is true if all rows yield true (including the case where the subquery returns no rows). The result is false if any false result is found. The result is NULL if the comparison does not return false for any row, and it returns NULL for at least one row. Example SELECT department_id, SALARY FROM employees WHERE SALARY>=ALL (SELECT AVG(SALARY) FROM employees GROUP BY department_id); PostgreSQL Subqueries with ANY/SOME operator The ANY operator compares the value to each value returned by the subquery. Therefore ANY keyword (which must follow a comparison operator) returns TRUE if the comparison is TRUE for ANY of the values in the column that the subquery returns. Syntax: expression operator ANY (subquery) Example SELECT first_name, last_name,department_id FROM employees WHERE department_id= ANY (SELECT DEPARTMENT_ID FROM departments WHERE location_id=1700); PostgreSQL Subqueries with IN operator Syntax: expression IN (subquery) The result of IN is true if any equal subquery row is found. The result is “false” if no equal row is found (including the case where the subquery returns no rows). If the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. Example SELECT first_name, last_name,department_id FROM employees WHERE department_id IN (SELECT DEPARTMENT_ID FROM departments WHERE location_id=1800); PostgreSQL Subqueries with NOT IN operator Syntax: Expression NOT IN (subquery) The result of IN is true if any equal subquery row is found. The result is “false” if no equal row is found (including the case where the subquery returns no rows). If the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. Example SELECT first_name, last_name,department_id FROM employees WHERE department_id NOT IN (SELECT DEPARTMENT_ID FROM departments WHERE manager_id BETWEEN 100 AND 200); PostgreSQL Subqueries with EXISTS operator Syntax: EXISTS (subquery) The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is true; if the subquery returns no rows, the result of EXISTS is false. Example SELECT employee_id, first_name, last_name, job_id, department_id FROM employees E WHERE EXISTS (SELECT * FROM employees WHERE manager_id = E.employee_id); PostgreSQL Row Subqueries A row subquery is a subquery that returns a single row and more than one column value. You can use = , >, =, (SELECT AVG(salary) FROM employees WHERE department_id = outerr.department_id); Lecture 7 Aggregate functions. Window functions. Senior Lecturer: Tulebayev Yersultan Aggregate functions. Window functions. PART I. Aggregate functions. PART II. Window functions. PART I. Aggregate functions. Introduction to PostgreSQL aggregate functions Aggregate functions perform a calculation on a set of rows and return a single row. PostgreSQL provides all standard SQL’s aggregate functions as follows: AVG() – return the average value. COUNT() – return the number of values. MAX() – return the maximum value. MIN() – return the minimum value. SUM() – return the sum of all or distinct values. We often use the aggregate functions with the GROUP BY clause in the SELECT statement. In these cases, the GROUP BY clause divides the result set into groups of rows and the aggregate functions perform a calculation on each group e.g., maximum, minimum, average, etc. You can use aggregate functions as expressions only in the following clauses: SELECT clause. HAVING clause. PostgreSQL AVG Function The AVG() function is one of the most commonly used aggregate functions in PostgreSQL. The AVG() function allows you to calculate the average value of a set. The syntax of the AVG() function is as follows: AVG(column) You can use the AVG() function in the SELECT and HAVING clauses. Example SELECT AVG(amount) FROM payment; PostgreSQL AVG() function with DISTINCT operator SELECT AVG(DISTINCT amount)::numeric(10,2) FROM payment; PostgreSQL AVG function with SUM function SELECT AVG(amount)::numeric(10,2), SUM(amount)::numeric(10,2) FROM payment; PostgreSQL AVG() function with GROUP BY clause SELECT customer_id, first_name, last_name, AVG (amount)::NUMERIC(10,2) FROM payment INNER JOIN customer USING(customer_id) GROUP BY customer_id ORDER BY customer_id; PostgreSQL AVG() function with HAVING clause SELECT customer_id, first_name, last_name, AVG (amount)::NUMERIC(10,2) FROM payment INNER JOIN customer USING(customer_id) GROUP BY customer_id HAVING AVG (amount) > 5 ORDER BY customer_id; PostgreSQL AVG() function and NULL CREATE TABLE t1 ( SELECT AVG(amount)::numeric(10,2) id serial PRIMARY KEY, FROM t1; amount INTEGER ); INSERT INTO t1 (amount) VALUES (10), (NULL), (30); PostgreSQL COUNT Function The COUNT() function is an aggregate function that allows you to get the number of rows that match a specific condition of a query. SELECT COUNT(column) FROM table_name WHERE condition; PostgreSQL COUNT() function examples SELECT COUNT(*) FROM payment; PostgreSQL COUNT() with GROUP BY clause SELECT customer_id, COUNT (customer_id) FROM payment GROUP BY customer_id; PostgreSQL COUNT() with HAVING clause SELECT customer_id, COUNT (customer_id) FROM payment GROUP BY customer_id HAVING COUNT (customer_id) > 40; PostgreSQL SUM Function The PostgreSQL SUM() is an aggregate function that returns the sum of values or distinct values. The syntax of the SUM() function is as follows: SUM(DISTINCT expression) PostgreSQL SUM() function examples SELECT SUM (amount) AS total FROM payment WHERE customer_id = 2000; Using PostgreSQL SUM() function with GROUP BY clause SELECT customer_id, SUM (amount) AS total FROM payment GROUP BY customer_id ORDER BY total; Using PostgreSQL SUM with expression SELECT SUM(return_date - rental_date ) FROM rental; PostgreSQL MAX Function PostgreSQL MAX function is an aggregate function that returns the maximum value in a set of values. The syntax of the MAX function is as follows: MAX(expression); You can use the MAX function not only in the SELECT clause but also in the WHERE and HAVING clauses. PostgreSQL MAX function examples SELECT MAX(amount) FROM payment; PostgreSQL MAX function in subquery SELECT * FROM payment WHERE amount = ( SELECT MAX (amount) FROM payment ); PostgreSQL MIN Function PostgreSQL MIN() function an aggregate function that returns the minimum value in a set of values. The syntax of the MIN() function is as follows: SELECT MIN(expression) FROM table_expression...; Unlike the AVG(), COUNT() and SUM() functions, the DISTINCT option does not have any effects on the MIN() function. PostgreSQL MIN() function examples SELECT MIN (rental_rate) FROM film; Using PostgreSQL MIN function with GROUP BY clause SELECT name category, MIN(replacement_cost) replacement_cost FROM category INNER JOIN film_category USING (category_id) INNER JOIN film USING (film_id) GROUP BY name ORDER BY name; PART II. Window functions. PostgreSQL Window Functions CREATE TABLE product_groups ( group_id serial PRIMARY KEY, group_name VARCHAR (255) NOT NULL ); CREATE TABLE products ( product_id serial PRIMARY KEY, product_name VARCHAR (255) NOT NULL, price DECIMAL (11, 2), group_id INT NOT NULL, FOREIGN KEY (group_id) REFERENCES product_groups (group_id) ); PostgreSQL Window Functions INSERT INTO product_groups (group_name) VALUES ('Smartphone'), ('Laptop'), ('Tablet'); INSERT INTO products (product_name, group_id,price) VALUES ('Microsoft Lumia', 1, 200), ('HTC One', 1, 400), ('Nexus', 1, 500), ('iPhone', 1, 900), ('HP Elite', 2, 1200), ('Lenovo Thinkpad', 2, 700), ('Sony VAIO', 2, 700), ('Dell Vostro', 2, 800), ('iPad', 3, 700), ('Kindle Fire', 3, 150), ('Samsung Galaxy Tab', 3, 200); Introduction to PostgreSQL window functions SELECT group_name, AVG (price) FROM products INNER JOIN product_groups USING (group_id) GROUP BY group_name; Example SELECT product_name, price, group_name, AVG (price) OVER ( PARTITION BY group_name ) FROM products INNER JOIN product_groups USING (group_id); PostgreSQL Window Function A window function performs a calculation across a set of table rows that are somehow related to the current row. PostgreSQL Window Function Syntax: window_function(arg1, arg2,..) OVER ( [PARTITION BY partition_expression] [ORDER BY sort_expression [ASC | DESC] [NULLS {FIRST | LAST }]) Window functions The ROW_NUMBER() function The ROW_NUMBER() function assigns a sequential number to each row in each partition. See the following query: SELECT product_name, group_name, price, ROW_NUMBER () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); The Rank() function The RANK() function assigns ranking within an ordered partition. If rows have the same values, the RANK() function assigns the same rank, with the next ranking(s) skipped. SELECT product_name, group_name, price, RANK () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); The Dense_rank() function Similar to the RANK() function, the DENSE_RANK() function assigns a rank to each row within an ordered partition, but the ranks have no gap. In other words, the same ranks are assigned to multiple rows and no ranks are skipped. SELECT product_name, group_name, price, DENSE_RANK () OVER ( PARTITION BY group_name ORDER BY price ) FROM products INNER JOIN product_groups USING (group_id); The FIRST_VALUE() function The FIRST_VALUE() function returns a value evaluated against the first row within its partition. SELECT product_name, group_name, price, FIRST_VALUE (price) OVER ( PARTITION BY group_name ORDER BY price ) AS lowest_price_per_group FROM products INNER JOIN product_groups USING (group_id); The LAST_VALUE() function The LAST_VALUE() function returns a value evaluated against the last row in its partition. SELECT product_name, group_name, price, LAST_VALUE (price) OVER ( PARTITION BY group_name ORDER BY price ) AS highest_price_per_group FROM products INNER JOIN product_groups USING (group_id); The LAG and LEAD functions The LAG() function has the ability to access data from the previous row, while the LEAD() function can access data from the next row. Both LAG() and LEAD() functions have the same syntax as follows: LAG (expression [,offset] [,default]) over_clause; LEAD (expression [,offset] [,default]) over_clause; In this syntax: - expression – a column or expression to compute the returned value. - offset – the number of rows preceding ( LAG)/ following ( LEAD) the current row. It defaults to 1. - default – the default returned value if the offset goes beyond the scope of the window. The default is NULL if you skip it. The LAG() function example SELECT product_name, group_name, price, LAG (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS prev_price, price - LAG (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS cur_prev_diff FROM products INNER JOIN product_groups USING (group_id); The LEAD() function example SELECT product_name, group_name, price, LEAD (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS next_price, price - LEAD (price, 1) OVER ( PARTITION BY group_name ORDER BY price ) AS cur_next_diff FROM products INNER JOIN product_groups USING (group_id); Lecture 8 Set Operations, Grouping sets, Cube, and Rollup Senior Lecturer: Tulebayev Yersultan Set Operations, Grouping sets, Cube, and Rollup PART I. Set operators: UNION, UNION ALL, INTERSECT, EXCEPT. PART II. Grouping sets. Cube and Rollup. PART I. Set operators: UNION, UNION ALL, INTERSECT, EXCEPT. Set operators Set operators are used to join the results of two (or more) SELECT statements. Types of set operators: - UNION; - UNION ALL; - INTERSECT; - EXCEPT; Points to remember Same number of columns must be selected by all participating SELECT statements. Column names used in the display are taken from the first query. Positional ordering must be used to sort the result set. Individual result set ordering is not allowed with Set operators. ORDER BY can appear once at the end of the query. For example, UNION and INTERSECT operators are commutative, i.e. the order of queries is not important; it doesn't change the final result. Performance wise, UNION ALL shows better performance as compared to UNION because resources are not wasted in filtering duplicates and sorting the result set. Set operators can be the part of sub queries. Set operators can't be used in SELECT statements containing TABLE collection expressions. The LONG, BLOB, CLOB, BFILE, VARRAY,or nested table are not permitted for use in Set operators.For update clause is not allowed with the set operators. PostgreSQL UNION The UNION operator combines result sets of two or more SELECT statements into a single result set. The following illustrates the syntax of the UNION operator that combines result sets from two queries: SELECT select_list_1 FROM table_expresssion_1 UNION SELECT select_list_2 FROM table_expression_2 POSTGRESQL: UNION OPERATOR The UNION operator combines result sets of two or more SELECT statements into a single result set. Removes all duplicate rows. Both queries must return same number of rows. The corresponding columns in the queries must have compatible data types. The following Venn diagram illustrates how to the UNION works: POSTGRESQL: UNION ALL OPERATOR The UNION ALL operator combines result sets of two or more SELECT statements into a single result set. Does not remove duplicate rows. Both queries must return same number of rows. The corresponding columns in the queries must have compatible data types. Syntax: SELECT select_list_1 FROM table1 UNION ALL SELECT select_list_2 FROM table2 Example DROP TABLE IF EXISTS top_rated_films; CREATE TABLE top_rated_films( title VARCHAR NOT NULL, release_year SMALLINT ); CREATE TABLE most_popular_films( title VARCHAR NOT NULL, release_year SMALLINT ); Example INSERT INTO top_rated_films(title,release_year) VALUES ('The Shawshank Redemption',1994), ('The Godfather',1972), ('12 Angry Men',1957); INSERT INTO most_popular_films(title,release_year) VALUES ('An American Pickle',2020), ('The Godfather',1972), ('Greyhound',2020); PostgreSQL UNION examples SELECT * FROM top_rated_films UNION SELECT * FROM most_popular_films; PostgreSQL UNION ALL example SELECT * FROM top_rated_films UNION ALL SELECT * FROM most_popular_films; PostgreSQL UNION ALL with ORDER BY clause example SELECT * FROM top_rated_films UNION ALL SELECT * FROM most_popular_films ORDER BY title; PostgreSQL INTERSECT Operator The PostgreSQL INTERSECT operator combines result sets of two or more SELECT statements into a single result set. The INTERSECT operator returns any rows that are available in both result sets. The following illustrates the syntax of the INTERSECT operator: SELECT select_list FROM A INTERSECT SELECT select_list FROM B; POSTGRESQL: INTERSECT OPERATOR Used to combine result set of two or more SELECT statement into a single result. The INTERSECT operator returns all rows in both result sets. The number of columns that appear in the SELECT statement must be the same. Data types of the columns must be compatible. The following illustration shows the final result set produced by the INTERSECT operator. PostgreSQL INTERSECT operator examples SELECT * FROM most_popular_films INTERSECT SELECT * FROM top_rated_films; PostgreSQL EXCEPT Like the UNION and INTERSECT operators, the EXCEPT operator returns rows by comparing the result sets of two or more queries. The EXCEPT operator returns distinct rows from the first (left) query that are not in the output of the second (right) query. The following illustrates the syntax of the EXCEPT operator. SELECT select_list FROM A EXCEPT SELECT select_list FROM B; POSTGRESQL: EXCEPT OPERATOR Returns rows by comparing the result sets of two or more queries. Returns rows in first query not present in output of the second query. Returns distinct rows from the first (left) query not in output of the second (right) query. The number of columns and their order must be the same in both queries. The data types of the respective columns must be compatible. The following Venn diagram illustrates the EXCEPT operator: PostgreSQL EXCEPT operator examples SELECT * FROM top_rated_films EXCEPT SELECT * FROM most_popular_films; PART II. Grouping sets. Cube and Rollup. PostgreSQL GROUPING SETS Let’s get started by creating a new table called sales for the demonstration. DROP TABLE IF EXISTS sales; CREATE TABLE sales ( brand VARCHAR NOT NULL, segment VARCHAR NOT NULL, quantity INT NOT NULL, PRIMARY KEY (brand, segment) ); INSERT INTO sales (brand, segment, quantity) VALUES ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300); Introduction to PostgreSQL GROUPING SETS A grouping set is a set of columns by which you group by using the GROUP BY clause. A grouping set is denoted by a comma-separated list of columns placed inside parentheses: (column1, column2,...) Example SELECT brand, segment, SUM (quantity) FROM sales GROUP BY brand, segment; Example SELECT brand, SUM (quantity) FROM sales GROUP BY brand; SELECT segment, SUM (quantity) FROM sales GROUP BY segment; Grouping sets clause The GROUPING SETS allows you to define multiple grouping sets in the same query. The general syntax of the GROUPING SETS is as follows: SELECT c1, c2, aggregate_function(c3) FROM table_name GROUP BY GROUPING SETS ( (c1, c2), (c1), (c2), () ); Example SELECT brand, segment, SUM (quantity) FROM sales GROUP BY GROUPING SETS ( (brand, segment), (brand), (segment), () ); Grouping function The GROUPING() function accepts an argument which can be a column name or an expression: GROUPING( column_name | expression) The column_name or expression must match with the one specified in the GROUP BY clause. The GROUPING() function returns bit 0 if the argument is a member of the current grouping set and 1 otherwise. Example SELECT GROUPING(brand) grouping_brand, GROUPING(segment) grouping_segment, brand, segment, SUM (quantity) FROM sales GROUP BY GROUPING SETS ( (brand), (segment), () ) ORDER BY brand, segment; PostgreSQL CUBE PostgreSQL CUBE is a subclause of the GROUP BY clause. The CUBE allows you to generate multiple grouping sets. The following illustrates the syntax of the CUBE subclause: SELECT c1, c2, c3, aggregate (c4) FROM table_name GROUP BY CUBE (c1, c2, c3); Explanation CUBE(c1,c2,c3) GROUPING SETS ( (c1,c2,c3), (c1,c2), (c1,c3), (c2,c3), (c1), (c2), (c3), () ) PostgreSQL CUBE examples SELECT brand, segment, SUM (quantity) FROM sales GROUP BY CUBE (brand, segment) ORDER BY brand, segment; PostgreSQL ROLLUP The ROLLUP assumes a hierarchy among the input columns and generates all grouping sets that make sense considering the hierarchy. This is the reason why ROLLUP is often used to generate the subtotals and the grand total for reports. The following illustrates the syntax of the ROLLUP subclause: SELECT c1, c2, c3, aggregate (c4) FROM table_name GROUP BY ROLLUP (c1, c2, c3); Explanation ROLLUP(c1,c2,c3) (c1, c2, c3) (c1, c2) (c1) () PostgreSQL ROLLUP examples DROP TABLE IF EXISTS sales; CREATE TABLE sales ( brand VARCHAR NOT NULL, segment VARCHAR NOT NULL, quantity INT NOT NULL, PRIMARY KEY (brand, segment) ); INSERT INTO sales (brand, segment, quantity) VALUES ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300); PostgreSQL ROLLUP examples SELECT brand, segment, SUM (quantity) FROM sales GROUP BY ROLLUP (brand, segment) ORDER BY brand, segment; Lecture 9 POSTGRESQL VIEW. INDEX. Senior Lecturer: Tulebayev Yersultan POSTGRESQL VIEW. INDEX. PART I. POSTGRESQL VIEW PART II. POSTGRESQL INDEX PART I. POSTGRESQL VIEW PostgreSQL Views A view is a named query that provides another way to present data in the database tables. A view is a stored query. A view can be accessed as a virtual table in PostgreSQL. In other words, a PostgreSQL view is a logical table that represents data of one or more underlying tables through a SELECT statement. PostgreSQL Views A view can be very useful in some cases such as: A view helps simplify the complexity of a query because you can query a view, which is based on a complex query, using a simple SELECT statement. Like a table, you can grant permission to users through a view that contains specific data that the users are authorized to see. A view provides a consistent layer even the columns of the underlying table change. Creating PostgreSQL Views To create a view, we use CREATE VIEW statement. The simplest syntax of the CREATE VIEW statement is as follows: CREATE VIEW view_name AS query; PostgreSQL CREATE VIEW example For example, in our sample database, we have four tables: 1) customer – stores all customer data 2) address – stores address of customers 3) city – stores city data 4) country– stores country data PostgreSQL CREATE VIEW example SELECT cu.customer_id AS id, cu.first_name || ' ' || cu.last_name AS name, a.address, a.postal_code AS "zip code", a.phone, city.city, country.country, CASE WHEN cu.activebool THEN 'active' ELSE '' END AS notes, cu.store_id AS sid FROM customer cu INNER JOIN address a USING (address_id) INNER JOIN city USING (city_id) INNER JOIN country USING (country_id); PostgreSQL CREATE VIEW example CREATE VIEW customer_master AS SELECT SELECT cu.customer_id AS id, cu.first_name || ' ' || cu.last_name AS name, * a.address, FROM a.postal_code AS "zip code", a.phone, customer_master; city.city, country.country, CASE WHEN cu.activebool THEN 'active' ELSE '' END AS notes, cu.store_id AS sid FROM customer cu INNER JOIN address a USING (address_id) INNER JOIN city USING (city_id) INNER JOIN country USING (country_id); Changing PostgreSQL Views To change the defining query of a view, you use the CREATE VIEW statement with OR REPLACE addition as follows: CREATE OR REPLACE view_name AS query ALTER VIEW To change the definition of a view, you use the ALTER VIEW statement. For example, you can change the name of the view from customer_master to customer_info by using the following statement: ALTER VIEW customer_master RENAME TO customer_info; PostgreSQL Drop View The DROP VIEW statement removes a view from the database. The following illustrates the syntax of the DROP VIEW statement: DROP VIEW [IF EXISTS] view_name [CASCADE | RESTRICT] To remove multiple views using a single statement, you specify a comma-separated list of view names after the DROP VIEW keywords like this: DROP VIEW [IF EXISTS] view_name1, view_name2,...; PostgreSQL DROP VIEW statement examples Example CREATE VIEW film_master AS SELECT film_id, title, release_year, length, name category FROM film INNER JOIN film_category USING (film_id) INNER JOIN category USING(category_id); Example CREATE VIEW horror_film AS SELECT film_id, title, release_year, length FROM film_master WHERE category = 'Horror'; Using PostgreSQL DROP VIEW to drop one view The following example uses the DROP VIEW statement to drop the comedy_film view: DROP VIEW horror_film; Using PostgreSQL DROP VIEW statement to drop a view that has dependent objects The following statement uses the DROP VIEW statement to drop the film_master view: DROP VIEW film_master; PostgreSQL issued an error: To drop the view film_master, you need to drop its dependent object first or use the CASCADE option like this: DROP VIEW film_master CASCADE; PostgreSQL Materialized Views The PostgreSQL materialized views that allow you to store the result of a query physically and update the data periodically. To create a materialized view, you use the CREATE MATERIALIZED VIEW statement as follows: CREATE MATERIALIZED VIEW view_name AS query WITH [NO] DATA; Refreshing data for materialized views To load data into a materialized view, you use the REFRESH MATERIALIZED VIEW statement as shown below: REFRESH MATERIALIZED VIEW view_name; Removing materialized views Removing a materialized view is pretty straightforward as we have done for tables or views. This is done using the following statement: DROP MATERIALIZED VIEW view_name; PostgreSQL materialized views example CREATE MATERIALIZED VIEW rental_by_category AS SELECT c.name AS category, sum(p.amount) AS total_sales FROM (((((payment p JOIN rental r ON ((p.rental_id = r.rental_id))) JOIN inventory i ON ((r.inventory_id = i.inventory_id))) JOIN film f ON ((i.film_id = f.film_id))) JOIN film_category fc ON ((f.film_id = fc.film_id))) JOIN category c ON ((fc.category_id = c.category_id))) GROUP BY c.name ORDER BY sum(p.amount) DESC WITH NO DATA; Example SELECT * FROM rental_by_category; PostgreSQL is helpful to give you a hint to ask for loading data into the view. Let’s do it by executing the following statement: REFRESH MATERIALIZED VIEW rental_by_category; Creating Updatable Views Using the WITH CHECK OPTION Clause An updatable view allows you to change the data of the base table through the view. Let’s take a look at the city and country tables in the sample database. The following statement creates an updatable view named usa_city that returns all cities in the Untied States. Example CREATE VIEW usa_city AS SELECT INSERT INTO usa_city (city, country_id) city_id, VALUES ('Birmingham', 102); city, INSERT INTO usa_city (city, country_id) country_id VALUES ('Cambridge', 102); FROM city WHERE country_id = 103 ORDER BY city; Example CREATE Now, run the following statement to insert another city for the United Kingdom country. OR REPLACE VIEW usa_city AS SELECT city_id, UPDATE usa_city city, SET country_id = 102 WHERE country_id city_id = 135; FROM PostgreSQL rejected the insert and issued an error. city WHERE country_id = 103 ORDER BY city WITH CHECK OPTION; The scope of check with LOCAL and CASCADED First, create a view that returns all cities with the name starting with the letter A. CREATE VIEW city_a AS SELECT city_id, city, country_id FROM city WHERE city LIKE 'A%'; Example CREATE INSERT INTO city_a_usa (city, country_id) OR REPLACE VIEW city_a_usa AS SELECT VALUES ('Houston', 103); city_id, city, country_id FROM city_a WHERE country_id = 103 WITH CASCADED CHECK OPTION; Example CREATE OR REPLACE VIEW city_a_usa INSERT INTO city_a_usa (city, country_id) AS SELECT VALUES ('Houston', 103); city_id, city, country_id FROM city_a WHERE country_id = 103 WITH LOCAL CHECK OPTION; PART II. POSTGRESQL INDEX PostgreSQL CREATE INDEX An index is a separated data structure that speeds up the data retrieval on a table at the cost of additional writes and storage to maintain the index. PostgreSQL indexes are effective tools to enhance database performance. Indexes help the database server find specific rows much faster than it could do without indexes. ¡ However, indexes add write and storage overheads to the database system. Therefore, using them appropriately is very important. EXPLANATION Let’s assume we have a table: CREATE TABLE test1 ( Id INT, Content VARCHAR ); SELECT content FROM test1 WHERE id = number; Syntax In this syntax: First, specify the index name after the CREATE INDEX clause. The index name should be meaningful and easy to remember. Second, specify the name of the table to which the index belongs. Third, specify the index method such as btree, hash, gist, spgist, gin, and brin. PostgreSQL uses btree by default. Fourth, list one or more columns that are to be stored in the index. The ASC and DESC specify the sort order. ASC is the default. NULLS FIRST or NULLS LAST specifies nulls sort before or after non-nulls. The NULLS FIRST is the default when DESC is specified and NULLS LAST is the default when DESC is not specified. To check if a query uses an index or not, you use the EXPLAIN statement. CREATION EXAMPLE CREATE INDEX test1_id_index ON test1 (id); To drop index you need to use: DROP INDEX index_name EXAMPLE EXAMPLE CONT. LIST INDEXES: Example UNIQUE INDEXES table_name ( a,b,c,...); MULTICOLUMN INDEXES You can create an index on more than one column of a table. This index is called a multicolumn index, a composite index, a combined index, or a concatenated index. A multicolumn index can have maximum of 32 columns of a table. The limit can be changed by modifying the pg_config_manual.h when building PostgreSQL. In addition, only B-tree, GIST, GIN, and BRIN index types support multicolumn indexes. CREATE INDEX index_name table_name(a,b,c,...); MULTICOLUMN INDEXES INDEXES ON EXPRESSIONS (FUNCTIONAL-BASED INDEXES) Example REINDEX REINDEX VS. DROP INDEX & CREATE INDEX Lecture 10 Transaction. ACID. Trigger Senior Lecturer: Tulebayev Yersultan Transaction. Trigger PART I. PostgreSQL Transaction. ACID. PART II. PostgreSQL Trigger. PART I. PostgreSQL Transaction. ACID PostgreSQL Transaction A database transaction is a single unit of work that consists of one or more operations. A database transaction is a collection of SQL queries that are treated as one unit of work. Basically, you begin a transaction and then do some queries, and then end the transaction, this is one unit of work. ACID Atomicity guarantees that the transaction completes in an all- or-nothing manner. Consistency ensures the change to data written to the database must be valid and follow predefined rules. Isolation determines how transaction integrity is visible to other transactions. Durability makes sure that transactions that have been committed will be stored in the database permanently. Setting up a sample table DROP TABLE IF EXISTS accounts; CREATE TABLE accounts ( id INT GENERATED BY DEFAULT AS IDENTITY, name VARCHAR(100) NOT NULL, balance DEC(15,2) NOT NULL, PRIMARY KEY(id) ); Begin a transaction BEGIN TRANSACTION; or BEGIN WORK; or just: BEGIN; Example BEGIN; INSERT INTO accounts(name,balance) VALUES('Alice',10000); Example From the current session, you can see the change by querying the accounts table: SELECT id, name, balance FROM accounts; However, if you start a new session and execute the query above, you will not see the change. SELECT id, name, balance FROM accounts; Commit a transaction COMMIT WORK; or COMMIT TRANSACTION; or simply: COMMIT; The following COMMIT statement inserts Alice’s account to the accounts table: -- start a transaction BEGIN; -- insert a new row into the accounts table INSERT INTO accounts(name,balance) VALUES('Alice',10000); -- commit the change (or roll it back later) COMMIT; PostgreSQL COMMIT: Bank account transfer example In the first session, start a new transaction: BEGIN; and subtracting 1000USD from Bob’s account with id 1: UPDATE accounts SET balance = balance - 1000 WHERE id = 1; Example In the second session, check the account balance of both accounts: SELECT id, name, balance FROM accounts; Example Next, add the same amount (1000USD ) to Alice’s account: UPDATE accounts SET balance = balance + 1000 WHERE id = 2; This change also is not visible to the second session until we commit it: COMMIT; Example Now, you can view the change from any session: SELECT id, name, balance FROM accounts; Example -- start a transaction BEGIN; -- deduct 1000 from account 1 UPDATE accounts SET balance = balance - 1000 WHERE id = 1; -- add 1000 to account 2 UPDATE accounts SET balance = balance + 1000 WHERE id = 2; -- select the data from accounts SELECT id, name, balance FROM accounts; -- commit the transaction COMMIT; Rolling back a transaction To roll back or undo the change of the current transaction, you use any of the following statement: ROLLBACK WORK; or ROLLBACK TRANSACTION; or in short: ROLLBACK; Example -- begin the transaction BEGIN; -- deduct the amount from the account 1 UPDATE accounts SET balance = balance - 1500 WHERE id = 1; -- add the amount from the account 3 (instead of 2) UPDATE accounts SET balance = balance + 1500 WHERE id = 3; -- roll back the transaction ROLLBACK; Atomicity Atomicity ensures that all the operations within a transaction are treated as a single, indivisible unit of work. This means that if any operation within the transaction fails, the entire transaction will be rolled back and none of the changes will be committed to the database. Consistency Consistency ensures that the transaction brings the database from one valid state to another. This means that the transaction must follow all the rules and constraints defined in the database schema, such as unique keys, foreign keys, and check constraints. Consistency in data This represented the state that actually persisted in the data. This mainly involves enforcing. - foreign keys referential integrity between two tables or documents an example can be seen when creating a user-like system, when a user(s) like an image or blog, the blog or image should persist the actual number of likes it has got from the user table. - atomicity – data should persist across the database. - Isolation – based on the isolation level which we will talk about later should return correct reads from two concurrent parallel tables. Isolation Isolation ensures that concurrent transactions do not interfere with each other. Each transaction is executed as if it is the only transaction in the system, even though other transactions may be executing simultaneously. It is the result of having transactions as a separate entity (isolation) from other concurrent transactions which may lead to reads phenomena and Isolation levels. Isolation Levels for inflight transactions **READ COMMITTED** **SERIALIZABLE** **REPEATABLE READ** **READ UNCOMMITTED** **SNAPSHOTS** Durability Durability ensures that the changes made by a committed transaction are permanent and will survive any subsequent failures. This is typically achieved by writing or persisting the changes to disk or other non-volatile storage. This involves a system that can recover all writes and see all changes after committed even when the system crashes or loss of power. Durability techniques WAL – Write ahead log: Writing a lot of data to disk is expensive (Indexes, data, files, rows, etc.). That is why DBMs persist in a compressed version of the changes as WAL. Any changes go to disk first. When a crash happens, we can read all the WAL data and rebuild the state. Asynchronous snapshot: As we write we keep everything in the memory and asynchronously in the background we snapshot everything to disk at once. AOF – Append only file: This is similar to the WAL, keep track of the changes before it happens and then write everything to disk. Operating System (OS) cache – A write request in OS usually goes to the OS cache. When the writes go to the OS cache, an OS crash, or machine restart could lead to loss of data. Fsync OS command forces write to always go to disk, this command can be expensive and slow down commits. PART II. PostgreSQL Trigger. Introduction to PostgreSQL Trigger A PostgreSQL trigger is a function invoked automatically whenever an event associated with a table occurs. An event could be any of the following: INSERT, UPDATE, DELETE or TRUNCATE. A trigger is a special user-defined function associated with a table. To create a new trigger, you define a trigger function first, and then bind this trigger function to a table. PostgreSQL trigger types PostgreSQL provides two main types of triggers: Row-level triggers Statement-level triggers. The differences between the two kinds are how many times the trigger is invoked and at what time. When to use triggers Triggers are useful in case the database is accessed by various applications, and you want to keep the cross-functionality within the database that runs automatically whenever the data of the table is modified. For example, if you want to keep the history of data without requiring the application to have logic to check for every event such as INSERT or UDPATE. PostgreSQL triggers vs SQL standard triggers Even though PostgreSQL implements SQL standard, triggers in PostgreSQL has some specific features: PostgreSQL fires trigger for the TRUNCATE event. PostgreSQL allows you to define the statement-level trigger on views. PostgreSQL requires you to define a user-defined function as the action of the trigger, while the SQL standard allows you to use any SQL commands. PostgreSQL CREATE TRIGGER The following illustrates the syntax of creating trigger function: CREATE FUNCTION trigger_function() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN -- trigger logic END; $$ Introduction to PostgreSQL CREATE TRIGGER statement The CREATE TRIGGER statement creates a new trigger. The following illustrates the basic syntax of the CREATE TRIGGER statement: CREATE TRIGGER trigger_name {BEFORE | AFTER} { event } ON table_name [FOR [EACH] { ROW | STATEMENT }] EXECUTE PROCEDURE trigger_function PostgreSQL CREATE TRIGGER example DROP TABLE IF EXISTS employees; CREATE TABLE employees( id INT GENERATED ALWAYS AS IDENTITY, first_name VARCHAR(40) NOT NULL, last_name VARCHAR(40) NOT NULL, PRIMARY KEY(id) ); Example CREATE TABLE employee_audits ( id INT GENERATED ALWAYS AS IDENTITY, employee_id INT NOT NULL, last_name VARCHAR(40) NOT NULL, changed_on TIMESTAMP(6) NOT NULL ); Example CREATE OR REPLACE FUNCTION log_last_name_changes() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN IF NEW.last_name OLD.last_name THEN INSERT INTO employee_audits(employee_id,last_name,changed_on) VALUES(OLD.id,OLD.last_name,now()); END IF; RETURN NEW; END; $$ Example CREATE TRIGGER last_name_changes BEFORE UPDATE ON employees FOR EACH ROW EXECUTE PROCEDURE log_last_name_changes(); Example INSERT INTO employees (first_name, last_name) VALUES ('John', 'Doe'); INSERT INTO employees (first_name, last_name) VALUES ('Lily', 'Bush’); SELECT * FROM employees; Example UPDATE employees SET last_name = 'Brown' WHERE ID = 2; SELECT * FROM employees; SELECT * FROM employee_audits; PostgreSQL DROP TRIGGER To delete a trigger from a table, you use the DROP TRIGGER statement with the following syntax: DROP TRIGGER [IF EXISTS] trigger_name ON table_name [ CASCADE | RESTRICT ]; Example DROP TRIGGER last_name_changes ON employee; PostgreSQL ALTER TRIGGER The ALTER TRIGGER statement allows you to rename a trigger. The following shows the syntax of the ALTER TRIGGER statement: ALTER TRIGGER trigger_name ON table_name RENAME TO new_trigger_name; Example ALTER TRIGGER before_update_salary ON employees RENAME TO salary_before_update;