Document Details

DexterousFern6890

Uploaded by DexterousFern6890

NUS

Tags

database programming SQL data aggregation information technology

Full Transcript

IT5008 Database Design & Programming L03: Multiple Tables IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 1 / 34 Aggregation IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from...

IT5008 Database Design & Programming L03: Multiple Tables IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 1 / 34 Aggregation IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 2 / 34 Aggregation ❱ Functions Basic Functions Distinct Example Basic Grouping Having Aggregation Functions The values of column can be aggregated using aggregation functions such as COUNT(), SUM(), MAX(), MIN(), AVG(), STDDEV(), etc. PostgreSQL also allows user-defined aggregate functions. SELECT COUNT(*) SELECT COUNT(c.customerid) count FROM customers c; FROM customers c; 1000 Count the number of SELECT COUNT(ALL c.country) ALL is default and rows in the table. FROM customers c; often omitted. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 3 / 34 Aggregation ❱ Functions Basic Functions Distinct Example Distinct Grouping Having DISTINCT Keyword We need to add the keyword DISTINCRT inside the COUNT() aggregate function if we want to count the number of different countries in the column country of the table customers. The keyword DISTINCT can be used in other aggregate functions similarly. SELECT COUNT(DISTINCT c.country) count FROM customers c; 5 IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 4 / 34 Aggregation ❱ Functions Basic Functions Distinct Example Example Grouping Having Aggregate Functions Example The following query finds the maximum, minimum, average, and standard deviation prices of our games. It uses the arithmetic TRUCT() to display two decimal places for average and standard deviation. SELECT MAX(g.price), MIN(g.price), max min avg std TRUNC(AVG(g.price), 2) AS avg, 12 1.99 6.97 3.96 TRUNC(STDDEV(g.price), 2) AS std FROM games g; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 5 / 34 Aggregation Functions Grouping ❱ Grouping Logical Logical Aggregation Where From GROUP BY Select Renaming The GROUP BY clause creates logical groups of records that have the same values for the Group Order specified fields before computing the aggregate functions. Having GROUP BY c.country; first_name last_name email... country "Deborah" "Ruiz" "[email protected]"... "Singapore" "Tammy" "Lee" "[email protected]"... "Singapore"... "Raymon" "Tan" "[email protected]"... "Thailand" "Jean" "Ling" "[email protected]"... "Thailand"... IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 6 / 34 Aggregation Functions Grouping ❱ Grouping Logical Aggregation Aggregation Where From Aggregation Function Per Group Select Renaming The aggregation functions are calculated for each logical group. Group Order Having SELECT c.country, COUNT(*) SELECT c.country, COUNT(*) FROM customers c FROM customers c; GROUP BY c.country; country count This is actually an error. "Vietnam" 98 "Singapore" 391... IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 7 / 34 Aggregation Functions Grouping ❱ Grouping Logical Where Aggregation Where From After WHERE Select Renaming Groups are formed (logically) after the rows have been filtered by the WHERE clause. Group Order Having SELECT c.country, COUNT(*) country count FROM customers c "Vietnam" 4 WHERE c.dob >= '2006-01-01' "Singapore" 25 GROUP BY c.country; "Thailand" 5 "Indonesia" 15 "Malaysia" 12 IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 8 / 34 Aggregation Functions Grouping ❱ Grouping Logical From Aggregation Where From After FROM Select Renaming Groups are formed (logically) after the tables have been joined in the FROM clause. Group Order Having SELECT c.customerid, c.first_name, c.last_name, SUM(g.price) FROM customers c, downloads d, games g WHERE c.customerid = d.customerid AND d.name = g.name and d.version = g.version GROUP BY c.customerid, c.first_name, c.last_name; Find the total spending of each customer. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 9 / 34 Aggregation Functions Grouping ❱ Grouping Logical Select Aggregation Where From SELECT Clause Select Renaming It is recommended (and required per SQL standard) to include attributes projected in the SELECT Group Order clause by the GROUP BY clause. Having SELECT c.customerid, c.first_name, c.last_name, SUM(g.price) FROM customers c, downloads d, games g WHERE c.customerid = d.customerid AND d.name = g.name and d.version = g.version GROUP BY c.customerid; The above query works only because first_name and last_name are guaranteed unique. Do NOT write such queries for readability and portability. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 10 / 34 Aggregation Functions Grouping ❱ Grouping Logical Select Aggregation Where From Invalid Query Select Renaming The following query does not work in PostgreSQL (but works in SQLite with potentially incorrect Group Order result). Having SELECT c.customerid, c.first_name, c.last_name, SUM(g.price) FROM customers c, downloads d, games g WHERE c.customerid = d.customerid AND d.name = g.name and d.version = g.version GROUP BY c.first_name, c.last_name; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 11 / 34 Aggregation Functions Grouping ❱ Grouping Logical Renaming Aggregation Where From Renamed Column Select Renaming Renamed columns can be used in GROUP BY clause. The following query displays the number Group Order of downloads by country and year of birth (using EXTRACT). Having SELECT c.country, EXTRACT(YEAR FROM c.since) AS regyear, COUNT(*) AS total FROM customers c, downloads d WHERE c.customerid = d.customerid GROUP BY c.country, regyear ORDER BY regyear ASC, c.country ASC; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 12 / 34 Aggregation Functions Grouping ❱ Grouping Logical Group Order Aggregation Where From GROUP BY Reordering Select Renaming The order of columns in GROUP BY clause does not change the meaning of the query. The Group Order logical groups remain the same. Having SELECT c.country, EXTRACT(YEAR FROM c.since) AS regyear, COUNT(*) AS total FROM customers c, downloads d WHERE c.customerid = d.customerid GROUP BY regyear, c.country ORDER BY regyear ASC, c.country ASC; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 13 / 34 Aggregation Functions Having Grouping ❱ Having Condition Condition Aggregate Condition Aggregate functions can be used in conditions, but not in WHERE clause. Aggregate functions can be evaluated after groups are formed (which is after WHERE clause). SELECT c.country HAVING Clause FROM customers c We need a new clause: HAVING clause. This clause is WHERE COUNT(*) >= 100 performed after GROUP BY clause. GROUP BY c.country; HAVING clause can only use aggregate functions, columns listed in the GROUP BY clause, and subqueries. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 14 / 34 Aggregation Functions Having Grouping ❱ Having Condition Condition Aggregate Condition Aggregate functions can be used in conditions, but not in WHERE clause. Aggregate functions can be evaluated after groups are formed (which is after WHERE clause). SELECT c.country Note FROM customers c The query on the left finds the countries in which there GROUP BY c.country are more than 100 customers. HAVING COUNT(*) >= 100; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 15 / 34 Joins IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 16 / 34 Joins ❱ Inner Join Basic Inner Join JOIN Basic Natural Join Outer Join Expressiveness While inner join is a popular construct, there is no added expressiveness or performance in INNER JOIN. The two queries below are equivalent. Inner Join Cross Join SELECT * SELECT * FROM customers c FROM customers c, downloads d, INNER JOIN downloads d games g ON d.customerid = c.customerid WHERE d.customerid = c.customerid INNER JOIN games g AND d.name = g.name ON d.name = g.name AND d.version = g.version; AND d.version = g.version; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 17 / 34 Joins ❱ Inner Join Basic Inner Join JOIN JOIN Natural Join Outer Join Synonym JOIN is synonymous with INNER JOIN. We do not recommend either as CROSS JOIN (or comma) is typically easier to read and will be optimized by DBMS. Join SELECT * FROM customers c JOIN downloads d ON d.customerid = c.customerid JOIN games g ON d.name = g.name AND d.version = g.version; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 18 / 34 Joins Inner Join Natural Join ❱ Natural Join Outer Join What is Natural? Automatic Equality If we managed to give the same name to columns with the same meaning across the tables, we can use natural join. NATURAL JOIN joins rows that have the same values for columns with the same name. It also prints only one of the two columns. Natural Join SELECT * Question FROM customers c Can you write the equivalent of the query NATURAL JOIN downloads d on the left using CROSS JOIN? NATURAL JOIN games g; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 19 / 34 Joins Inner Join Outer Join Natural Join ❱ Outer Join Basic Basic Example What is Outer? Anti Join Closing The outer join keeps the columns of the rows in the left (left outer join), the right (right outer join), or in both (full outer join) tables that do not match anything in the other table according to the join condition. The remaining values are padded with NULL values. Warning It is better to avoid outer joins whenever possible as they introduce NULL values. They can sometimes be justified for efficiency reasons. However, this course does not care about efficiency as long as the query can finish within reasonable time. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 20 / 34 Joins Inner Join Outer Join Natural Join ❱ Outer Join Example Basic Example Left Outer Join Anti Join Closing In the example below, the customers --from the left table-- who never downloaded a game are combined with NULL values to replace missing values for the columns of the downloads table. Columns from the right table are padded with NULL values. SELECT c.customerid, c.email, d.customerid, d.name, d.version FROM customers c LEFT OUTER JOIN downloads d ON c.customerid = d.customerid; c.customerid c.email d.customerid d.name d.version... "Willie90" "[email protected]" "Willie90" "Ronstring" "1.1" "Willie90" "[email protected]" "Willie90" "Veribet" "2.1" "Al8" "[email protected]" null null null "Johnny1997" "[email protected]" null null null IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 21 / 34 Joins Inner Join Outer Join Natural Join ❱ Outer Join Example Basic Example Right Outer Join Anti Join Closing In the example below, the games --from the right table-- that have never been downloade are combined with NULL values to replace missing values for the columns of the downloads table. Columns from the left table are padded with NULL values. SELECT * FROM downloads d RIGHT OUTER JOIN games g ON g.name = d.name AND g.version = d.version; Full Outer Join A full outer join pads missing values with NULL for both the tables on the left and on the right. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 22 / 34 Joins Inner Join Outer Join Natural Join ❱ Outer Join Anti Join Basic Example Only Missing Value Further Restriction Anti Join Closing Find customers who never downloaded a Further restriction should be on WHERE game. clause and not ON clause. SELECT c.customerid SELECT c.customerid FROM customers c FROM customers c LEFT OUTER JOIN downloads d LEFT OUTER JOIN downloads d ON c.customerid = d.customerid ON c.customerid = d.customerid WHERE d.customerid IS NULL; WHERE d.customerid IS NULL AND c.country = 'Singapore'; -- try moving the AND above IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 23 / 34 Joins Inner Join Outer Join Natural Join ❱ Outer Join Closing Basic Example Warning Anti Join Closing Outer joins are not easy to write as conditions in the ON clause are not equivalent to conditions in the WHERE clause (as it was the case with INNER JOIN). Conditions in the ON clause determines which rows are dangling. Synonyms LEFT JOIN is synonym for LEFT OUTER JOIN RIGHT JOIN is synonym for RIGHT OUTER JOIN FULL JOIN is synonym for FULL OUTER JOIN IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 24 / 34 Set Operations IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 25 / 34 Set Operations ❱ Set Basic Set Compatible Basic Examples Operators The set operators UNION, INTERSECT, and EXCEPT return the union, intersection, and non- symmetric difference of the results of two queries respectively. Deduplication Union, intersection, and non-symmetric difference eliminate duplicates unless annotated with the keyword ALL. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 26 / 34 Set Operations ❱ Set Basic Set Compatible Compatible Examples Union-Compatible Two queries must be union-compatible to be used with UNION, INTERSECT, or EXCEPT. They must return the same number of columns with the same domain in the same order. IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 27 / 34 Set Operations Set Examples ❱ Examples Union Union Intersection Difference Question Find the customerid of the customers who downloaded version 1.0 or 2.0 of the game Aerified. SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '1.0' UNION SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '2.0'; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 28 / 34 Set Operations Set Examples ❱ Examples Union Union Intersection Difference Question Find the name and versions of the games after GST is applied. GST of 9% is applied if it is more than 30 cents. SELECT g.name || ' ' || g.version AS game, ROUND(g.price * 1.09) AS price FROM games g WHERE g.price * 0.09 >= 0.3 UNION SELECT g.name || ' ' || g.version AS game, g.price FROM games g WHERE g.price * 0.09 < 0.3; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 29 / 34 Set Operations Set Examples ❱ Examples Union Intersection Intersection Difference Question Find the customerid of the customers who downloaded both version 1.0 and 2.0 of the game Aerified. SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '1.0' INTERSECT SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '2.0'; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 30 / 34 Set Operations Set Examples ❱ Examples Union Difference Intersection Difference Question Find the customerid of the customers who downloaded version 1.0 but not version 2.0 of the game Aerified. SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '1.0' EXCEPT SELECT d.customerid FROM downloads d WHERE d.name = 'Aerified' AND d.version = '2.0'; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 31 / 34 Conclusion IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 32 / 34 Conclusion ❱ Reading Reading What Does This Query Find? SELECT c.email, SUM(g.price) FROM customers c, downloads d, games g WHERE c.customerid = d.customerid AND g.name = d.name AND g.version = d.version AND c.country = 'Indonesia' AND g.name= 'Fixflex' GROUP BY c.email UNION SELECT c.email, 0 FROM customers c LEFT JOIN (downloads d INNER JOIN games g ON g.name = d.name AND g.version = d.version AND g.name = 'Fixflex') ON c.customerid = d.customerid WHERE c.country = 'Indonesia' AND d.name IS NULL; IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan) 33 / 34 postgres=# exit Press any key to continue... IT5008: Database Design & Programming -- Adi Yoga Sidi Prabawa (notes adapted from Prof. Stephane Bressan)

Use Quizgecko on...
Browser
Browser