Databases Fundamentals: Chapter 2 Retrieving Data in SQL PDF

Summary

This document details chapter 2, titled "Retrieving data in SQL", of a course on Databases Fundamentals. The document describes SQL syntax and provides examples of retrieving data from a relational database. The document includes tables and diagrams.

Full Transcript

Semester 1 Applied Computer Science & Cyber Security Databases Fundamentals Chapter 2 Retrieving data in SQL Kristien Roels Howest Bruges | Rijselstraat 5 - 820...

Semester 1 Applied Computer Science & Cyber Security Databases Fundamentals Chapter 2 Retrieving data in SQL Kristien Roels Howest Bruges | Rijselstraat 5 - 8200 Bruges T 050 38 12 77 | www.howest.be Content 2 Retrieving data in SQL.......................................................................................................... 1 2.1 The sample database company...................................................................................... 1 2.2 Syntax SELECT statement............................................................................................. 2 2.3 The minimum SELECT................................................................................................... 2 2.4 The ORDER BY clause................................................................................................... 3 2.5 The WHERE clause........................................................................................................ 4 2.6 The inner join of tables.................................................................................................... 6 2.7 The outer join of tables................................................................................................... 8 2.8 Scalar functions............................................................................................................ 10 2.9 Aggregate functions...................................................................................................... 12 2.10 The GROUP BY and HAVING clause........................................................................ 13 2.11 Subqueries................................................................................................................ 15 2.11.1 Non-correlated subqueries.................................................................................. 15 2.11.2 Correlated subqueries......................................................................................... 18 2.12 The EXISTS operator................................................................................................ 20 2.13 Subqueries versus joins............................................................................................. 21 2.14 SELECT INTO........................................................................................................... 21 2 Retrieving data in SQL This chapter deals with querying data from a relational database using SQL. The SQL statement used for this purpose is the SELECT statement. This is a very powerful statement that solves both simple and complex queries about the data in the database. 2.1 The sample database company The sample database company contains data of employees in a company. There are four tables: Table contains offices the branches of the company jobs the jobs that exist in the company (director, programmer,...) employees the employees replacements who can be replaced by whom (in case of holidays, illness,... ) The structure: offices (officename, sector, city) officename is the PK jobs (jobname, minimumsalary, maximumsalary) jobname is the PK employees (employeeno, empname, birthdate, division, jobname, salary, officename) employeeno is the PK officename is an FK referring to the PK officename in offices jobname is a FK that refers to the PK jobname in jobs replacements (employeeno, replacementno) employeeno, replacementno is the composite PK employeeno is a FK referring to the PK employeeno in employees replacementno is a FK referring to the PK employeeno in employees Important constraints: Primary Key (PK) and Foreign Key (FK). - A Primary Key (PK) uniquely identifies a row. There cannot be 2 rows with the same value for the PK. - A Foreign Key (FK) or foreign key must have a value that appears in the PK column referenced or must be NULL (empty). 1 A diagram representation of the structure of the database company: 2.2 Syntax SELECT statement Order of execution and operation SELECT clause 5. specifies the column(s) FROM clause 1. specifies the table(s) [WHERE clause ] 2. filters rows [GROUP BY clause ] 3. groups rows [ HAVING clause ] 4. filters groups [ ORDER BY clause ] 6. sorts the rows in the result set The SELECT statement consists of a number of clauses (components). Not all clauses are mandatory to include: optional clauses are enclosed in square brackets [ ]. The order of the clauses should be as in the syntax description above. For example, the WHERE clause should come after the FROM clause and before the ORDER BY clause. The result of a SELECT statement is a table. It is referred to as the result set. 2.3 The minimum SELECT The simplest form of the SELECT statement consists of a SELECT clause and a FROM clause. This form of the SELECT statement has the following syntax: SELECT [ DISTINCT ] column list FROM table The result set of this simplest SELECT statement contains all the rows from the table mentioned in the FROM clause. The column list lists the columns (separated by a comma) that should appear in the result set. It is also possible to write an asterisk *: then all columns from the table will be displayed. DISTINCT eliminates duplicate rows in the result set. 2 Examples (company): 1. Give all details of all employees. SELECT * FROM employees 2. Give the name and the salary of all employees. SELECT employeename, salary FROM employees 3. Give the jobs filled by employees. SELECT DISTINCT jobname FROM employees 4. For each division, give the jobs filled by employees. SELECT DISTINCT division, jobname FROM employees 5. For each job, give: maximumsalary, minimumsalary and the difference between the two. SELECT jobname, minimumsalary, maximumsalary, maximumsalary - minimumsalary AS diff FROM jobs 2.4 The ORDER BY clause The ORDER BY clause sorts the rows in the result set. This clause has the following syntax: ORDER BY column name [ ASC | DESC ] [ ,... ] The column name indicates which column is being sorted by. Multiple columns [ ,... ] can be sorted; these columns should not necessarily be in the SELECT clause. ASC (ascending) means ascending sorting and is the default sort; DESC (descending) means descending sorting. Examples (company): 1. Give name, job and salary of all employees. Sort the list alphabetically by name. SELECT employeename, jobname, salary FROM employees ORDER BY employeename --ASC 2. Give all details of all branches (offices). Sort the list descending by city and within city descending by officename. SELECT * FROM offices ORDER BY city DESC, officename DESC 3 To display only the first n rows in the result set, the SELECT clause can be extended with TOP (n). TOP (n) always works in conjunction with ORDER BY. The syntax is as follows: SELECT TOP (n) [ WITH TIES ] column list FROM clause ORDER BY clause TOP (n) indicates that the first n rows should be displayed (according to the sort order). The last place can be taken by different rows with WITH TIES if those rows have the same value in the columns used in the ORDER BY clause. Examples (company): 3. Give the alphabetically first 5 employees. SELECT TOP(5) * FROM employees ORDER BY employeename 4. Give the employee(s) with the highest salary. SELECT TOP(1) WITH TIES * FROM employees ORDER BY salary DESC 2.5 The WHERE clause The WHERE clause filters rows: only the rows that meet the condition are displayed. The syntax of this clause: WHERE condition The condition can take many forms: expression { = | < | ! = | > | > = | ! > | < | < = | ! < } expression expression IS [ NOT ] NULL expression [ NOT ] BETWEEN expression AND expression string expression [ NOT ] LIKE pattern expression [ NOT ] IN (expression [ ,... ] ) An expression can be a column name, a constant or a "real" expression. In a "real" expression, a calculation is performed or a scalar function is used (see below). It is also possible to create a composite condition. Here, 3 logical operators can be used: AND, OR, NOT. If these logical operators AND, OR and NOT are used together in a composite condition, it is important to know the priority of these operators: - priority 1 (highest priority): NOT - priority 2: AND - priority 3 (lowest priority): OR Of course, parentheses can be used to indicate the order in which the logical operators NOT, AND and OR should be executed. [ NOT ] condition1 { AND | OR } [ NOT ] condition2 4 Examples (company): 1. Give the branches (offices) in Brussels. SELECT * FROM offices WHERE city = 'Brussels Note that comparing text is case-insensitive. 2. Give the employees working at Computerland who are not an Analyst. Sort alphabetically by name. SELECT * FROM employees WHERE officename = 'Computerland' AND jobname != 'Analyst' ORDER BY employeename 3. Give the employees with a salary between 80000 and 120000. SELECT * FROM employees WHERE salary >= 80000 AND salary '1980-01-05' ORDER BY birthdate 9. Give the employees with an even number. SELECT * FROM employees WHERE employeeno % 2 = 0 2.6 The inner join of tables Combining data from different tables is done with join operations. In an inner join of two tables, the result set contains 'combined' rows. Such a 'combined' row is a row of table1 next to a row of table2 if the specified join condition is met. Usually, the join is done between 2 tables where the foreign key of one table must match the primary key of the other table. The syntax of the SELECT statement then looks as follows: SELECT clause FROM table1 JOIN table2 ON join-condition1 [ JOIN table3 ON join-condition2 ] [... ] [ WHERE clause ] [ GROUP BY clause ] [ HAVING clause ] [ ORDER BY clause ] Examples (company): 1. Give the employees with their branch details (offices). SELECT * FROM employees JOIN offices ON employees.officename = offices.officename The column officename appears in both tables (employees and offices). In the join condition, this column name must then be preceded by the table name to distinguish between the 2 columns. It is referred to as qualifying a column. 6 SELECT employees.*, offices.sector, offices.city FROM employees JOIN offices ON employees.officename = offices.officename SELECT e.*, o.sector, o.city FROM employees e JOIN offices o ON e.officename = o.officename The above example uses an alias for the table names. This makes writing the query a little easier (shorter). 2. Give alphabetically the names of the employees working in Brussels. SELECT employeename -- also possible: employees.employeename FROM employees JOIN offices ON employees.officename = offices.officename WHERE city = 'Brussels' -- also possible: offices.city ORDER BY employeename 3. For each employee, give the salary and themaximum salary of his job. SELECT employeeno, employeename, salary, maximumsalary FROM employees JOIN jobs ON employees.jobname = jobs.jobname ORDER BY employeeno 4. Give all employees who are Analysts or Programmers who are underpaid or overpaid (i.e. earning less than the minimum salary or more than the maximum salary for their job). SELECT * FROM employees JOIN jobs ON employees.jobname = jobs.jobname WHERE (employees.jobname = 'Analyst' OR employees.jobname = 'Programmer') AND (employees.salary < jobs.minimumsalary OR employees.salary > jobs.maximumsalary) Note the use of parentheses! When these logical operators AND and OR are used together in a composite condition, it is important to know the priority of these operators: AND has a higher priority than OR. SELECT * FROM employees JOIN jobs ON employees.jobname = jobs.jobname WHERE employees.jobname IN ('Analyst', 'Programmer') AND employees.salary NOT BETWEEN jobs.minimumsalary AND jobs.maximumsalary 5. For each employee who can be replaced, give: the number, name and city where he works. SELECT DISTINCT employees.employeeno, employeename, city FROM employees JOIN replacements ON employees.employeeno = replacements.employeeno JOIN offices ON employees.officename = offices.officename This is a join with 3 tables. Note here the use of DISTINCT. Indeed, an employee that can be replaced by multiple employees should appear only once in the result set. 7 6. Who earns more than the Director of his branch (office)? SELECT * FROM employees e1 JOIN employees e2 ON e1.officename = e2.officename WHERE e2.jobname = 'Director' AND e1.salary > e2.salary This is a join of the table with itself. Then you have to work with an alias for the table names to distinguish between the 2 tables. 7. Give the numbers of the employees who can replace each other. SELECT r1.employeeno, r1.replacementno FROM replacements r1 JOIN replacements r2 ON r1.employeeno = r2.replacementno AND r1.replacementno = r2.employeeno WHERE r1.employeeno < r2.employeeno -- 4 5 yes; 5 4 no 2.7 The outer join of tables There are 3 types of outer joins. Outer joins extend the inner join with rows from the left and/or right table. A left (outer) join contains the rows of the inner join as well as the rows from the left table that do not appear in the inner join. These rows are combined with NULL values on the right-hand side. A right (outer) join contains the rows of the inner join as well as the rows from the right table that do not appear in the inner join. These rows are combined with NULL values on the left-hand side. A full (outer) join contains the rows of the inner join as well as the rows from the left-hand table that do not occur in the inner join (these rows are combined with NULL values on the right-hand side) and the rows from the right-hand table that do not occur in the inner join (these rows are combined with NULL values on the left-hand side). A full (outer) join is not often used and is not discussed further in this course. The syntax of the SELECT statement then looks as follows: SELECT clause FROM table1 LEFT JOIN table2 ON join-condition1 [ LEFT JOIN table3 ON join-condition2 ] [... ] [ WHERE clause ] [ GROUP BY clause ] [ HAVING clause ] [ ORDER BY clause ] 8 SELECT clause FROM table1 RIGHT JOIN table2 ON join-condition1 [ RIGHT JOIN table3 ON join-condition2 ] [... ] [ WHERE clause ] [ GROUP BY clause ] [ HAVING clause ] [ ORDER BY clause ] SELECT clause FROM table1 FULL JOIN table2 ON join-condition1 [ FULL JOIN table3 ON join-condition2 ] [... ] [WHERE clause] [ GROUP BY clause ] [ HAVING clause ] [ ORDER BY clause] Examples (company): 1. Give all the employees with their replacer(s), if any. If an employee has no replacer, he is still listed with NULL for the replacer. SELECT employees.*, replacements.* FROM employees LEFT JOIN replacements ON employees.employeeno = replacements.employeeno SELECT employees.*, replacements.* FROM replacements RIGHT JOIN employees ON employees.employeeno = replacements.employeeno 2. Give all offices with the employees. If an office has no employees (yet), this should also be reflected in the result set. SELECT * FROM offices o LEFT JOIN employees e ON o.officename = e.officename ORDER BY o.officename, e.employeeno 3. For each job, give the employees performing that job. If a job is not (yet) filled by an employee, this should also be shown in the result set. SELECT * FROM jobs LEFT JOIN employees ON jobs.jobname = employees.jobname ORDER BY jobs.jobname, employees.employeeno 9 2.8 Scalar functions Scalar functions operate on 1 value and deliver 1 value. This section discusses some interesting scalar functions. A scalar function can be used where an expression is expected. There are a lot of scalar functions: for a detailed description of all scalar functions, categorised, see https://learn.microsoft.com/en-us/sql/t-sql/functions/functions?view=sql-server-ver16. In the Management Studio, it is possible to enter a function name in the query editor, place the cursor on the function name and then press F1 to get help about this function. Note that each RDBMS has its own scalar functions. For example, MS SQL Server's scalar functions are different from MySQL's scalar functions. Finally, also note that it is possible to define functions yourself: people talk about user-defined functions (UDF). Some common date and time functions: YEAR(date) Returns the year from the date as an integer MONTH(date) Returns the month from the date (1-12) DAY(date) Returns the day from the date (1-31) GETDATE() Returns the current server date and time (to millisecond level) DATEPART(datepart, date) Returns an integer representing the requested part of the date-time DATEADD(datepart, number, date) Adds an interval to the specified date-time DATEDIFF(datepart, startdate, Returns the difference between 2 date-times. Only the enddate) requested part is considered, not the full date-time. Some common string functions: REPLACE(source string, search string, Replaces in the source string all occurrences of the replace string) search string with the replace string CHARINDEX(search string, string) Returns the position where the search string is first found in the string SUBSTRING(string, start position, Returns a partial string length) RIGHT(string, length) Returns the right part of the string LEFT(string, length) Returns the left part of the string CONCAT(string1, string2,...) Concatenation (merging) of strings UPPER(string) Returns the string in capitals LOWER(string) Returns the string in lowercase LTRIM(string) Returns the string without leading spaces RTRIM(string) Returns the string without trailing spaces Some common numeric functions: ROUND(numeric_expression , length) Rounds the number mathematically FLOOR(numeric_expression) Round the number down CEILING(numeric_expression) Round up the number 10 Examples (company): 1. Give the employees born in 1980 or later. Sort by date of birth. SELECT * FROM employees WHERE YEAR(birthdate) >= 1980 ORDER BY birthdate 2. For all employees, give their age. Only consider the year when calculating age. SELECT *, DATEDIFF(YEAR, birthdate, GETDATE()) AS age FROM employees SELECT *, YEAR(GETDATE()) - YEAR(birthdate) AS age FROM employees 3. Give the employees working in a division whose name begins with the letter A. SELECT * FROM employees WHERE LEFT(division,1) = 'A' SELECT * FROM employees WHERE SUBSTRING(division, 1, 1) = 'A' SELECT * FROM employees WHERE division LIKE 'A%' 4. Sort the employees alphabetically by name. Blanks should not play a role in determining the sorting order. SELECT * FROM employees ORDER BY REPLACE(employeename, ' ', '') 5. Give all the employees with their branch details and with a column containing their employee code. The employee code is formed by the employeeno, '-', first 3 characters from the employeename, '-', first 3 characters from the city. SELECT *, CONCAT(employeeno, '-', LEFT(employeename,3), '-', LEFT(city,3)) AS employeecode FROM employees JOIN offices ON employees.officename = offices.officename ORDER BY employeeno 6. Give all the employees and give with each employee 4 columns (calculations) in the result set: increasing the salary by 99.45; rounding this new salary, mathematically, up and down, respectively. SELECT *, salary + 99.45 AS newSalary, ROUND(salary + 99.45, 0) AS rounded, FLOOR(salary + 99.45) AS floored, CEILING(salary + 49.95) AS ceiled FROM employees 11 2.9 Aggregate functions Aggregate functions operate on a set of values and return 1 value. This section discusses some interesting aggregate functions. Count(*) Counts the number of rows Count(expression) Counts the number of non-NULL values in the expression Count(distinct expression) Counts the number of different non-NULL values in the expression Min(expression) Returns the smallest non-NULL Numeric, string, value from the expression datetime datatype Max(expression) Returns the largest non-NULL Numeric, string, value from the expression datetime datatype Sum([distinct] expression) Returns the sum of the [different] Numeric data type non-NULL values from the expression Avg([distinct] expression) Returns the average of the Numeric data type [different] non-NULL values from the expression The expression is almost always a column name. Aggregate functions can NOT appear in the WHERE clause! Indeed, the WHERE clause works row by row and the aggregate functions work on an range of values. Examples (company): 1. How many employees are there? SELECT COUNT(*) AS number FROM employees SELECT COUNT(employeeno) AS number FROM employees --The query below is wrong because the column salary can contain NULL values. SELECT COUNT(salary) AS number FROM employees --The query below is wrong because an aggregate function cannot be combined with a range of values. SELECT employeeno, COUNT(*) AS number FROM employees 2. How many employees work at Computerland? SELECT COUNT(*) AS number FROM employees WHERE officename = 'Computerland' 3. How many employees' salaries were filled in? SELECT COUNT(salary) AS number FROM employees SELECT COUNT(*) AS number FROM employees 12 WHERE salary IS NOT NULL 4. How many branches are there and in how many different cities is there a branch? SELECT COUNT(*) AS counterOffices, COUNT(DISTINCT city) AS counterCities FROM offices 5. Give from the table employees: the smallest salary, the largest salary and the difference between the two. SELECT MIN(salary) AS minimum, MAX(salary) AS maximum, MAX(salary) - MIN(salary) AS difference FROM employees 6. What is the total salary amount of employees working in Brussels? SELECT SUM(salary) AS total FROM employees JOIN offices ON employees.officename = offices.officename WHERE city = 'Brussels 7. Give the average salary of the employees. SELECT AVG(salary) AS average FROM employees SELECT SUM(salary) / COUNT(salary) AS average FROM employees --The query below is incorrect because the column salary can contain NULL values. SELECT SUM(salary) / COUNT(*) AS average FROM employees 2.10 The GROUP BY and HAVING clause The GROUP BY clause creates groups of rows, usually to apply an aggregate function to each group. The SELECT statement returns 1 result row per group. This result row usually contains the column(s) that were grouped and aggregate functions acting on the group. Groups can be filtered with the HAVING clause. Only groups meeting the condition are retained. The condition can include: the column(s) that were grouped on and aggregate functions. Note that the HAVING clause cannot be used without a prior GROUP BY clause. The syntax of the SELECT statement then looks as follows: SELECT clause Specifies the column(s) FROM clause 1. specifies the table(s) [WHERE clause ] 2. filters rows [GROUP BY clause ] 3. groups rows [ HAVING clause ] 4. filters groups [ ORDER BY clause ] 6. sorts the rows in the result set The syntax of the GROUP BY and HAVING clauses is simple: GROUP BY column name [ ,... ] [ HAVING condition ] 13 Examples (company): 1. Give the number of employees for each branch (office). SELECT o.officename, COUNT(*) AS number FROM offices o JOIN employees e ON o.officename = e.officename GROUP BY o.officename --The query below is wrong because the SELECT clause can only contain group information and not detail information of rows from the group: SELECT *, COUNT(*) AS number FROM offices o JOIN employees e ON o.officename = e.officename GROUP BY o.officename 2. For each branch, give the number of employees, the date of birth of the youngest employee and the date of birth of the oldest employee. SELECT o.officename, COUNT(*) AS number, MAX(birthdate) AS youngest, MIN(birthdate) AS oldest FROM offices o JOIN employees e ON o.officename = e.officename GROUP BY o.officename 3. For each branch, give the number of employees. Ensure that only branches with more than 4 employees appear in the list. SELECT o.officename, COUNT(*) AS number FROM offices o JOIN employees e ON o.officename = e.officename GROUP BY o.officename HAVING COUNT(employeeno) > 4 4. For each branch, give the number of employees. Ensure that branches without employees also appear in the list. SELECT o.officename, COUNT(employeeno) AS number FROM offices o LEFT JOIN employees e ON o.officename = e.officename GROUP BY o.officename Note: use COUNT(employeeno) and not COUNT(*). Indeed, per group, the number of non-NULL values in the employeeno column should be counted, not the number of rows. 5. For each month, give the number of employees who have a birthday. SELECT MONTH(birthdate) AS monthBirthdate, COUNT(*) as number FROM employees WHERE birthdate IS NOT NULL GROUP BY MONTH(birthdate) 6. Give the jobs in which at least 2 employees are overpaid. SELECT employees.jobname, COUNT(*) AS number FROM employees JOIN jobs ON employees.jobname = jobs.jobname WHERE salary > maximumsalary GROUP BY employees.jobname HAVING COUNT(*) >= 2 14 2.11 Subqueries A subquery is a query within a query. In other words, a SELECT statement within another SELECT statement. A subquery must always be enclosed in round brackets and can be used in the following places within a SELECT statement:... WHERE expression comparator (subquery) The subquery returns 1 value... WHERE expression [ NOT ] IN (subquery) The subquery returns 1 value or a list of values... WHERE expression vgloperator The subquery returns 1 value or a ANY | SOME | ALL (subquery) list of values... WHERE [ NOT ] EXISTS (subquery) It checks whether the subquery returns [no] result rows SELECT (subquery)... The subquery returns 1 value... FROM (subquery) The subquery returns a result set Many SELECT statements with a subquery can also be written with a join. However, sometimes the use of subqueries is necessary to solve the query being asked. The advantage of subqueries is that they break down a complex problem into smaller pieces, making the query easier to read. There are 2 types of subqueries: non-correlated subqueries and correlated subqueries. Both types are discussed below. 2.11.1 Non-correlated subqueries An uncorrelated subquery works independently of the main query (there is no reference from the subquery to the main query). The uncorrelated subquery is run first. The result is used in the main query. 2. Main query Uses the result of the subquery 1. Subquery Is executed first 15 Examples (company): 1. Give the employees with the highest salary. SELECT * FROM employees WHERE salary = (SELECT MAX(salary) FROM employees) SELECT TOP(1) WITH TIES * FROM employees ORDER BY salary DESC --The query below is wrong because an aggregate function cannot appear in the WHERE clause. SELECT * FROM employees WHERE salary = MAX(salary) 2. Give the employees earning more than the average salary. SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees) 3. Give the details of the employees who may be replaced. --Solution with a subquery SELECT * FROM employees WHERE employeeno IN (SELECT employeeno FROM replacements) --Solution with a join SELECT DISTINCT e.* FROM employees e JOIN replacements r ON e.employeeno = r.employeeno ORDER BY e.employeeno 4. Give the details of the employees who can NOT be replaced. --Solution with a subquery SELECT * FROM employees WHERE employeeno NOT IN (SELECT employeeno FROM replacements) --Solution with a left join SELECT e.* FROM employees e LEFT JOIN replacements r ON e.employeeno = r.employeeno WHERE r.employeeno IS NULL ORDER BY e.employeeno 5. Give the details of the employees working in a branch (office) within which the average salary is greater than 75000. SELECT * FROM employees WHERE officename IN (SELECT officename FROM employees 16 GROUP BY officename HAVING AVG(salary) > 75000) 6. Give the employees working in the sector Sales in a branch (office) without a Director. SELECT * FROM employees WHERE officename IN (SELECT officename FROM offices WHERE sector = 'Sales') AND officename NOT IN (SELECT officename FROM employees WHERE jobname = 'Director') SELECT employees.* FROM employees JOIN offices ON employees.officename = offices.officename WHERE sector = 'Sales' AND employees.officename NOT IN (SELECT officename FROM employees WHERE jobname = 'Director') 7. Give the 'ordinary' employees who earn more than a Director. SELECT * FROM employees WHERE jobname != 'Director' AND salary > ANY (SELECT salary FROM employees WHERE jobname = 'Director') --Note: SOME is an alias for ANY SELECT * FROM employees WHERE jobname != 'Director' AND salary > (SELECT MIN(salary) FROM employees WHERE jobname = 'Director') 8. Give the 'ordinary' employees who earn more than any 'Director'. SELECT * FROM employees WHERE jobname != 'Director' AND salary > ALL (SELECT salary FROM employees WHERE jobname = 'Director') SELECT * FROM employees WHERE jobname != 'Director' AND salary > (SELECT MAX(salary) FROM employees WHERE jobname = 'Director') 17 2.11.2 Correlated subqueries A correlated subquery depends on the main query (there is a reference from the subquery to the main query). For each row in the main query, the subquery is executed. 1. Main query For each row in the main query, the subquery is executed 2. Subquery contains a reference to the main query Examples (company): 1. Give the employees who earn the most in their branch (office). SELECT * FROM employees e1 WHERE salary = (SELECT MAX(salary) FROM employees e2 WHERE e2.officename = e1.officename) ORDER BY officename 2. Give the employees earning more than the average salary within their branch. SELECT * FROM employees e1 WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.officename = e1.officename) ORDER BY officename 3. Give the employees who can be replaced by 2 or more employees. --Solution with a correlated subquery SELECT * FROM employees WHERE (SELECT COUNT(*) FROM replacements WHERE employeeno = employees.employeeno) >= 2 --Solution with an non-correlated subquery SELECT * FROM employees WHERE employeeno IN (SELECT employeeno FROM replacements GROUP BY employeeno HAVING COUNT(*) >= 2) 18 4. Give all jobs with the number of employees for that job. SELECT *, (SELECT COUNT(*) FROM employees WHERE jobname = jobs.jobname) AS number FROM jobs SELECT jobs.jobname, COUNT(employeeno) AS number FROM jobs LEFT JOIN employees ON jobs.jobname = employees.jobname GROUP BY jobs.jobname 5. For each employee, give full details, along with his number of replacers. SELECT *, (SELECT COUNT(*) FROM replacements WHERE employeeno = employees.employeeno) AS number FROM employees 6. Give the employees who have a namesake in another branch. --Solution with a correlated subquery SELECT * FROM employees e1 WHERE employeename IN (SELECT employeename FROM employees e2 WHERE e2.officename != e1.officename) --Solution with a join SELECT e1.* FROM employees e1 JOIN employees e2 ON e1.employeename = e2.employeename WHERE e1.officename != e2.officename 7. Give the employees who can be replaced by an employee from another branch. --Solution with a correlated subquery SELECT * FROM employees e1 WHERE employeeno IN (SELECT employeeno FROM replacements WHERE replacementno IN (SELECT employeeno FROM employees e2 WHERE e2.officename != e1.officename) ) --Solution with a join SELECT DISTINCT e1.* FROM employees e1 JOIN replacements r ON r.employeeno = e1.employeeno JOIN employees e2 ON r.replacementno = e2.employeeno WHERE e1.officename != e2.officename 19 2.12 The EXISTS operator The EXISTS operator checks whether the result set of a correlated subquery is empty or not. General example 1: SELECT clause FROM clause WHERE EXISTS (subquery) Return the data (SELECT-FROM) for which the result set of the correlated subquery is not empty (exists). General example 2: SELECT clause FROM clause WHERE NOT EXISTS (subquery) Return the data (SELECT-FROM) for which the result set of the correlated subquery is empty (does not exist). The data itself from the result set of the correlated subquery is not important, but rather whether there is a result or not. So the correlated subquery usually starts with SELECT * Examples (company): 1. Give the employees who earn the least in their branch. SELECT * FROM employees e1 WHERE salary IS NOT NULL AND NOT EXISTS (SELECT * FROM employees e2 WHERE e2.officename = e1.officename AND e2.salary < e1.salary) 2. Give the branches where no employee can be replaced. SELECT * FROM offices WHERE NOT EXISTS (SELECT * FROM employees WHERE employees.officename = offices.officename AND employeeno IN (SELECT employeeno FROM replacements)) 3. Give the branches where all employees can be replaced. SELECT * FROM offices WHERE NOT EXISTS (SELECT * FROM employees WHERE employees.officename = offices.officename AND employeeno NOT IN (SELECT employeeno FROM replacements)) 20 2.13 Subqueries versus joins As seen, many SELECT statements with a subquery can also be written with a join. The use of subqueries is sometimes necessary to solve the question asked. The advantage of subqueries is that they break down a complex problem into smaller pieces, making the query easier to read. In general, joins are faster than subqueries because they can use indexes and other optimisation techniques. However, this is not always the case and performance may depend on the database engine, data size, query complexity and available indexes. 2.14 SELECT INTO The SELECT statement with the INTO clause creates a new table with the result set of the SELECT statement. The newly created table has columns with the same name and type as the list in the SELECT statement. The rows are the result rows of the SELECT statement. Note that constraints are not inherited, nor are indexes. If the WHERE clause of the SELECT statement returns FALSE, then a table is created with no data. The syntax: SELECT clause [ INTO table name ] FROM clause [WHERE clause] [GROUP BY clause] [HAVING clause] Example (company): Create a new table employeesafter1980 with the number, name, job and year of birth of employees born after 1980, along with the name and city of their branch. SELECT employeeno, employeename, jobname, YEAR(birthdate) AS birthyear, employees.officename, city INTO employeesafter1980 FROM employees JOIN offices ON employees.officename = offices.officename WHERE YEAR(birthdate) > 1980 SELECT * FROM employeesafter1980 Also look at the structure of the new table in the Object Explorer. 21