CS2102: Database Systems Lecture 7 PDF
Document Details
National University of Singapore
Tags
Summary
This document is a lecture on database systems, specifically focusing on programming with SQL. It covers various aspects of writing database applications, including non-interactive SQL, Statement Level Interface (SLI), and Call Level Interface (CLI).
Full Transcript
CS2102: Database Systems Lecture 7 — Programming with SQL Overview Writing Database Applications Motivation Statement Level Interface...
CS2102: Database Systems Lecture 7 — Programming with SQL Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 2 Using SQL So Far Interactive SQL: directly writing SQL statements to an interface Command line interface e.g., PostgreSQL's psql Graphical user interface e.g., PostgreSQL's pgAdmin https://www.postgresql.org/docs/current/static/app-psql.html https://www.pgadmin.org/ 3 What We Want: Writing Applications Non-interactive SQL SQL statements are included in an application written in a host language All operations can be executed (INSERT, UPDATE, DELETE, SELECT, etc.) 2 main alternatives Statement Level Interface (SLI) Call Level Interface (CLI) 4 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 5 Statement Level Interface (SLI) *.pgc Statement Level Interface (SLI) file Code is a mix of host language statements and SQL statements Examples: Embedded SQL, Dynamic SQL preprocessor SLI — basic process (here using C) *.c file 1) Write code that mixes host language with SQL (the normal C compiler will not understand this code!) compiler 2) Preprocess code using a preprocessor (understand SQL statements and converts them to pure C) *.exe 3) Compile code into an executable program file 6 SLI — Common Steps Source file: staticquery.pgc Declaration Declare variables Can be used by host language & SQL (other variables only usable by host language) Connection Connect to database with credentials Execution Static SQL = Prepare queries (in case of dynamic SQL) fixed query Execute queries (might require cursors) Operate on result Deallocation Release all resources If needed: commit any changes to db 7 SLI — Preprocessing, Compiling, Running Code Script file: compile-static-query *.pgc file preprocessor *.c file compiler *.exe file 8 SLI — Dynamic SQL Dynamic SQL SQL query is generated at runtime Example on the right: number of riders are specified as command line parameter 9 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 10 Call Level Interface (CLI) Call Level Interface (CLI) Application is completely written in host language ➜ no preprocessor needed SQL statements are strings passed as arguments to host language procedures or libraries Examples: ODBC (Open DataBase Connectivity), JDBC (Java DataBase Connectivity) Examples: libraries for Python pyodbc: connects to any DBMS with ODBC support psycopg: connects to PostgreSQL cx_Oracle: connects to Oracle MYSQLdb: connects to MySQL 11 CLI — Static SQL Example Declaration Connection Execution Deallocation 12 CLI — Dynamic SQL Example Declaration Connection Execution Deallocation 13 SLI vs. CLI — Discussion Crude simplification: SLI = CLI in disguise SLI preprocessor generates CLI code Source file: staticquery.c (generated by preprocessor) Import libraries Pass SQL statements as strings 14 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 15 SQL Injection Attack SQL Injection Attack Class of cyber attacks on dynamic SQL Goal of attack: execute unintended (typically malicious) SQL statements Typical cause: dynamic queries are generated by merging/concatenating strings Common attack point Omnipresent form fields in Web interfaces Entered values define some SQL statement 16 CLI Example Revisited Parameter value of intended range Generation of query by merging strings 17 SQL Injection Attack Demo Malicious parameter value by an attacker* This command would have thrown an error! Generation of query by merging strings * finding such values for a successful attack generally requires some insider knowledge and/or a lot of trial-&-error 18 SQL Injection Attacks — Key Takeaway Message Source file: psycopg docs 19 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 20 Motivation Application DBMS Very common in practice Tasks requiring multiple DB operations Get user (SELECT) Any combination of Reads and Writes User data or null Example: update user password Check that user does exist Check password (SELECT) Check that new password differs from old one True or False If all OK, update password Change password (UPDATE) ➜ 3 separate requests/accesses to DB that belong together to perform a task 21 Motivation What are the obvious problems? Application and DB may run on different machines ➜ poor performance / DB becomes bottleneck Different DB operations only loosely connected ➜ difficult to ensure "all or nothing" behavior What would we like to do? Stored functions and procedures Move (some) application logic into DB Collection of SQL statements and Group DB operations that form a task procedural logic together within an execution Precompiled and reusable code Treat task as a single DB operation Allow execute multiple database operations as a single unit 22 Motivation Why do we want/need procedural logic? Application logic that requires assignments, conditionals, or loops Queries that cannot be expressed using basic SQL id name points graduated name gap 1 Bob 94 TRUE Bob 0 2 Eve 82 FALSE Sue 0 Sort all students by points (descending) and 3 Sam 65 FALSE Leo 3 list the differences in points in the sequence 4 Liz 86 TRUE Tom 1 5 Tom 90 TRUE Liz 4 6 Sue 94 FALSE Ida 2 7 Zac 75 FALSE Zac 2 8 Ida 84 TRUE Question: Can we write a SQL query Eve 7 9 Leo 91 FALSE that returns the desired result? Pam 5 10 Pam 70 FALSE Sam 5 23 SQL Functions and Procedures SQL-based procedural language Oracle: PL/SQL ISO standard: SQL/PSM (SQL/Persistent Stored Modules) PostgreSQL: PL/pgSQL Unfortunately, different DBMS have their own "flavor" SQL Server: Transact-SQL Advantages (only if done right!) Disadvantages Better performance Testing & debugging more challenging Code reuse / higher productivity Limited portability / vendor lock-in Ease of maintenance No simple versioning of code Added security Not the most intuitive language 24 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 25 Stored Functions — Basic Syntax Function arguments Name of function (mode, name, type) CREATE [OR REPLACE] FUNCTION (, , …) RETURNS AS $$ Return type DECLARE -- Declaration of variables Body of function BEGIN (enclosed within dollar quotes; -- Sequence of SQL statement treated as a string) -- and/or procedural logic [RETURN …] END; $$ LANGUAGE ; Language used in body (mainly: plpgsql or sql) PostgreSQL docs with complete syntax 26 Running Example Database Toy database: Just 1 table Oversimplified table students CREATE TABLE students ( id name points graduated id SERIAL PRIMARY KEY, 1 Bob 94 TRUE name TEXT NOT NULL, 2 Eve 82 FALSE points INTEGER DEFAULT 0, 3 Sam 65 FALSE graduated BOOLEAN DEFAULT FALSE ); 4 Liz 86 TRUE 5 Tom 90 TRUE 6 Sue 94 FALSE 7 Zac 75 FALSE 8 Ida 84 TRUE 9 Leo 91 FALSE 10 Pam 70 FALSE 27 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control structures Cursors Summary 28 Stored Functions — Function Arguments Each argument is described by 3 values Mode: mode of argument (mainly: IN, OUT, INOUT) Name: name of argument (note: names are actually not mandatory!) same like in most programming languages Type: datatype of argument (INTEGER, VARCHAR, …, user-defined, etc.) IN OUT INOUT Default Explicitly specified Explicitly specified Value is passed to a function Value is returned by a function Value is passed to the function which returns another updated value Behaves like constants Behaves like an uninitialized variable Behaves like an initialized variable Value cannot be assigned Value must be assigned Value can/should be assigned 29 Simple Example — Add 2 Integers Quick Quiz: In which case is the right alternative more convenient? CREATE OR REPLACE FUNCTION add CREATE OR REPLACE FUNCTION add (IN a INT, IN b INT) (IN a INT, IN b INT, OUT sum INT) RETURNS INTEGER AS AS $$ $$ DECLARE BEGIN sum INT; sum := a + b; BEGIN END; sum := a + b; $$ RETURN sum; LANGUAGE plpgsql; END; $$ LANGUAGE plpgsql; SELECT add(2, 3); SELECT sum FROM add(2, 3); add sum 5 5 30 Quick Quiz CREATE OR REPLACE FUNCTION add (INT, INT) RETURNS INTEGER AS $$ DECLARE sum INTEGER; BEGIN sum := RETURN sum; How do we have to complete END; $$ the function to make it valid? LANGUAGE plpgsql; 31 Stored Functions — Language Explicit specification of language of function body Recall: function body is interpreted as a string Many supported languages: SQL, PL/pgSQL, PL/Python, PL/Java, PL/Perl, PL/TCL, C, etc. So, what to choose? built in Choice of language typically depends on task Focus on database operations: SQL, PL/pgSQL (good for code with lot of SQL queries due native support of SQL) additional installations Other languages more suitable for custom tasks (e.g., PL/Perl for string operations, C for high performance) 32 Stored Functions — sql vs plpgsql When to choose sql? Example: Remove all students who have graduated and return the number of remaining students. Body consists of only SQL statements (i.e., not procedural elements required to solve task) CREATE FUNCTION clean_students() RETURNS INTEGER AS Often a wrapper of single/few SQL statements $$ -- Delete all students Simpler syntax: no BEGIN…END -- Who have graduated DELETE FROM students WHERE graduated = TRUE; -- Return remaining student count When to choose plpgsql? -- If SELECT, no RETURN needed SELECT COUNT(*) AS num_students Procedural elements are required (duh!) FROM students; $$ Dynamic SQL: statements generated at runtime LANGUAGE sql; (attention: take care to avoid SQL injection attacks!) For trigger functions (see next lecture) Note: There are many other criteria (e.g., performance, optimization, reusability, transaction handling) that are beyond our scope here. 33 Stored Functions — Return Values (beyond simple values) One Existing Tuple CREATE FUNCTION get_top_student() RETURNS students AS $$ SELECT * Quick Quiz: Is this actually the FROM students ORDER BY points DESC right query for this task here? LIMIT 1; $$ LANGUAGE sql; SELECT id, name, points FROM get_top_student(); id name points 1 Bob 94 34 Stored Functions — Return Values (beyond simple values) One Existing Tuple Set of Existing Tuples CREATE FUNCTION get_top_student() CREATE FUNCTION get_enrolled_students() RETURNS students AS RETURNS SETOF students AS $$ $$ SELECT * SELECT * FROM students FROM students ORDER BY points DESC WHERE graduated = FALSE; LIMIT 1; $$ $$ LANGUAGE sql; LANGUAGE sql; id name 2 Eve SELECT id, name, points SELECT id, name 3 Sam FROM get_top_student(); FROM get_enrolled_students(); 6 Sue id name points 7 Zac 1 Bob 94 9 Leo 10 Pam 35 Quick Quiz CREATE FUNCTION get_top_student() RETURNS students AS $$ SELECT id, name, points FROM students ORDER BY points DESC LIMIT 1; $$ LANGUAGE sql; This will throw an error…why? 36 Stored Functions — Return Values (beyond simple values) One New Tuple CREATE FUNCTION get_top_student_count (OUT points INT, OUT cnt INT) RETURNS RECORD AS $$ SELECT points, COUNT(*) FROM students WHERE points = (SELECT MAX(points) FROM students) Important: If we use RECORD, we must GROUP BY points; have at least two OUT parameters! $$ LANGUAGE sql; SELECT points, cnt FROM get_top_student_count(); points cnt 94 2 37 Stored Functions — Return Values (beyond simple values) One New Tuple Set of New Tuples CREATE FUNCTION get_top_student_count CREATE FUNCTION get_group_counts (OUT points INT, OUT cnt INT) (OUT graduated BOOLEAN, OUT cnt INT) RETURNS RECORD AS RETURNS SETOF RECORD AS $$ $$ SELECT points, COUNT(*) SELECT graduated, COUNT(*) as cnt FROM students FROM students WHERE points = (SELECT MAX(points) GROUP BY graduated; FROM students) $$ GROUP BY points; LANGUAGE sql; $$ LANGUAGE sql; SELECT graduated, cnt FROM get_group_counts(); SELECT points, cnt FROM get_top_student_count(); graduated cnt FALSE 6 points cnt TRUE 4 94 2 38 Quick Quiz CREATE FUNCTION get_group_counts SELECT graduated, cnt (OUT graduated BOOLEAN, OUT cnt INT) FROM get_group_counts(); RETURNS SETOF RECORD AS $$ SELECT graduated, COUNT(*) as cnt FROM students GROUP BY graduated; $$ LANGUAGE sql; What will be the result of this query? 39 Stored Functions — Return Values (beyond simple values) Set of New Tuples CREATE FUNCTION get_group_counts() RETURNS TABLE(graduated BOOLEAN, cnt INT) AS $$ SELECT graduated, COUNT(*) FROM students CREATE FUNCTION get_group_counts GROUP BY graduated; (OUT graduated BOOLEAN, OUT cnt INT) $$ RETURNS SETOF RECORD AS … LANGUAGE sql; Equivalent effect…but always? SELECT graduated, cnt FROM get_group_counts(); CREATE FUNCTION get_group_counts() RETURNS TABLE(graduated BOOLEAN, cnt INT) graduated cnt AS … FALSE 6 TRUE 4 40 Quick Quiz CREATE FUNCTION name (…) CREATE FUNCTION name (…) vs. RETURNS SETOF RECORD AS … RETURNS TABLE(…) AS … What is a case where both alternatives are NOT interchangeable? 41 Stored Functions — Return Values (beyond simple values) No Return Value CREATE FUNCTION add_bonus (sid INT, amount INT) RETURNS VOID AS $$ Resulting table (not the function output!) UPDATE students SET points = points + amount id name points graduated WHERE id = sid; 1 Bob 94 TRUE $$ LANGUAGE sql; 2 Eve 82 FALSE 3 Sam 70 FALSE 4 Liz 86 TRUE 5 Tom 90 TRUE 6 Sue 94 FALSE SELECT add_bonus(3, 5); 7 Zac 75 FALSE add_bonus 8 Ida 84 TRUE 9 Leo 91 FALSE Actual output / result 10 Pam 70 FALSE of SELECT statement 42 Stored Procedures Stored procedures CREATE PROCEDURE add_bonus_proc(sid INT, amount INT) Essentially the same syntax as for functions AS $$ Most obvious difference: not RETURNS clause UPDATE students SET points = points + amount WHERE id = sid; Invoked using CALL command $$ LANGUAGE sql; CALL add_bonus(3, 5); No output / result, but Question: Wait, so is a procedure just a table gets updated function that does not return anything? 43 Stored Functions vs Stored Procedures (in PostgreSQL) Despite similar look-&-feel, functions and procedures do differ Functions must return something; procedures do not have to but still can (procedures can still return values by mean of INOUT and OUT parameters; the latter since Version 14) Procedures can commit or roll back transactions during its execution; functions can not Unlike functions, procedures cannot be invoked in DML commands (SELECT, INSERT, UPDATE, DELETE) Procedures a invoked in isolation using CALL (functions are always invoked in SELECT statements) CREATE PROCEDURE add_proc DO (IN a INT, IN b INT, OUT sum INT) $$ AS DECLARE $$ sum INT; BEGIN NOTICE: Sum: 5 BEGIN sum := a + b; CALL add_proc(2, 3, sum); END; RAISE NOTICE 'Sum: %', sum; $$ END LANGUAGE plpgsql; $$; 44 Stored Functions vs Stored Procedures (in PostgreSQL) Summary Function must return something, but it can be VOID Procedures can return something (using INOUT and OUT parameters) Best practice (most of the time) Return value(s) ➜ CREATE FUNCTION No return value ➜ CREATE PROCEDURE 45 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control Structures Cursors Summary 46 Assignments Example: Get the mark of a student; automatically consider any bonus. CREATE FUNCTION get_mark(sid INT, bonus INT DEFAULT 0) Basic assignment with := RETURNS INTEGER AS age := 29; $$ optional argument DECLARE name := 'Alice'; mark INTEGER; BEGIN -- Get current mark of students SELECT points INTO mark FROM students WHERE id = sid; mark := mark + bonus; -- Add bonus to mark SELECT … INTO … END; RETURN mark; Assignment of query result $$ LANGUAGE plpgsql; to declared variable(s) SELECT get_mark(3, 10); SELECT get_mark(3); get_mark get_mark 75 65 47 Quick Quiz … $$ DECLARE dob DATE; BEGIN How can we set the value dob := … for dob manually? END; $$ LANGUAGE plpgsql; 48 Control Structures Conditionals: 4 types of IF expressions IF … THEN … END IF Note: PostgreSQL also offers IF … THEN … ELSE … END IF ELSEIF as an alias for ELSIF. IF … THEN … ELSIF … THEN … ELSE … END IF 2 types of CASE expressions CASE … WHEN … THEN … ELSE … END CASE CASE WHEN … THEN … ELSE … END CASE Simple Loops LOOP … END LOOP (typically requires EXIT…WHEN… to jump out of loop) WHILE … LOOP … END LOOP FOR … IN … LOOP … END LOOP Relevant PostgreSQL Docs 49 Conditional & Simple Loops — Example Compute the sum of the first n integers; if n is negative, return 0. CREATE FUNCTION sum_n(IN n INT) SELECT sum_n(5); RETURNS INT AS $$ sum_n DECLARE sum INT; 15 BEGIN sum := 0; IF n =0 MOVE: only move to row (no read) Fetch abs(n)-th row from the end, if n < 0. ABSOLUTE 0 positions before the first row RELATIVE n Fetch the n-th succeeding row, if n >= 0 Fetch the abs(n)-th prior row, if n < 0 Possible directions Position before first row or after last row if n is out of range RELATIVE 0 re-fetches the current row, if any FORWARD Fetch the next row (same as NEXT) BACKWARD Fetch the prior row (same as PRIOR). Relevant PostgreSQL Docs 58 Cursors — Example (beyond NEXT) CREATE OR REPLACE FUNCTION median_points() SELECT median_points(); RETURNS NUMERIC AS $$ median_points DECLARE 85 c CURSOR FOR (SELECT * FROM students ORDER BY points DESC); s1 RECORD; s2 RECORD; num_students INT; BEGIN … points … OPEN c; SELECT COUNT(*) INTO num_students FROM students; … 94 … IF num_students%2 = 1 THEN … 94 … FETCH ABSOLUTE (num_students+1)/2 FROM c INTO s1; RETURN s1.points; … 91 … ELSE … 90 … FETCH ABSOLUTE num_students/2 FROM c INTO s1; … 86 … FETCH NEXT FROM c INTO s2; RETURN (s1.points+s2.points)/2; … 84 … END IF; … 82 … CLOSE c; END; … 75 … $$ … 70 … LANGUAGE plpgsql; … 65 … 59 Quick Quiz … ELSE FETCH ABSOLUTE num_students/2 FROM c INTO s1; FETCH NEXT FROM c INTO s2; RETURN (s1.points+s2.points)/2; END IF; … How could we rewrite this line but preserve the overall functionality? 60 Dynamic Cursors — Example CREATE OR REPLACE FUNCTION median_points(IN has_graduated BOOLEAN) RETURNS NUMERIC AS SELECT median_points(TRUE); $$ DECLARE median_points c CURSOR (grad BOOLEAN) FOR (SELECT * FROM students WHERE graduated = grad 88 ORDER BY points DESC); s1 RECORD; s2 RECORD; num_students INT; BEGIN OPEN c(has_graduated); SELECT COUNT(*) INTO num_students FROM students WHERE graduated = has_graduated; IF num_students%2 = 1 THEN SELECT median_points(FALSE); FETCH ABSOLUTE (num_students+1)/2 FROM c INTO s1; RETURN s1.points; median_points ELSE FETCH ABSOLUTE num_students/2 FROM c INTO s1; 78 FETCH NEXT FROM c INTO s2; RETURN (s1.points+s2.points)/2; END IF; CLOSE c; END; $$ LANGUAGE plpgsql; 61 Overview Writing Database Applications Motivation Statement Level Interface Call Level Interface SQL Injection Attacks CS2102 Database System — Lecture 7 SQL Functions and Procedures Motivation & Overview Main Parameters: arguments, language, return values Assignments & Control Structures Cursors Summary 62 Summary Stored functions & procedures Combine SQL statements and procedural logic into a single unit Move (some) application logic into the database Support query results not possible to with "normal" SQL (at least not with the SQL feature set covered in this course) Implementation Support for different programming languages (Python, Perl, Java, JavaScript, Lua, etc.) Focus here: PL/pgSQL (PostgreSQL variation of the SQL/PSM standard) Next lecture: Triggers Automatically execute functions when "interesting events" happen 63 Solutions to Quick Quizzes Slide 30: It is more convenient to return more than one value Slide 31: $1+$2 Slide 34: Not really: Does not handle students with the same max #points Slide 36: We have to return all the columns of the table Slide 39: Works, but only one record/tuple will be returned Slide 41: SETOF records requires at least 2 output parameters Slide 48: d = TO_DATE('2023-10-10','YYYY-MM-DD'); Slide 60: Only 1-line alternatives FETCH c INTO s2; FETCH RELATIVE 1 FROM c INTO s2; FETCH ABSOLUTE num_students/2+1 FROM c INTO s2; 64